SnakePipe - get it? Python and 'plumbing'?

Yours,
Chris Koerner
Community Liaison - Discovery
Wikimedia Foundation

On Wed, Apr 5, 2017 at 2:28 PM, Erik Bernhardson <[email protected]> wrote:

> We seem to have some consensus that for the upcoming learning to rank work
> we will build out a Python library to handle the bulk of the backend data
> plumbing work. The library will primarily be code integrating with PySpark
> to handle pieces such as:
>
> # Sampling from the click logs to generate the set of queries + pages
> that will be labeled with click models
> # Distributing the work of running click models against those sampled data
> sets
> # Pushing queries we use for feature generation into Kafka, and reading
> back the resulting feature vectors (the other end of this will run those
> generated queries against either the hot-spare Elasticsearch cluster or the
> relforge cluster to get feature scores)
> # Merging feature vectors with labeled data, splitting into
> test/train/validate sets, and writing out files formatted for whichever
> training library we decide on (XGBoost, LightGBM and RankLib are in the
> running currently)
> # Whatever plumbing is necessary to run the actual model training and do
> hyperparameter optimization
> # Converting the resulting models into a format suitable for use with the
> Elasticsearch learn to rank plugin
> # Reporting on the quality of models vs some baseline
>
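
As a rough illustration of the merge/split/write step in the list above,
here is a minimal sketch in plain Python (not PySpark). Everything in it is
an assumption made for the example: the function names, the dict-based
inputs keyed by (query_id, page_id), the 80/10/10 split, and the choice of
the LibSVM-style "label qid:N index:value" line layout that RankLib reads.
It is not the design of the library under discussion.

```python
import hashlib
from collections import defaultdict


def assign_split(query_id, train=0.8, test=0.1):
    """Deterministically map a query id to 'train', 'test' or 'validate'.

    Hashing the query id keeps every row of one query in the same split.
    The 80/10/10 proportions are an assumption for this sketch.
    """
    digest = hashlib.md5(str(query_id).encode("utf-8")).hexdigest()
    bucket = (int(digest, 16) % 1000) / 1000.0
    if bucket < train:
        return "train"
    if bucket < train + test:
        return "test"
    return "validate"


def format_row(label, query_id, features):
    """Render one row as 'label qid:ID 1:v1 2:v2 ...' (indices start at 1)."""
    feats = " ".join("%d:%g" % (i, v) for i, v in enumerate(features, start=1))
    return "%d qid:%d %s" % (label, query_id, feats)


def merge_and_split(labels, vectors):
    """Join labels and feature vectors on (query_id, page_id), then split.

    labels:  {(query_id, page_id): relevance label}    (hypothetical shape)
    vectors: {(query_id, page_id): [feature values]}   (hypothetical shape)
    Returns {split name: [formatted lines]}; pairs missing from either
    input are dropped.
    """
    splits = defaultdict(list)
    for key in sorted(labels.keys() & vectors.keys()):
        query_id, _page_id = key
        line = format_row(labels[key], query_id, vectors[key])
        splits[assign_split(query_id)].append(line)
    return splits


if __name__ == "__main__":
    labels = {(1, 10): 3, (1, 11): 0, (2, 20): 1}
    vectors = {(1, 10): [0.5, 12.0], (1, 11): [0.1, 3.0], (2, 20): [0.9, 7.0]}
    for name, lines in merge_and_split(labels, vectors).items():
        print(name, lines)
```

Splitting on a hash of the query id rather than per row keeps all pages for
a query together, so no query leaks between the training and evaluation
sets.
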
> The high level goal is that we would have relatively simple Python scripts
> in our analytics repository that are called from Oozie. Those scripts would
> know the appropriate locations to load and store data, and would pass that
> data into this library for the bulk of the processing. There will also be
> some script, probably within the library, that combines many of these steps
> for feature engineering purposes: it takes some set of features and runs
> the whole thing.
>
> So, what do we call this thing? Horrible first attempts:
>
> * ltr-pipeline
> * learn-to-rank-pipeline
> * bob
> * cirrussearch-ltr
> * ???
>
>
> _______________________________________________
> discovery mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/discovery
>
>