On Wed, Apr 5, 2017 at 12:55 PM, Aaron Halfaker <[email protected]>
wrote:

> Link to code?
>
No code yet, although there is proof-of-concept code that will inform
this work at
stat1002.eqiad.wmnet:/a/ebernhardson/spark_feature_log/code


> "ltr" means "left to right" to me.  Maybe you could do something like
> "ltrank"
>
Sounds like LTR is out, as the term is already used elsewhere and is more
widely known. LTRank isn't a bad compromise compared with spelling out the
whole thing.


> On Wed, Apr 5, 2017 at 2:28 PM, Erik Bernhardson <
> [email protected]> wrote:
>
>> We seem to have some consensus that for the upcoming learning to rank
>> work we will build out a python library to handle the bulk of the backend
>> data plumbing work. The library will primarily be code integrating with
>> pyspark to do various pieces such as:
>>
>> # Sampling from the click logs to generate the set of queries + pages
>> that will be labeled with click models
>> # Distributing the work of running click models against those sampled
>> data sets
>> # Pushing queries we use for feature generation into kafka, and reading
>> back the resulting feature vectors (the other end of this will run those
>> generated queries against either the hot-spare elasticsearch cluster or the
>> relforge cluster to get feature scores)
>> # Merging feature vectors with labeled data, splitting into
>> test/train/validate sets, and writing out files formatted for whichever
>> training library we decide on (xgboost, lightgbm and ranklib are in the
>> running currently)
>> # Whatever plumbing is necessary to run the actual model training and do
>> hyperparameter optimization
>> # Converting the resulting models into a format suitable for use with the
>> elasticsearch learning-to-rank plugin
>> # Reporting on the quality of models vs some baseline
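As a rough illustration of two of the steps above (the test/train/validate split and writing out files for a training library), here is a minimal sketch. The function names are hypothetical and not part of any existing code; the output line format is the standard SVMrank/LETOR text format that RankLib consumes, with 1-indexed features:

```python
import random


def split_dataset(rows, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle labeled feature rows and split them into
    train/test/validate subsets by the given fractions."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * fractions[0])
    n_test = int(len(rows) * fractions[1])
    train = rows[:n_train]
    test = rows[n_train:n_train + n_test]
    validate = rows[n_train + n_test:]
    return train, test, validate


def to_ranklib_line(label, qid, features):
    """Format one labeled feature vector as a RankLib/SVMrank text line:
    '<label> qid:<qid> 1:<f1> 2:<f2> ...' (features are 1-indexed)."""
    feats = " ".join("%d:%g" % (i, v) for i, v in enumerate(features, 1))
    return "%d qid:%s %s" % (label, qid, feats)
```

In the real library these rows would presumably be pyspark DataFrames rather than plain lists, but the shape of the work is the same.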
>>
>> The high-level goal is that we would have relatively simple python
>> scripts in our analytics repository that are called from oozie; those
>> scripts would know the appropriate locations to load/store data and hand
>> off to this library for the bulk of the processing. There will also be
>> some script, probably within the library, that combines many of these
>> steps for feature-engineering purposes, taking some set of features and
>> running the whole thing.
>>
>> So, what do we call this thing? Horrible first attempts:
>>
>> * ltr-pipeline
>> * learn-to-rank-pipeline
>> * bob
>> * cirrussearch-ltr
>> * ???
>>
>>
>> _______________________________________________
>> AI mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/ai
>>
>>
>
_______________________________________________
discovery mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/discovery
