I think that this is a bit of an idiosyncratic model for learning to rank,
but it is a reasonably viable one.

It would be good to have a discussion of what you find hard or easy and
what you think is needed to make this work.

Let's talk.



On Sun, Feb 9, 2014 at 2:26 PM, peng <[email protected]> wrote:

> This is what I believe to be a typical learning to rank model:
>
> 1. Create many weak rankers/scorers (a.k.a. feature engineering; in Solr
> these are queries/function queries).
> 2. Test those scorers on a ground-truth dataset, generating feature
> vectors for the top-n results annotated by humans.
> 3. Use an existing classifier/regressor (e.g. support vector ranking,
> GBDT, random forest, etc.) on those feature vectors to get a ranking model.
> 4. Export this ranking model back to Solr as a custom ensemble query (a
> BooleanQuery with custom boosting factors for a linear model, or a
> CustomScoreQuery with a custom scoring function for a non-linear model),
> push it to the Solr server, and register it with QParser. Push it to
> production. End of.
>
> But I didn't find this workflow easy to implement with the Mahout-Solr
> integration (is it discouraged for some reason?). Namely, there is no
> pipeline from the results of the scorers to a Mahout-compatible vector
> form, and there is no pipeline from the ranking model back to an ensemble
> query. (I only found the lucene2seq class and the upcoming recommendation
> support, which don't quite fit this scenario.) So what's the best practice
> for easily implementing a real-time learning-to-rank search engine in this
> case? I've worked in a bunch of startups, and such an appliance seems to be
> in high demand. (Remember the Solr-based collaborative filtering model
> proposed by Dr Dunning? This is the content-based counterpart to it.)
>
> I'm looking forward to streamlining this process to make my upcoming work
> easier. I think Mahout/Solr is the undisputed instrument of choice due to
> its scalability and the machine learning background of many of its top
> committers. Can we talk about it at some point?
>
> Yours, Peng
>
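
For concreteness, steps 3 and 4 of the quoted workflow might look roughly
like the sketch below for the linear case. This is only an illustration
under several assumptions: the feature vectors, the clause strings
(`title:solr` etc.) and both helper functions are hypothetical, a simple
pairwise perceptron stands in for the SVM ranking/GBDT learners mentioned
above, and the output is a plain boosted query string (`(clause)^weight`)
rather than a real BooleanQuery object. Non-positive weights are simply
dropped here, which is a simplification and not a recommendation.

```python
# Hypothetical sketch: data, clause names, and helpers are illustrative only.

def train_pairwise_linear(pairs, n_features, epochs=50, lr=0.1):
    """Pairwise perceptron: for every (better, worse) pair of feature
    vectors, adjust w until w.better > w.worse."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            if margin <= 0:  # pair is mis-ordered (or tied): update weights
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


def to_boosted_query(clauses, weights):
    """Turn learned weights into a boosted query string, one clause per
    weak scorer. Assumption: clauses with weight <= 0 are dropped."""
    parts = []
    for clause, weight in zip(clauses, weights):
        if weight > 0:
            parts.append("(%s)^%.3f" % (clause, weight))
    return " ".join(parts)


# One weak scorer per clause; each feature is that scorer's output for a doc.
clauses = ["title:solr", "body:solr", "tags:solr"]

# (better, worse) feature-vector pairs derived from human judgments.
pairs = [
    ([0.9, 0.2, 0.1], [0.1, 0.3, 0.1]),
    ([0.8, 0.5, 0.0], [0.2, 0.4, 0.3]),
    ([0.7, 0.6, 0.2], [0.3, 0.1, 0.4]),
]

weights = train_pairwise_linear(pairs, n_features=len(clauses))
query = to_boosted_query(clauses, weights)
print(query)
```

The resulting string is what would be handed back to the search server as
the ensemble query. A non-linear model would need a custom scoring function
instead, which this sketch does not attempt.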
