Learning to rank support in Mahout and Solr integration?

peng Sun, 09 Feb 2014 14:30:09 -0800

This is what I believe to be a typical learning-to-rank workflow:

1. Create many weak rankers/scorers (a.k.a. feature engineering; in Solr, these are queries/function queries).
2. Test those scorers on a ground-truth dataset, generating feature vectors for the top-n results annotated by humans.
3. Use an existing classifier/regressor (e.g. support vector ranking, GBDT, random forest) on those feature vectors to get a ranking model.
4. Export this ranking model back to Solr as a custom ensemble query (a BooleanQuery with custom boosting factors for a linear model, or a CustomScoreQuery with a custom scoring function for a non-linear model), push it to the Solr server, and register it with a QParser. Push it to production. That's it.
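Steps 2 and 3 can be sketched in a few lines. This is a minimal sketch, not any Mahout API: it assumes each annotated result has already been turned into a list of weak-scorer outputs plus a graded human label, and it trains a linear pairwise ranker by stochastic gradient descent on the hinge loss (the objective behind RankSVM-style support vector ranking, here without a real SVM solver). The helper names `pairwise_diffs`, `train_linear_ranker` and `score` are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical helpers for illustration; not part of Mahout or Solr.
# One annotated result: (query id, weak-scorer outputs, graded human label).
Example = Tuple[str, List[float], int]


def pairwise_diffs(examples: Sequence[Example]) -> List[List[float]]:
    """Within each query, emit x_i - x_j for every pair with label_i > label_j."""
    by_query: Dict[str, List[Example]] = {}
    for ex in examples:
        by_query.setdefault(ex[0], []).append(ex)
    diffs: List[List[float]] = []
    for group in by_query.values():
        for _, xi, yi in group:
            for _, xj, yj in group:
                if yi > yj:
                    diffs.append([a - b for a, b in zip(xi, xj)])
    return diffs


def train_linear_ranker(examples: Sequence[Example], epochs: int = 50,
                        lr: float = 0.1, reg: float = 1e-3,
                        seed: int = 0) -> List[float]:
    """SGD on the L2-regularised pairwise hinge loss max(0, 1 - w.(x_i - x_j))."""
    diffs = pairwise_diffs(examples)
    dim = len(examples[0][1])
    w = [0.0] * dim
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(diffs)
        for d in diffs:
            margin = sum(wk * dk for wk, dk in zip(w, d))
            for k in range(dim):
                # The hinge term contributes -d only while the pair is inside the margin.
                grad = reg * w[k] - (d[k] if margin < 1.0 else 0.0)
                w[k] -= lr * grad
    return w


def score(w: Sequence[float], x: Sequence[float]) -> float:
    """Linear ranking score; each weight plays the role of a per-scorer boost."""
    return sum(wk * xk for wk, xk in zip(w, x))


if __name__ == "__main__":
    # Toy ground truth: two queries, three annotated results each (2 = best).
    data: List[Example] = [
        ("q1", [0.9, 0.1], 2), ("q1", [0.5, 0.8], 1), ("q1", [0.1, 0.3], 0),
        ("q2", [0.8, 0.2], 2), ("q2", [0.4, 0.9], 1), ("q2", [0.2, 0.1], 0),
    ]
    print("weights:", train_linear_ranker(data))
```

Because the model is linear, its weights are exactly the "custom boosting factors" that step 4 would push back to Solr; a GBDT or random forest would instead need the non-linear CustomScoreQuery route.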

But I didn't find this workflow quite easy to implement with the Mahout-Solr integration (is it discouraged for some reason?). Namely, there is no pipeline from the results of the scorers to a Mahout-compatible vector form, and there is no pipeline from the ranking model back to an ensemble query. (I only found the lucene2seq class and the upcoming recommendation support, which don't quite fit this scenario.) So what's the best practice for easily implementing a real-time, learning-to-rank search engine in this case? I've worked at a number of startups, and such an application seems to be in high demand. (Remember the Solr-based collaborative filtering model proposed by Dr. Dunning? This is the content-based counterpart to it.)
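The two missing pipelines might look roughly like this. It is a hypothetical sketch under several assumptions: scorer outputs are already collected per document, the intermediate format is the plain-text SVMlight-style `<label> qid:<n> <index>:<value>` layout (converting it into Mahout's own vector types is not shown), and the model is linear. For the return trip it swaps in Solr's `{!func}` parser with the `sum`, `product` and `query` function queries instead of the BooleanQuery-with-boosts from step 4. The helper names and the `f1`, `f2`, ... parameter names are made up for illustration.

```python
from typing import Dict, List, Sequence

# Hypothetical helpers for illustration; not part of Mahout or Solr.


def to_svmlight(qid: int, label: int, scores: Dict[str, float],
                feature_names: Sequence[str]) -> str:
    """Serialise one annotated result as '<label> qid:<n> 1:<v1> 2:<v2> ...'.

    A scorer that did not match the document is written as 0.
    """
    cols = [f"{i}:{scores.get(name, 0.0):g}"
            for i, name in enumerate(feature_names, start=1)]
    return " ".join([str(label), f"qid:{qid}"] + cols)


def to_solr_function_query(weights: Dict[str, float]) -> Dict[str, str]:
    """Turn linear weights over named scorer queries into Solr request params.

    Each scorer query goes into its own parameter (f1, f2, ...) and is
    referenced from a {!func} query as query($fN,0): the sub-query's score,
    or 0 for documents it does not match.
    """
    params: Dict[str, str] = {}
    terms: List[str] = []
    for i, (scorer, w) in enumerate(weights.items(), start=1):
        name = f"f{i}"
        params[name] = scorer
        terms.append(f"product({w:g},query(${name},0))")
    # A single scorer needs no sum(); otherwise add up the weighted scores.
    body = terms[0] if len(terms) == 1 else "sum(" + ",".join(terms) + ")"
    params["q"] = "{!func}" + body
    return params


if __name__ == "__main__":
    print(to_svmlight(3, 2, {"title:apple": 1.5, "body:apple": 0.25},
                      ["title:apple", "body:apple", "recency"]))
    print(to_solr_function_query({"title:apple": 0.7, "body:apple": 0.3}))
```

The returned dict would be sent as the request parameters of a search handler such as `/select`, with the weights taken from whatever linear ranker was trained in step 3. A non-linear model would still need a custom scoring query on the server side.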

I'm looking forward to streamlining this process to make my upcoming work easier. I think Mahout/Solr is the undisputed instrument of choice due to their scalability and the machine learning background of many of their top committers. Can we talk about it at some point?


Yours, Peng

