I think that this is a bit of an idiosyncratic model for learning to rank, but it is a reasonably viable one.
It would be good to have a discussion of what you find hard or easy and what you think is needed to make this work. Let's talk.

On Sun, Feb 9, 2014 at 2:26 PM, peng <[email protected]> wrote:

> This is what I believe to be a typical learning to rank model:
>
> 1. Create many weak rankers/scorers (a.k.a. feature engineering; in Solr
>    these are queries/function queries).
> 2. Test those scorers on a ground truth dataset, generating feature
>    vectors for the top-n results annotated by humans.
> 3. Use an existing classifier/regressor (e.g. support vector ranking,
>    GBDT, random forest, etc.) on those feature vectors to get a ranking
>    model.
> 4. Export this ranking model back to Solr as a custom ensemble query (a
>    BooleanQuery with custom boosting factors for a linear model, or a
>    CustomScoreQuery with a custom scoring function for a non-linear
>    model), push it to the Solr server, and register it with QParser.
>    Push it to production. End of.
>
> But I didn't find this workflow quite easy to implement in the
> mahout-solr integration (is it discouraged for some reason?). Namely,
> there is no pipeline from the results of scorers to a Mahout-compatible
> vector form, and there is no pipeline from the ranking model back to an
> ensemble query. (I only found the lucene2seq class and the upcoming
> recommendation support, which don't quite fit the scenario.) So what's
> the best practice for easily implementing a realtime learning-to-rank
> search engine in this case? I've worked in a bunch of startups and such
> an appliance seems to be in high demand. (Remember the Solr-based
> collaborative filtering model proposed by Dr Dunning? This is the
> content-based counterpart of it.)
>
> I'm looking forward to streamlining this process to make my upcoming
> work easier. I think Mahout/Solr is the undisputed instrument of choice
> due to their scalability and the machine learning background of many of
> their top committers. Can we talk about it at some point?
>
> Yours Peng
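To make the discussion concrete, steps 2 to 4 of the quoted workflow might look roughly like the sketch below. It is not Mahout or Solr code: the feature vectors, the field names (`title`, `body`), and the helper names `train_pairwise` and `to_boosted_query` are all hypothetical, and a simple hinge-loss pairwise update stands in for whatever ranker (support vector ranking, GBDT, ...) would really be used. The exported string only approximates a linear model, since the actual score of a boosted BooleanQuery also depends on how Lucene combines and normalizes clause scores.

```python
import random


def train_pairwise(examples, n_features, epochs=50, lr=0.1, seed=0):
    """Fit a linear ranker from (query_id, feature_vector, relevance) rows.

    Each feature is assumed to be the score of one weak scorer for a
    (query, document) pair; relevance is a human-assigned label.
    """
    rng = random.Random(seed)

    # Group rows by query, then build (better, worse) preference pairs.
    by_query = {}
    for qid, features, label in examples:
        by_query.setdefault(qid, []).append((features, label))
    pairs = []
    for docs in by_query.values():
        for xa, ya in docs:
            for xb, yb in docs:
                if ya > yb:
                    pairs.append((xa, xb))

    weights = [0.0] * n_features
    for _ in range(epochs):
        rng.shuffle(pairs)
        for better, worse in pairs:
            diff = [a - b for a, b in zip(better, worse)]
            margin = sum(w * d for w, d in zip(weights, diff))
            if margin < 1.0:  # hinge loss: update only on violated pairs
                weights = [w + lr * d for w, d in zip(weights, diff)]
    return weights


def to_boosted_query(weights, subqueries):
    """Render the linear model as boosted clauses in Lucene query syntax.

    Simplification in this sketch: clauses with non-positive weight are
    dropped rather than expressed as negative boosts.
    """
    clauses = [
        "({})^{:.3f}".format(q, w)
        for q, w in zip(subqueries, weights)
        if w > 0
    ]
    return " OR ".join(clauses)


# Made-up ground truth: two weak scorers, two queries, labels 1 > 0.
examples = [
    ("q1", [2.0, 1.0], 1),
    ("q1", [0.5, 0.8], 0),
    ("q2", [0.4, 1.5], 1),
    ("q2", [0.3, 0.2], 0),
]
subqueries = ["title:solr", "body:solr"]  # hypothetical weak scorers

weights = train_pairwise(examples, n_features=2)
query = to_boosted_query(weights, subqueries)
print(weights)
print(query)
```

The two missing pipelines mentioned in the quoted message correspond to the input of `train_pairwise` (scorer results as vectors) and the output of `to_boosted_query` (model back to a query); a non-linear model would need something like a custom scoring function instead of plain boosts.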
