[
https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15256886#comment-15256886
]
Joshua Pantony commented on SOLR-8542:
--------------------------------------
Hi, thanks for the interest! Was there a specific algorithm you had in mind
that is currently not supported? Often it is possible to formulate comparisons
in the training phase in such a way that you still only need to compute one
score per document in the live phase. Let's use rankSVM (a pairwise approach)
as an example. Given documents D1 and D2, with the feature vector of a document
given by the function V(D) and W the weight vector being learned, if we know
that D1 > D2 (D1 should rank above D2), we can formulate this in the training
stage as the objective function (V(D1) - V(D2)) * W > 0. Here we have created
an objective function by directly comparing pairs of documents D1 and D2, hence
it is pairwise. In the live phase, given documents D1, D2, D3 and D4, we
"could" do a direct pairwise approach, i.e.:
(V(D1) - V(D2)) * W > 0 ?,
(V(D1) - V(D3)) * W > 0 ?,
(V(D1) - V(D4)) * W > 0 ?,
(V(D2) - V(D3)) * W > 0 ?,
(V(D2) - V(D4)) * W > 0 ?,
(V(D3) - V(D4)) * W > 0 ?
However, this is computationally inefficient. If we do a direct comparison
using the original objective function that we trained on, we'd need to do 6 dot
products, one per pair, and the number of pairs grows quadratically with the
number of documents. Because the dot product is linear, in the live phase we
can rewrite (V(D1) - V(D2)) * W > 0 as V(D1) * W > V(D2) * W. Now all we need
to do in a live setting is calculate V(D1) * W, V(D2) * W, V(D3) * W and
V(D4) * W. Once we do that we can just sort the scores, and voilà, we've done
pairwise comparisons in the same time complexity as a pointwise approach. Of
course, don't trust me, read this paper:
http://www.cs.cornell.edu/people/tj/publications/joachims_02c.pdf (note I
vastly simplified rank SVM here for ease of dialogue).
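
To make the equivalence concrete, here is a minimal Python sketch. The weight
vector W and the four feature vectors are made-up numbers for illustration
only (they are not from the patch or from any trained model), and the function
names are hypothetical. It scores each document once with V(D) * W, sorts, and
then checks that the result agrees with the six explicit pairwise tests.

```python
from itertools import combinations


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical learned weights and per-document feature vectors V(D).
W = [0.5, -1.0, 2.0]
V = {
    "D1": [1.0, 0.0, 2.0],
    "D2": [0.0, 1.0, 1.0],
    "D3": [2.0, 2.0, 0.0],
    "D4": [1.0, 1.0, 1.0],
}


def rank_pointwise(vectors, weights):
    """One dot product per document, then a sort by descending score."""
    scores = {doc: dot(vec, weights) for doc, vec in vectors.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


def pairwise_wins(vectors, weights):
    """One dot product per pair: count how often each document wins
    the test (V(Da) - V(Db)) * W > 0. Ties are credited to Db here."""
    wins = {doc: 0 for doc in vectors}
    for a, b in combinations(vectors, 2):
        diff = [x - y for x, y in zip(vectors[a], vectors[b])]
        if dot(diff, weights) > 0:
            wins[a] += 1
        else:
            wins[b] += 1
    return wins


ranking, scores = rank_pointwise(V, W)
wins = pairwise_wins(V, W)
print(ranking)  # ['D1', 'D4', 'D2', 'D3']
print(sorted(wins, key=wins.get, reverse=True))  # same order
```

With these numbers the pointwise path needs 4 dot products and the pairwise
path needs 6, and both produce the same ordering.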
So, all that being said, I'll circle back to my original question: was there a
specific algorithm you had in mind that we don't easily support? If so, happy
to add it in some future patch (no promise on when, though). [It should be
noted that there is some debate / grey area about whether LambdaMART is
listwise or pairwise, but it is generally considered among the
strongest-performing methods.]
> Integrate Learning to Rank into Solr
> ------------------------------------
>
> Key: SOLR-8542
> URL: https://issues.apache.org/jira/browse/SOLR-8542
> Project: Solr
> Issue Type: New Feature
> Reporter: Joshua Pantony
> Assignee: Christine Poerschke
> Priority: Minor
> Attachments: README.md, README.md, SOLR-8542-branch_5x.patch,
> SOLR-8542-trunk.patch
>
>
> This is a ticket to integrate learning to rank machine learning models into
> Solr. Solr Learning to Rank (LTR) provides a way for you to extract features
> directly inside Solr for use in training a machine learned model. You can
> then deploy that model to Solr and use it to rerank your top X search
> results. This concept was previously presented by the authors at Lucene/Solr
> Revolution 2015 (
> http://www.slideshare.net/lucidworks/learning-to-rank-in-solr-presented-by-michael-nilsson-diego-ceccarelli-bloomberg-lp
> ).
> The attached code was jointly worked on by Joshua Pantony, Michael Nilsson,
> David Grohmann and Diego Ceccarelli.
> Any chance this could make it into a 5x release? We've also attached
> documentation as a github MD file, but are happy to convert to a desired
> format.
> h3. Test the plugin with solr/example/techproducts in 6 steps
> Solr provides some simple example indices. In order to test the plugin
> with the techproducts example, please follow these steps:
> h4. 1. compile solr and the examples
> cd solr
> ant dist
> ant example
> h4. 2. run the example
> ./bin/solr -e techproducts
> h4. 3. stop it and install the plugin:
>
> ./bin/solr stop
> mkdir example/techproducts/solr/techproducts/lib
> cp build/contrib/ltr/lucene-ltr-6.0.0-SNAPSHOT.jar \
>   example/techproducts/solr/techproducts/lib/
> cp contrib/ltr/example/solrconfig.xml \
>   example/techproducts/solr/techproducts/conf/
> h4. 4. run the example again
>
> ./bin/solr -e techproducts
> h4. 5. index some features and a model
> curl -XPUT 'http://localhost:8983/solr/techproducts/schema/fstore' \
>   --data-binary "@./contrib/ltr/example/techproducts-features.json" \
>   -H 'Content-type:application/json'
> curl -XPUT 'http://localhost:8983/solr/techproducts/schema/mstore' \
>   --data-binary "@./contrib/ltr/example/techproducts-model.json" \
>   -H 'Content-type:application/json'
> h4. 6. have fun !
> *access to the default feature store*
> http://localhost:8983/solr/techproducts/schema/fstore/_DEFAULT_
> *access to the model store*
> http://localhost:8983/solr/techproducts/schema/mstore
> *perform a query using the model, and retrieve the features*
> http://localhost:8983/solr/techproducts/query?indent=on&q=test&wt=json&rq={!ltr%20model=svm%20reRankDocs=25%20efi.query=%27test%27}&fl=*,[features],price,score,name&fv=true