[ https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193017#comment-15193017 ]

Alessandro Benedetti commented on SOLR-8542:
--------------------------------------------

As I briefly discussed with Diego regarding how to include the training phase in 
Solr as well, a simple integration could be:

1) Select a supported training library for linear SVM and one for LambdaMART 
(the libraries already suggested in the README could be a starting point).

2) Create an update request handler that accepts the training set (the training 
set format will be clearly described in the documentation, e.g. LETOR; see the 
sample after this list).
This update handler will take the training set file and the related parameters 
supported by the selected library and proceed with the training, using the 
default configuration parameters where possible so that the user interaction 
stays as easy as possible.
The update handler will then extract the document features (revisiting the cache 
could be interesting here, to improve the re-use of feature extraction).

3) The update request handler will train the model by calling the selected 
library internally, using all the parameters provided. The generated model will 
be converted into the supported JSON format and stored in the model store.
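
For reference, a training set in the LETOR / SVMrank text format (one judgement 
per line: relevance label, query id, then feature_id:value pairs) would look 
roughly like the sketch below; the feature ids, values and document names are 
purely illustrative:

3 qid:1 1:0.53 2:0.12 3:1.0 # doc=SP2514N
0 qid:1 1:0.13 2:0.24 3:0.0 # doc=6H500F0
2 qid:2 1:0.87 2:0.45 3:1.0 # doc=GB18030TEST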

This simple approach could be made as sophisticated as we want (we can add 
flexibility in the choice of library and make it easy to extend).
A further step could be to add a layer of signal processing directly in Solr to 
build the training set as well: a sort of REST API that takes as input a 
document, a query id and a rating score, and automatically creates an entry in a 
training set stored in some smart way.
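
As a rough illustration only (the endpoint and parameter names below are 
hypothetical, not part of the attached patch), such a signal-collection API 
could be invoked like this, appending one (query, document, rating) judgement to 
the stored training set per call:

curl -XPOST 'http://localhost:8983/solr/techproducts/ltr/trainingset' -H 'Content-type:application/json' --data-binary '{"queryId": "q42", "docId": "SP2514N", "rating": 3}'
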
We could then trigger the model generation on demand, or set up a schedule to 
refresh the model automatically.
We could even take into account only certain time periods, store training data 
in different places, clean the training set automatically from time to time, 
etc. :)
Now I am going off topic, but there are a lot of things we could do around the 
training to ease the integration :)
Happy to discuss them and get new ideas to improve the plugin, which I think is 
going to be really, really valuable for the Solr community.

> Integrate Learning to Rank into Solr
> ------------------------------------
>
>                 Key: SOLR-8542
>                 URL: https://issues.apache.org/jira/browse/SOLR-8542
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Joshua Pantony
>            Assignee: Christine Poerschke
>            Priority: Minor
>         Attachments: README.md, README.md, SOLR-8542-branch_5x.patch, 
> SOLR-8542-trunk.patch
>
>
> This is a ticket to integrate learning to rank machine learning models into 
> Solr. Solr Learning to Rank (LTR) provides a way for you to extract features 
> directly inside Solr for use in training a machine learned model. You can 
> then deploy that model to Solr and use it to rerank your top X search 
> results. This concept was previously presented by the authors at Lucene/Solr 
> Revolution 2015 ( 
> http://www.slideshare.net/lucidworks/learning-to-rank-in-solr-presented-by-michael-nilsson-diego-ceccarelli-bloomberg-lp
>  ).
> The attached code was jointly worked on by Joshua Pantony, Michael Nilsson, 
> David Grohmann and Diego Ceccarelli.
> Any chance this could make it into a 5x release? We've also attached 
> documentation as a github MD file, but are happy to convert to a desired 
> format.
> h3. Test the plugin with solr/example/techproducts in 6 steps
> Solr provides some simple example indices. In order to test the plugin with 
> the techproducts example, please follow these steps:
> h4. 1. compile solr and the examples 
> cd solr
> ant dist
> ant example
> h4. 2. run the example
> ./bin/solr -e techproducts 
> h4. 3. stop it and install the plugin:
>    
> ./bin/solr stop
> mkdir example/techproducts/solr/techproducts/lib
> cp build/contrib/ltr/lucene-ltr-6.0.0-SNAPSHOT.jar example/techproducts/solr/techproducts/lib/
> cp contrib/ltr/example/solrconfig.xml example/techproducts/solr/techproducts/conf/
> h4. 4. run the example again
>     
> ./bin/solr -e techproducts
> h4. 5. index some features and a model
> curl -XPUT 'http://localhost:8983/solr/techproducts/schema/fstore' --data-binary "@./contrib/ltr/example/techproducts-features.json" -H 'Content-type:application/json'
> curl -XPUT 'http://localhost:8983/solr/techproducts/schema/mstore' --data-binary "@./contrib/ltr/example/techproducts-model.json" -H 'Content-type:application/json'
> h4. 6. have fun !
> *access to the default feature store*
> http://localhost:8983/solr/techproducts/schema/fstore/_DEFAULT_ 
> *access to the model store*
> http://localhost:8983/solr/techproducts/schema/mstore
> *perform a query using the model, and retrieve the features*
> http://localhost:8983/solr/techproducts/query?indent=on&q=test&wt=json&rq={!ltr%20model=svm%20reRankDocs=25%20efi.query=%27test%27}&fl=*,[features],price,score,name&fv=true
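> (URL-decoded, the re-rank parameter above reads rq={!ltr model=svm reRankDocs=25 efi.query='test'}: rerank the top 25 results with the model named "svm", passing the query text as external feature information; the [features] entry in fl returns the extracted feature values for each document.)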


