[ https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15186931#comment-15186931 ]
ASF GitHub Bot commented on SOLR-8542: -------------------------------------- Github user alessandrobenedetti commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/4#discussion_r55499494 --- Diff: solr/contrib/ltr/README.txt --- @@ -0,0 +1,330 @@ +Apache Solr Learning to Rank +======== + +This is the main [learning to rank integrated into solr](http://www.slideshare.net/lucidworks/learning-to-rank-in-solr-presented-by-michael-nilsson-diego-ceccarelli-bloomberg-lp) +repository. +[Read up on learning to rank](https://en.wikipedia.org/wiki/Learning_to_rank) + +Apache Solr Learning to Rank (LTR) provides a way for you to extract features +directly inside Solr for use in training a machine learned model. You can then +deploy that model to Solr and use it to rerank your top X search results. + + +# Changes to solrconfig.xml +```xml +<config> + ... + + <!-- Query parser used to rerank top docs with a provided model --> + <queryParser name="ltr" class="org.apache.solr.ltr.ranking.LTRQParserPlugin" /> + + <!-- Transformer that will encode the document features in the response. + For each document the transformer will add the features as an extra field + in the response. The name of the field we will be the the name of the + transformer enclosed between brackets (in this case [features]). + In order to get the feature vector you will have to + specify that you want the field (e.g., fl="*,[features]) --> + <transformer name="features" class="org.apache.solr.ltr.ranking.LTRFeatureLoggerTransformerFactory" /> + + + <!-- Component that hooks up managed resources for features and models --> + <searchComponent name="ltrComponent" class="org.apache.solr.ltr.ranking.LTRComponent"/> + <requestHandler name="/query" class="solr.SearchHandler"> + <lst name="defaults"> + <str name="echoParams">explicit</str> + <str name="wt">json</str> + <str name="indent">true</str> + <str name="df">id</str> + </lst> + <arr name="last-components"> + <!-- Use the component in your requestHandler --> + <str>ltrComponent</str> + </arr> + </requestHandler> + + <query> + ... + + <!-- Cache for storing and fetching feature vectors --> + <cache name="QUERY_DOC_FV" + class="solr.search.LRUCache" + size="4096" + initialSize="2048" + autowarmCount="4096" + regenerator="solr.search.NoOpRegenerator" /> + </query> + +</config> + +``` + + +# Build the plugin +In the solr/contrib/ltr directory run +`ant dist` + +# Install the plugin +In your solr installation, navigate to your collection's lib directory. +In the solr install example, it would be solr/collection1/lib. +If lib doesn't exist you will have to make it, and then copy the plugin's jar there. + +`cp lucene-solr/solr/dist/solr-ltr-X.Y.Z-SNAPSHOT.jar mySolrInstallPath/solr/myCollection/lib` + +Restart your collection using the admin page and you are good to go. +You can find more detailed instructions [here](https://wiki.apache.org/solr/SolrPlugins). + + +# Defining Features +In the learning to rank plugin, you can define features in a feature space +using standard Solr queries. As an example: + +###### features.json +```json +[ +{ "name": "isBook", + "type": "org.apache.solr.ltr.feature.impl.SolrFeature", + "params":{ "fq": ["{!terms f=category}book"] } +}, +{ + "name": "documentRecency", + "type": "org.apache.solr.ltr.feature.impl.SolrFeature", + "params": { + "q": "{!func}recip( ms(NOW,publish_date), 3.16e-11, 1, 1)" + } +}, +{ + "name":"originalScore", + "type":"org.apache.solr.ltr.feature.impl.OriginalScoreFeature", + "params":{} +}, +{ + "name" : "userTextTitleMatch", + "type" : "org.apache.solr.ltr.feature.impl.SolrFeature", + "params" : { "q" : "{!field f=title}${user_text}" } +} +] +``` + +Defines four features. Anything that is a valid Solr query can be used to define +a feature. + +### Filter Query Features +The first feature isBook fires if the term 'book' matches the category field +for the given examined document. Since in this feature q was not specified, +either the score 1 (in case of a match) or the score 0 (in case of no match) +will be returned. + +### Query Features +In the second feature (documentRecency) q was specified using a function query. +In this case the score for the feature on a given document is whatever the query +returns (1 for docs dated now, 1/2 for docs dated 1 year ago, 1/3 for docs dated +2 years ago, etc..) . If both an fq and q is used, documents that don't match +the fq will receive a score of 0 for the documentRecency feature, all other +documents will receive the score specified by the query for this feature. + +### Original Score Feature +The third feature (originalScore) has no parameters, and uses the +OriginalScoreFeature class instead of the SolrFeature class. Its purpose is +to simply return the score for the original search request against the current +matching document. + +### External Features +Users can specify external information that can to be passed in as +part of the query to the ltr ranking framework. In this case, the +fourth feature (userTextPhraseMatch) will be looking for an external field +called 'user_text' passed in through the request, and will fire if there is +a term match for the document field 'title' from the value of the external +field 'user_text'. See the "Run a Rerank Query" section for how +to pass in external information. + +### Custom Features +Custom features can be created by extending from +org.apache.solr.ltr.ranking.Feature, however this is generally not recommended. +The majority of features should be possible to create using the methods described +above. + +# Defining Models +Currently the Learning to Rank plugin supports 2 main types of +ranking models: [Ranking SVM](http://www.cs.cornell.edu/people/tj/publications/joachims_02c.pdf) +and [LambdaMART](http://research.microsoft.com/pubs/132652/MSR-TR-2010-82.pdf) + +### Ranking SVM +Currently only a linear ranking svm is supported. Use LambdaMART for +a non-linear model. If you'd like to introduce a bias set a constant feature +to the bias value you'd like and make a weight of 1.0 for that feature. + +###### model.json +```json +{ + "type":"org.apache.solr.ltr.ranking.RankSVMModel", + "name":"myModelName", + "features":[ + { "name": "userTextTitleMatch"}, + { "name": "originalScore"}, + { "name": "isBook"} + ], + "params":{ + "weights": { + "userTextTitleMatch": 1.0, + "originalScore": 0.5, + "isBook": 0.1 + } + + } +} +``` + +This is an example of a toy Ranking SVM model. Type specifies the class to be +using to interpret the model (RankSVMModel in the case of Ranking SVM). +Name is the model identifier you will use when making request to the ltr +framework. Features specifies the feature space that you want extracted +when using this model. All features that appear in the model params will +be used for scoring and must appear in the features list. You can add +extra features to the features list that will be computed but not used in the +model for scoring, which can be useful for logging. +Params are the Ranking SVM parameters. + +Good library for training SVM's (https://www.csie.ntu.edu.tw/~cjlin/liblinear/ , +https://www.csie.ntu.edu.tw/~cjlin/libsvm/) . You will need to convert the +libSVM model format to the format specified above. + +### LambdaMART + +###### model2.json +```json +{ + "type":"org.apache.solr.ltr.ranking.LambdaMARTModel", + "name":"lambdamartmodel", + "features":[ + { "name": "userTextTitleMatch"}, + { "name": "originalScore"} + ], + "params":{ + "trees": [ + { + "weight" : 1, + "tree": { + "feature": "userTextTitleMatch", + "threshold": 0.5, + "left" : { + "value" : -100 + }, + "right": { + "feature" : "originalScore", + "threshold": 10.0, + "left" : { + "value" : 50 + }, + "right" : { + "value" : 75 + } + } + } + }, + { + "weight" : 2, + "tree": { + "value" : -10 + } + } + ] + } +} +``` +This is an example of a toy LambdaMART. Type specifies the class to be using to +interpret the model (LambdaMARTModel in the case of LambdaMART). Name is the +model identifier you will use when making request to the ltr framework. +Features specifies the feature space that you want extracted when using this +model. All features that appear in the model params will be used for scoring and +must appear in the features list. You can add extra features to the features +list that will be computed but not used in the model for scoring, which can +be useful for logging. Params are the LambdaMART specific parameters. In this +case we have 2 trees, one with 3 leaf nodes and one with 1 leaf node. + +A good library for training LambdaMART ( http://sourceforge.net/p/lemur/wiki/RankLib/ ). +You will need to convert the RankLib model format to the format specified above. + +# Deploy Models and Features +To send features run + +`curl -XPUT 'http://localhost:8983/solr/collection1/schema/fstore' --data-binary @/path/features.json -H 'Content-type:application/json'` + +To send models run + +`curl -XPUT 'http://localhost:8983/solr/collection1/schema/mstore' --data-binary @/path/model.json -H 'Content-type:application/json'` + + +# View Models and Features +`curl -XGET 'http://localhost:8983/solr/collection1/schema/fstore'` +`curl -XGET 'http://localhost:8983/solr/collection1/schema/mstore'` + + +# Run a Rerank Query +Add to your original solr query +`rq={!ltr model=myModelName reRankDocs=25}` + +The model name is the name of the model you sent to solr earlier. +The number of documents you want reranked, which can be larger than the +number you display, is reRankDocs. + +### Pass in external information for external features +Add to your original solr query +`rq={!ltr reRankDocs=3 model=externalmodel efi.field1='text1' efi.field2='text2'}` + +Where "field1" specifies the name of the customized field to be used by one +or more of your features, and text1 is the information to be pass in. As an +example that matches the earlier shown userTextTitleMatch feature one could do: + +`rq={!ltr reRankDocs=3 model=externalmodel efi.user_text='Casablanca' efi.user_intent='movie'}` + +# Extract features +To extract features you need to use the feature vector transformer + set the +fv parameter to true (this required parameter will be removed in the future). +For now you need to also use a dummy model with all the features you want to +extract inside the features parameter list of the model (this limitation will +also be changed in the future so you can extract features without a dummy model). + +`fv=true&fl=*,score,[features]&rq={!ltr model=dummyModel reRankDocs=25}` + +## Test the plugin with solr/example/techproducts in 6 steps + +Solr provides some simple example of indices. In order to test the plugin with +the techproducts example please follow these steps + +1. compile solr and the examples + + cd solr + ant dist + ant example --- End diff -- I think ant example is deprecated in the current master branch, we should point that with recent releases, ant server is necessary! > Integrate Learning to Rank into Solr > ------------------------------------ > > Key: SOLR-8542 > URL: https://issues.apache.org/jira/browse/SOLR-8542 > Project: Solr > Issue Type: New Feature > Reporter: Joshua Pantony > Assignee: Christine Poerschke > Priority: Minor > Attachments: README.md, README.md, SOLR-8542-branch_5x.patch, > SOLR-8542-trunk.patch > > > This is a ticket to integrate learning to rank machine learning models into > Solr. Solr Learning to Rank (LTR) provides a way for you to extract features > directly inside Solr for use in training a machine learned model. You can > then deploy that model to Solr and use it to rerank your top X search > results. This concept was previously presented by the authors at Lucene/Solr > Revolution 2015 ( > http://www.slideshare.net/lucidworks/learning-to-rank-in-solr-presented-by-michael-nilsson-diego-ceccarelli-bloomberg-lp > ). > The attached code was jointly worked on by Joshua Pantony, Michael Nilsson, > David Grohmann and Diego Ceccarelli. > Any chance this could make it into a 5x release? We've also attached > documentation as a github MD file, but are happy to convert to a desired > format. > h3. Test the plugin with solr/example/techproducts in 6 steps > Solr provides some simple example of indices. In order to test the plugin > with > the techproducts example please follow these steps > h4. 1. compile solr and the examples > cd solr > ant dist > ant example > h4. 2. run the example > ./bin/solr -e techproducts > h4. 3. stop it and install the plugin: > > ./bin/solr stop > mkdir example/techproducts/solr/techproducts/lib > cp build/contrib/ltr/lucene-ltr-6.0.0-SNAPSHOT.jar > example/techproducts/solr/techproducts/lib/ > cp contrib/ltr/example/solrconfig.xml > example/techproducts/solr/techproducts/conf/ > h4. 4. run the example again > > ./bin/solr -e techproducts > h4. 5. index some features and a model > curl -XPUT 'http://localhost:8983/solr/techproducts/schema/fstore' > --data-binary "@./contrib/ltr/example/techproducts-features.json" -H > 'Content-type:application/json' > curl -XPUT 'http://localhost:8983/solr/techproducts/schema/mstore' > --data-binary "@./contrib/ltr/example/techproducts-model.json" -H > 'Content-type:application/json' > h4. 6. have fun ! > *access to the default feature store* > http://localhost:8983/solr/techproducts/schema/fstore/_DEFAULT_ > *access to the model store* > http://localhost:8983/solr/techproducts/schema/mstore > *perform a query using the model, and retrieve the features* > http://localhost:8983/solr/techproducts/query?indent=on&q=test&wt=json&rq={!ltr%20model=svm%20reRankDocs=25%20efi.query=%27test%27}&fl=*,[features],price,score,name&fv=true -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org