[ https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183195#comment-15183195 ]
Christine Poerschke commented on SOLR-8542:
-------------------------------------------

bq. ... Question: The only reason we currently have the LTRComponent is so that it can register the Model and Feature stores as managed resources because it can be SolrCore aware. Is there a way we can do this without the use of a component?

This doesn't directly answer the managed resources part of the question. But I have noticed that in practice the features.json/model.json files need to be accompanied by various solrconfig.xml changes, so I wonder whether configuring models as plugins in solrconfig.xml might be worth exploring?

----
*current (features|model).json and solrconfig.xml configuration:*
{code}
###### features.json
...
###### firstModel.json
...
###### secondModel.json
...
###### solrconfig.xml
...
<queryParser name="ltr" class="org.apache.solr.ltr.ranking.LTRQParserPlugin" />
...
<transformer name="features" class="org.apache.solr.ltr.ranking.LTRFeatureLoggerTransformerFactory"/>
...
<searchComponent name="ltrComponent" class="org.apache.solr.ltr.ranking.LTRComponent"/>
...
<requestHandler name="/query" class="solr.SearchHandler">
  ...
  <arr name="last-components">
    <str>ltrComponent</str>
  </arr>
</requestHandler>
...
{code}
----
*potential alternative solrconfig.xml configuration:*
{code}
###### solrconfig.xml
...
<!-- no queryParser name="ltr" element since LTRQParserPlugin is in QParserPlugin.standardPlugins -->
<!-- no transformer name="features" element since LTRFeatureLoggerTransformerFactory is in TransformerFactory.defaultFactories -->
<reRankModelFactory name="myFirstModelName" class="solr.SVMRerankModelFactory">
  <!-- model features -->
  <str name="features">originalScore,isBook</str>
  <str name="originalScore.class">org.apache.solr.ltr.feature.impl.OriginalScoreFeature</str>
  <str name="isBook.class">org.apache.solr.ltr.feature.impl.SolrFeature</str>
  <str name="isBook.fq">{!terms f=category}book</str>
  <!-- model parameters -->
  <float name="weights.originalScore">0.5</float>
  <float name="weights.isBook">0.1</float>
</reRankModelFactory>
<reRankModelFactory class="solr.SVMRerankModelFactory">
  <str name="">mySecondModelName</str>
  ...
</reRankModelFactory>
...
{code}
----
_The most obvious implication_ of having a new solrconfig.xml element instead of (features|model).json managed resources would be that {{solr/core}} rather than {{solr/contrib/ltr}} contains the code.
* From an end-user perspective this means out-of-the-box 'Learning to Rank' support, i.e. no extra jar files to build and deploy, and no LTRQParserPlugin queryParser or LTRFeatureLoggerTransformerFactory transformer elements to configure. Note, though, that {{<reRankModelFactory class="mycompany.MyCustomReRankModelFactory">}} customisation would still be supported if something other than the out-of-the-box models is required.
* One of the out-of-the-box factories could be a features-only factory similar to the 'dummyModel' mentioned above, e.g.
{code}
<reRankModelFactory name="featuresOnly" class="solr.NoRerankingFactory">
  <str name="features">originalScore,isBook</str>
  <str name="originalScore.class">org.apache.solr.ltr.feature.impl.OriginalScoreFeature</str>
  <str name="isBook.class">org.apache.solr.ltr.feature.impl.SolrFeature</str>
  <str name="isBook.fq">{!terms f=category}book</str>
</reRankModelFactory>
{code}

_A concern might be_ that the reRankModelFactory element(s) would bloat solrconfig.xml, and that element(s) embedded in solrconfig.xml would be more difficult to edit than one or two json files.
* The bloat concern can be addressed via {{xi:include}}, e.g.
{code}
###### solrconfig.xml
...
<xi:include href="solrconfig-reRankModelFactory-myFirstModelName.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
...
###### solrconfig-reRankModelFactory-myFirstModelName.xml
<reRankModelFactory name="myFirstModelName" class="solr.SVMRerankModelFactory">
  <!-- model features -->
  <str name="features">originalScore,isBook</str>
  <str name="originalScore.class">org.apache.solr.ltr.feature.impl.OriginalScoreFeature</str>
  <str name="isBook.class">org.apache.solr.ltr.feature.impl.SolrFeature</str>
  <str name="isBook.fq">{!terms f=category}book</str>
  <!-- model parameters -->
  <float name="weights.originalScore">0.5</float>
  <float name="weights.isBook">0.1</float>
</reRankModelFactory>
{code}
* xml vs. json representation is a fair point: if the feature engineering process usually outputs json files, then perhaps a simple utility script could convert that json into a solrconfig.xml reRankModelFactory element.

_A factory approach_ could naturally support arbitrary models, including chaining or nesting of models. (A factory approach is of course also possible with the json format.)
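To make the chaining/nesting idea a little more concrete, here is a minimal Python sketch (not Solr code) of how factories could resolve the flat, dot-prefixed parameters used in these solrconfig.xml examples. Everything in it is an assumption for illustration. The {{FACTORIES}} registry, {{sub_params}}, {{build}}, {{LinearModel}} and {{MultiPassModel}} are hypothetical names. The "SVM" factory is assumed to be a plain linear model (score = sum of weight * feature value), and the top-level {{class}} key stands in for the XML {{class}} attribute. The multi-pass factory simply strips each {{passPrefixes}} entry and delegates to whatever factory the {{<prefix>.class}} entry names, which matches the naming used in the two-pass example below.

```python
# Hypothetical sketch: none of these names are real Solr / LTR contrib APIs.
# It only illustrates how dot-prefixed factory parameters could be nested.


def sub_params(params, prefix):
    """Return the entries whose key starts with 'prefix.', prefix removed."""
    start = prefix + "."
    return {k[len(start):]: v for k, v in params.items() if k.startswith(start)}


class LinearModel:
    """Assumed linear ("SVM") model: score = sum(weight * feature value)."""

    def __init__(self, features, weights):
        self.features = features  # {feature name: {"class": ..., other params}}
        self.weights = weights    # {feature name: weight}

    def score(self, feature_values):
        return sum(self.weights[name] * feature_values[name]
                   for name in self.features)


class MultiPassModel:
    """Holds one model per pass, in passPrefixes order."""

    def __init__(self, passes):
        self.passes = passes


def build_svm(params):
    names = params["features"].split(",")
    # "isBook.class", "isBook.fq" -> features["isBook"] == {"class": ..., "fq": ...}
    features = {name: sub_params(params, name) for name in names}
    weights = {name: float(w)
               for name, w in sub_params(params, "weights").items()}
    return LinearModel(features, weights)


def build_multipass(params):
    # Each pass sees only its own namespace, e.g. "simple.features" -> "features".
    return MultiPassModel([build(sub_params(params, prefix))
                           for prefix in params["passPrefixes"].split(",")])


# Hypothetical registry keyed by factory class name.
FACTORIES = {
    "solr.SVMRerankModelFactory": build_svm,
    "solr.MultiPassRerankModelFactory": build_multipass,
}


def build(params):
    return FACTORIES[params["class"]](params)


OSF = "org.apache.solr.ltr.feature.impl.OriginalScoreFeature"

# Loosely mirrors the two-pass XML example, with a second linear pass
# standing in for the custom "complex" factory.
TWO_PASS = {
    "class": "solr.MultiPassRerankModelFactory",  # the XML class attribute
    "passPrefixes": "simple,rescore",
    "simple.class": "solr.SVMRerankModelFactory",
    "simple.features": "originalScore,isBook",
    "simple.originalScore.class": OSF,
    "simple.isBook.class": "org.apache.solr.ltr.feature.impl.SolrFeature",
    "simple.isBook.fq": "{!terms f=category}book",
    "simple.weights.originalScore": 0.5,
    "simple.weights.isBook": 0.1,
    "rescore.class": "solr.SVMRerankModelFactory",
    "rescore.features": "originalScore",
    "rescore.originalScore.class": OSF,
    "rescore.weights.originalScore": 1.0,
}

if __name__ == "__main__":
    model = build(TWO_PASS)
    simple, rescore = model.passes
    print(simple.features["isBook"])
    print(simple.score({"originalScore": 2.0, "isBook": 1.0}))
    print(rescore.score({"originalScore": 2.0}))
```

Each factory only ever sees its own, already-stripped namespace, so nesting needs no special support beyond the registry lookup. A custom factory such as {{mycompany.MyComplexRerankModelFactory}} would simply be one more registry entry.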
{code}
<reRankModelFactory name="myTwoPassModelName" class="solr.MultiPassRerankModelFactory">
  <str name="passPrefixes">simple,complex</str>
  <!-- simple model factory -->
  <str name="simple.class">solr.SVMRerankModelFactory</str>
  <!-- simple model features -->
  <str name="simple.features">originalScore,isBook</str>
  <str name="simple.originalScore.class">org.apache.solr.ltr.feature.impl.OriginalScoreFeature</str>
  <str name="simple.isBook.class">org.apache.solr.ltr.feature.impl.SolrFeature</str>
  <str name="simple.isBook.fq">{!terms f=category}book</str>
  <!-- simple model parameters -->
  <float name="simple.weights.originalScore">0.5</float>
  <float name="simple.weights.isBook">0.1</float>
  <!-- complex model factory -->
  <str name="complex.class">mycompany.MyComplexRerankModelFactory</str>
  <!-- complex model features -->
  <str name="complex.features">x,y</str>
  <str name="complex.x.class">...</str>
  <str name="complex.x.aaa">...</str>
  <int name="complex.x.bbb">...</int>
  <str name="complex.y.class">...</str>
  <int name="complex.y.zzz">...</int>
  <!-- complex model parameters -->
  <float name="complex.something.configurable">0.42</float>
  ...
</reRankModelFactory>
{code}

> Integrate Learning to Rank into Solr
> ------------------------------------
>
>                 Key: SOLR-8542
>                 URL: https://issues.apache.org/jira/browse/SOLR-8542
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Joshua Pantony
>            Assignee: Christine Poerschke
>            Priority: Minor
>         Attachments: README.md, README.md, SOLR-8542-branch_5x.patch, SOLR-8542-trunk.patch
>
>
> This is a ticket to integrate learning to rank machine learning models into Solr. Solr Learning to Rank (LTR) provides a way for you to extract features directly inside Solr for use in training a machine learned model. You can then deploy that model to Solr and use it to rerank your top X search results.
> This concept was previously presented by the authors at Lucene/Solr Revolution 2015 (http://www.slideshare.net/lucidworks/learning-to-rank-in-solr-presented-by-michael-nilsson-diego-ceccarelli-bloomberg-lp).
> The attached code was jointly worked on by Joshua Pantony, Michael Nilsson, David Grohmann and Diego Ceccarelli.
> Any chance this could make it into a 5x release? We've also attached documentation as a github MD file, but are happy to convert it to a desired format.
> h3. Test the plugin with solr/example/techproducts in 6 steps
> Solr provides some simple example indices. To test the plugin with the techproducts example, please follow these steps:
> h4. 1. compile solr and the examples
> cd solr
> ant dist
> ant example
> h4. 2. run the example
> ./bin/solr -e techproducts
> h4. 3. stop it and install the plugin:
>
> ./bin/solr stop
> mkdir example/techproducts/solr/techproducts/lib
> cp build/contrib/ltr/lucene-ltr-6.0.0-SNAPSHOT.jar example/techproducts/solr/techproducts/lib/
> cp contrib/ltr/example/solrconfig.xml example/techproducts/solr/techproducts/conf/
> h4. 4. run the example again
>
> ./bin/solr -e techproducts
> h4. 5. index some features and a model
> curl -XPUT 'http://localhost:8983/solr/techproducts/schema/fstore' --data-binary "@./contrib/ltr/example/techproducts-features.json" -H 'Content-type:application/json'
> curl -XPUT 'http://localhost:8983/solr/techproducts/schema/mstore' --data-binary "@./contrib/ltr/example/techproducts-model.json" -H 'Content-type:application/json'
> h4. 6. have fun!
> *access to the default feature store*
> http://localhost:8983/solr/techproducts/schema/fstore/_DEFAULT_
> *access to the model store*
> http://localhost:8983/solr/techproducts/schema/mstore
> *perform a query using the model, and retrieve the features*
> http://localhost:8983/solr/techproducts/query?indent=on&q=test&wt=json&rq={!ltr%20model=svm%20reRankDocs=25%20efi.query=%27test%27}&fl=*,[features],price,score,name&fv=true

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)