[ 
https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183195#comment-15183195
 ] 

Christine Poerschke commented on SOLR-8542:
-------------------------------------------

bq. ... Question: The only reason we currently have the LTRComponent is so that 
it can register the Model and Feature stores as managed resources because it 
can be SolrCore aware. Is there a way we can do this without the use of a 
component?

This does not directly answer the managed resources part of the question, but 
having noticed that in practice the features.json/model.json files need to be 
accompanied by various solrconfig.xml changes, I wonder if configuring models 
as plugins within solrconfig.xml might be something to explore?
----
*current (features|model).json and solrconfig.xml configuration:*
{code}
###### features.json
...
###### firstModel.json
...
###### secondModel.json
...
###### solrconfig.xml
...
<queryParser name="ltr" class="org.apache.solr.ltr.ranking.LTRQParserPlugin" />
...
<transformer name="features" class="org.apache.solr.ltr.ranking.LTRFeatureLoggerTransformerFactory"/>
...
<searchComponent name="ltrComponent" class="org.apache.solr.ltr.ranking.LTRComponent"/>
...
<requestHandler name="/query" class="solr.SearchHandler">
  ...
  <arr name="last-components">
    <str>ltrComponent</str>
  </arr>
</requestHandler>
...
{code}
----
*potential alternative solrconfig.xml configuration:*
{code}
###### solrconfig.xml
...
<!-- no queryParser name="ltr" element since LTRQParserPlugin is in 
QParserPlugin.standardPlugins -->
<!-- no transformer name="features" since LTRFeatureLoggerTransformerFactory is 
in TransformerFactory.defaultFactories -->

<reRankModelFactory name="myFirstModelName" class="solr.SVMRerankModelFactory">
  <!-- model features -->
  <str name="features">originalScore,isBook</str>
  <str name="originalScore.class">org.apache.solr.ltr.feature.impl.OriginalScoreFeature</str>
  <str name="isBook.class">org.apache.solr.ltr.feature.impl.SolrFeature</str>
  <str name="isBook.fq">{!terms f=category}book</str>
  <!-- model parameters -->
  <float name="weights.originalScore">0.5</float>
  <float name="weights.isBook">0.1</float>
</reRankModelFactory>

<reRankModelFactory class="solr.SVMRerankModelFactory">
  <str name="">mySecondModelName</str>
  ...
</reRankModelFactory>
...
{code}
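
To make the "model parameters" in the example concrete, here is a minimal sketch of how such weights could be applied. It assumes that {{SVMRerankModelFactory}} would produce a linear model, i.e. that the rerank score is the weighted sum of the feature values; that assumption, the function name and the sample feature values are illustrative only and not taken from the patch.

```python
def linear_score(weights, feature_values):
    """Return the weighted sum of feature values.

    A feature that has a weight but no value in feature_values counts as 0.0.
    """
    return sum(w * feature_values.get(name, 0.0) for name, w in weights.items())


# Weights copied from the myFirstModelName example above.
weights = {"originalScore": 0.5, "isBook": 0.1}

# Hypothetical feature values for a single document (assumption for illustration).
doc_features = {"originalScore": 2.0, "isBook": 1.0}

print(linear_score(weights, doc_features))
```

With these sample values the score is 0.5 * 2.0 + 0.1 * 1.0, so a document that is a book gets a small boost on top of half its original score.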
----
_The most obvious implication_ of having a new solrconfig.xml element instead 
of (features|model).json managed resources would be that {{solr/core}} rather 
than {{solr/contrib/ltr}} contains the code.
* From an end-user perspective this means 'Learning to Rank' support 
out-of-the-box i.e. no need to build and deploy extra jar files plus no need to 
configure LTRQParserPlugin and LTRFeatureLoggerTransformerFactory queryParser 
and transformer elements. Though note that {{<reRankModelFactory 
class="mycompany.MyCustomReRankModelFactory">}} customisation is supported if 
something other than the out-of-the-box models is required.
* One of the out-of-the-box factories could be a features-only factory similar 
to the 'dummyModel' mentioned above, e.g.
{code}
<reRankModelFactory name="featuresOnly" class="solr.NoRerankingFactory">
  <str name="features">originalScore,isBook</str>
  <str name="originalScore.class">org.apache.solr.ltr.feature.impl.OriginalScoreFeature</str>
  <str name="isBook.class">org.apache.solr.ltr.feature.impl.SolrFeature</str>
  <str name="isBook.fq">{!terms f=category}book</str>
</reRankModelFactory>
{code}

_A concern might be_ that the reRankModelFactory element(s) would bloat 
solrconfig.xml and that the element(s) being embedded in solrconfig.xml would 
be more difficult to edit than one or two json files.
* The bloat concern can be addressed via {{xi:include}} e.g.
{code}
###### solrconfig.xml
...
<xi:include href="solrconfig-reRankModelFactory-myFirstModelName.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
...
###### solrconfig-reRankModelFactory-myFirstModelName.xml
<reRankModelFactory name="myFirstModelName" class="solr.SVMRerankModelFactory">
  <!-- model features -->
  <str name="features">originalScore,isBook</str>
  <str name="originalScore.class">org.apache.solr.ltr.feature.impl.OriginalScoreFeature</str>
  <str name="isBook.class">org.apache.solr.ltr.feature.impl.SolrFeature</str>
  <str name="isBook.fq">{!terms f=category}book</str>
  <!-- model parameters -->
  <float name="weights.originalScore">0.5</float>
  <float name="weights.isBook">0.1</float>
</reRankModelFactory>
{code}
* xml vs. json representation is a fair point. If the feature engineering 
process usually outputs json files, then perhaps a simple utility script could 
help convert that json into a reRankModelFactory xml element for solrconfig.xml.
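
A rough sketch of what such a utility script might look like follows. The json layout it reads ({{name}}, {{class}}, {{features}} with per-feature {{params}}, and {{weights}}) is a made-up assumption for illustration, since the actual features.json/model.json contents are elided above; the output follows the reRankModelFactory element proposed in this comment.

```python
import json
import xml.etree.ElementTree as ET


def model_json_to_xml(model_json):
    """Convert a (hypothetical) model json string into a reRankModelFactory element."""
    model = json.loads(model_json)
    root = ET.Element(
        "reRankModelFactory", {"name": model["name"], "class": model["class"]}
    )

    # Comma-separated list of feature names, as in the proposed solrconfig.xml.
    features = model["features"]
    ET.SubElement(root, "str", {"name": "features"}).text = ",".join(
        f["name"] for f in features
    )

    # One <str name="<feature>.class"> per feature, plus its extra parameters.
    for f in features:
        ET.SubElement(root, "str", {"name": f["name"] + ".class"}).text = f["class"]
        for key, value in f.get("params", {}).items():
            ET.SubElement(root, "str", {"name": f["name"] + "." + key}).text = str(value)

    # Model parameters: one <float name="weights.<feature>"> per weight.
    for feature_name, weight in model.get("weights", {}).items():
        ET.SubElement(root, "float", {"name": "weights." + feature_name}).text = str(weight)

    return ET.tostring(root, encoding="unicode")


# Hypothetical input mirroring the myFirstModelName example.
sample = json.dumps({
    "name": "myFirstModelName",
    "class": "solr.SVMRerankModelFactory",
    "features": [
        {"name": "originalScore",
         "class": "org.apache.solr.ltr.feature.impl.OriginalScoreFeature"},
        {"name": "isBook",
         "class": "org.apache.solr.ltr.feature.impl.SolrFeature",
         "params": {"fq": "{!terms f=category}book"}},
    ],
    "weights": {"originalScore": 0.5, "isBook": 0.1},
})
print(model_json_to_xml(sample))
```

The element is emitted on a single line; the result could be written to a file such as the {{xi:include}} target shown above.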

_A factory approach_ could naturally support arbitrary models including 
chaining or nesting of models. (A factory approach is of course also possible 
with json format.)
{code}
<reRankModelFactory name="myTwoPassModelName" class="solr.MultiPassRerankModelFactory">
  <str name="passPrefixes">simple,complex</str>

  <!-- simple model factory -->
  <str name="simple.class">solr.SVMRerankModelFactory</str>
  <!-- simple model features -->
  <str name="simple.features">originalScore,isBook</str>
  <str name="simple.originalScore.class">org.apache.solr.ltr.feature.impl.OriginalScoreFeature</str>
  <str name="simple.isBook.class">org.apache.solr.ltr.feature.impl.SolrFeature</str>
  <str name="simple.isBook.fq">{!terms f=category}book</str>
  <!-- simple model parameters -->
  <float name="simple.weights.originalScore">0.5</float>
  <float name="simple.weights.isBook">0.1</float>

  <!-- complex model factory -->
  <str name="complex.class">mycompany.MyComplexRerankModelFactory</str>
  <!-- complex model features -->
  <str name="complex.features">x,y</str>
  <str name="complex.x.class">...</str>
  <str name="complex.x.aaa">...</str>
  <int name="complex.x.bbb">...</int>
  <str name="complex.y.class">...</str>
  <int name="complex.y.zzz">...</int>
  <!-- complex model parameters -->
  <float name="complex.something.configurable">0.42</float>
  ...
</reRankModelFactory>
{code}
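
As a sketch of how a factory like the hypothetical {{MultiPassRerankModelFactory}} might hand each pass its own settings, the prefixed parameter names could be split on the first dot and grouped by the prefixes listed in {{passPrefixes}}. The function below is only an illustration of that idea (written in Python for brevity rather than as Solr code), and the flat dict stands in for the parsed element.

```python
def split_by_pass(params):
    """Group prefixed parameters by the pass prefixes named in 'passPrefixes'."""
    prefixes = [p.strip() for p in params["passPrefixes"].split(",")]
    per_pass = {prefix: {} for prefix in prefixes}
    for key, value in params.items():
        prefix, sep, rest = key.partition(".")
        # Only keys of the form "<known prefix>.<rest>" belong to a pass.
        if sep and prefix in per_pass:
            per_pass[prefix][rest] = value
    return per_pass


# Flat view of (part of) the myTwoPassModelName element above.
params = {
    "passPrefixes": "simple,complex",
    "simple.class": "solr.SVMRerankModelFactory",
    "simple.features": "originalScore,isBook",
    "simple.weights.originalScore": 0.5,
    "complex.class": "mycompany.MyComplexRerankModelFactory",
    "complex.features": "x,y",
}

for name, config in split_by_pass(params).items():
    print(name, config)
```

Each per-pass dict then looks like the configuration of a standalone factory (for example {{class}}, {{features}}, {{weights.originalScore}}), which is what would make nesting or chaining straightforward.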

> Integrate Learning to Rank into Solr
> ------------------------------------
>
>                 Key: SOLR-8542
>                 URL: https://issues.apache.org/jira/browse/SOLR-8542
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Joshua Pantony
>            Assignee: Christine Poerschke
>            Priority: Minor
>         Attachments: README.md, README.md, SOLR-8542-branch_5x.patch, 
> SOLR-8542-trunk.patch
>
>
> This is a ticket to integrate learning to rank machine learning models into 
> Solr. Solr Learning to Rank (LTR) provides a way for you to extract features 
> directly inside Solr for use in training a machine learned model. You can 
> then deploy that model to Solr and use it to rerank your top X search 
> results. This concept was previously presented by the authors at Lucene/Solr 
> Revolution 2015 
> (http://www.slideshare.net/lucidworks/learning-to-rank-in-solr-presented-by-michael-nilsson-diego-ceccarelli-bloomberg-lp).
> The attached code was jointly worked on by Joshua Pantony, Michael Nilsson, 
> David Grohmann and Diego Ceccarelli.
> Any chance this could make it into a 5x release? We've also attached 
> documentation as a github MD file, but are happy to convert to a desired 
> format.
> h3. Test the plugin with solr/example/techproducts in 6 steps
> Solr provides some simple example indices. To test the plugin with the 
> techproducts example, please follow these steps:
> h4. 1. compile solr and the examples 
> cd solr
> ant dist
> ant example
> h4. 2. run the example
> ./bin/solr -e techproducts 
> h4. 3. stop it and install the plugin:
>    
> ./bin/solr stop
> mkdir example/techproducts/solr/techproducts/lib
> cp build/contrib/ltr/lucene-ltr-6.0.0-SNAPSHOT.jar example/techproducts/solr/techproducts/lib/
> cp contrib/ltr/example/solrconfig.xml example/techproducts/solr/techproducts/conf/
> h4. 4. run the example again
>     
> ./bin/solr -e techproducts
> h4. 5. index some features and a model
> curl -XPUT 'http://localhost:8983/solr/techproducts/schema/fstore' --data-binary "@./contrib/ltr/example/techproducts-features.json" -H 'Content-type:application/json'
> curl -XPUT 'http://localhost:8983/solr/techproducts/schema/mstore' --data-binary "@./contrib/ltr/example/techproducts-model.json" -H 'Content-type:application/json'
> h4. 6. have fun !
> *access to the default feature store*
> http://localhost:8983/solr/techproducts/schema/fstore/_DEFAULT_ 
> *access to the model store*
> http://localhost:8983/solr/techproducts/schema/mstore
> *perform a query using the model, and retrieve the features*
> http://localhost:8983/solr/techproducts/query?indent=on&q=test&wt=json&rq={!ltr%20model=svm%20reRankDocs=25%20efi.query=%27test%27}&fl=*,[features],price,score,name&fv=true


