[jira] [Commented] (SOLR-8542) Integrate Learning to Rank into Solr

Michael Nilsson (JIRA) Thu, 03 Mar 2016 14:59:31 -0800

    [ https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15178779#comment-15178779 ]


Michael Nilsson commented on SOLR-8542:
---------------------------------------

Hey Christine, I've posted a response to most of your comments thus far below.

*doDeleteChild method makes no storeManagedData method call*
We have a ticket for this and will fix it, along with other improvements, in 
our next commit.

*ManagedFeatureStore.doGet throws an exception when the childId concerned is 
not present*
We could return a response with no features if desired.  We currently use the 
error response to differentiate between a feature store that does not exist and 
one that exists but has no features added to it yet.
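
To illustrate the distinction described above, here is a minimal Python sketch 
of a lookup that treats a missing store as an error and an empty store as a 
normal, empty result.  The class and method names (FeatureStoreRegistry, 
get_features, FeatureStoreNotFound) are hypothetical and are not taken from the 
patch.

```python
class FeatureStoreNotFound(KeyError):
    """Raised when the requested feature store was never created."""


class FeatureStoreRegistry:
    """Hypothetical in-memory stand-in for the managed feature stores."""

    def __init__(self):
        self._stores = {}  # store name -> list of feature definitions

    def create_store(self, name):
        # Creating a store that already exists leaves it untouched.
        self._stores.setdefault(name, [])

    def get_features(self, name):
        # A missing store is an error; an existing but empty store is not.
        if name not in self._stores:
            raise FeatureStoreNotFound(name)
        return list(self._stores[name])


registry = FeatureStoreRegistry()
registry.create_store("emptyStore")

print(registry.get_features("emptyStore"))  # prints []
try:
    registry.get_features("missingStore")
except FeatureStoreNotFound as err:
    print("no such store:", err.args[0])
```

Returning an empty response in both cases would be simpler for clients, but the 
caller could then no longer tell the two situations apart.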

*ManagedResource.doPut addFeature could throw an exception when a name being 
updated/added already exists.  Should repeats of the same name simply replace 
the existing entry for that name?*
Typically when you have models deployed using some features, you don't want to 
"update" an existing feature. You should instead add a new feature with your 
updates and deploy a newly trained model using it, because you don't want the 
meaning/value of the original feature used by historical models to change.  
This is to ensure reproducible results when testing an old model that used the 
old version of the feature.  We use this error to prevent this from happening.
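
The add-only behaviour described above can be sketched as follows.  This is an 
illustration in Python under assumed names (AddOnlyFeatureStore, add_feature, 
DuplicateFeatureError) with made-up feature definitions; it is not the code in 
the patch.

```python
class DuplicateFeatureError(ValueError):
    """Raised when a feature name is already taken in the store."""


class AddOnlyFeatureStore:
    """Hypothetical store that never overwrites an existing feature."""

    def __init__(self):
        self._features = {}  # feature name -> definition

    def add_feature(self, name, definition):
        # Refuse to overwrite: models trained against the old definition
        # must keep seeing exactly that definition.
        if name in self._features:
            raise DuplicateFeatureError(f"feature '{name}' already exists")
        self._features[name] = definition

    def get(self, name):
        return self._features[name]


store = AddOnlyFeatureStore()
# The definitions below are placeholders, not real feature configurations.
store.add_feature("titleMatch", {"description": "original definition"})
# A changed definition goes in under a new name instead of replacing the old one.
store.add_feature("titleMatchV2", {"description": "updated definition"})

try:
    store.add_feature("titleMatch", {"description": "attempted overwrite"})
except DuplicateFeatureError as err:
    print("rejected:", err)
```

An older model that references titleMatch keeps scoring against the original 
definition, while a newly trained model can reference titleMatchV2.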

*LTRComponent state + use of state separation. Would feature store and model 
store changes still propagate through to ltr_ms*
If you deploy new features to your feature store, you would want to start 
extracting those features, which means we should propagate them down.  To 
avoid that, we could make feature stores write-once, so that any new features 
would require a new feature store with all the old ones copied over, but that 
might be cumbersome for the user and leave lots of old feature stores around 
until the user cleans them up.
Question: The only reason we currently have the LTRComponent is so that it can 
register the Model and Feature stores as managed resources because it can be 
SolrCore aware.  Is there a way we can do this without the use of a component?

*Branch/commit process*
Everything you said sounds doable.  My only question is about 
"'git merge' and 'git rebase' and 'git --force push' will be avoided".  Agreed 
about force pushes, but if at the end we're going to make a new 
master-ltr-plugin-rfc-march branch with everything squashed and rebased, why 
not allow merges into master-ltr-plugin-rfc to keep up to date with master 
changes, instead of cherry-picking each change into it one by one?

*Feature engineering dummy model replacement*
As you said, you currently have to use a dummy model to reference the features 
you want extracted:
{code}fv=true&fl=*,score,[features]&rq={!ltr model=dummyModel reRankDocs=25}{code}
The only reason you need the model is that it has a FeatureStore, which holds 
all the features you are looking to extract.  Instead, we are planning to let 
you specify which FeatureStore to use for feature extraction directly in the 
features Document Transformer.  We will also remove the superfluous fv=true 
parameter, since the document transformer already signals that you want to 
extract features.  The new sample request for feature extraction would 
probably look something like this instead:
{code}fl=*,score,[features featureStore=MyFeatures]{code}
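
For comparison, the two request shapes can be assembled programmatically.  The 
Python sketch below only builds and prints the URLs and does not contact Solr. 
The host, core name and q=test value are borrowed from the techproducts example 
in the issue description, and the featureStore argument is the proposed form, 
which is not implemented yet and may change.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "http://localhost:8983/solr/techproducts/query"

# Current form: a dummy model is named only to select its feature store.
current = {
    "q": "test",
    "fv": "true",
    "fl": "*,score,[features]",
    "rq": "{!ltr model=dummyModel reRankDocs=25}",
}

# Proposed form (assumption: parameter name as sketched in the comment above).
proposed = {
    "q": "test",
    "fl": "*,score,[features featureStore=MyFeatures]",
}


def build_url(params):
    # urlencode percent-escapes braces, brackets, and spaces for us.
    return BASE + "?" + urlencode(params)


def params_of(url):
    # Decode a URL back into its query parameters for inspection.
    return parse_qs(urlsplit(url).query)


current_url = build_url(current)
proposed_url = build_url(proposed)
print(current_url)
print(proposed_url)
print(sorted(params_of(proposed_url)))
```

The proposed request drops both the rq and fv parameters, because the 
transformer argument alone would say which features to extract.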

*would the efi. parameters move out of the rq*
We will probably move efi out as well, since you need those parameters for both 
feature extraction and reranking with a model.

*might it be useful to have optional version and/or comment string elements in 
the feature*
I think the comment section would be a good idea.  The version touches on 
what I mentioned earlier about updates vs adds.  We'll have to think about the 
best way to handle this, since you don't want to lose/replace versions 1 and 2 
when you deploy version 3 of a feature.

*Could you clarify/outline when/how the "store" element would be used?*
A FeatureStore is a list of features that you want to extract (and use for 
training, logging, or in a model for reranking).  In the majority of cases, 
you will probably have just one feature store, and all iterations of your 
models will use that same store, with any new features added to it.  A model 
cannot use features from other stores.  It may be the case that a single 
collection serves many different applications.  If each of those applications 
wants to rerank its results differently and only cares about a subset of 
features, then each could make its own FeatureStore with its, say, 100 
features for extraction instead of pulling out the thousands of other features 
that all the other teams made for that same collection.
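
The rule that a model cannot use features from other stores can be pictured as 
a check at model deployment time.  The following Python sketch is an 
illustration only: the dictionary layout, the store and feature names, and the 
validate_model function are all hypothetical.

```python
def validate_model(model, feature_stores):
    """Return the model's features, or raise if any is missing from its store."""
    store_name = model["store"]
    store = feature_stores[store_name]
    missing = [name for name in model["features"] if name not in store]
    if missing:
        raise ValueError(f"features {missing} are not in store '{store_name}'")
    return model["features"]


# Two teams sharing one collection, each with its own (made-up) feature store.
feature_stores = {
    "teamA": {"titleMatch", "popularity"},
    "teamB": {"price", "inStock"},
}

model_ok = {"name": "teamAModel", "store": "teamA",
            "features": ["titleMatch", "popularity"]}
# "price" lives in teamB's store, so this model mixes stores and is rejected.
model_mixed = {"name": "mixedModel", "store": "teamA",
               "features": ["titleMatch", "price"]}

print(validate_model(model_ok, feature_stores))
try:
    validate_model(model_mixed, feature_stores)
except ValueError as err:
    print("rejected:", err)
```

Keeping each model bound to one store also means extraction for that model only 
has to compute the features in that store.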

*Are feature and model stores local to each solr config or can they be shared 
across configs?*
The feature and model stores are currently tied locally to each 
collection/config, like managed stopwords/synonyms.  If you want comparable 
scores for searches across multiple collections, for example to build a unified 
result list, you would have to deploy the same model to each of the 
collections.
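
Deploying one model to several collections could be scripted along these lines. 
This Python sketch only prepares the PUT requests and prints them; it does not 
send anything.  The schema/mstore path and the Content-type header follow the 
curl commands in the issue description, while the collection names and the 
model body are placeholders.

```python
import json
from urllib.request import Request


def build_model_uploads(solr_base, collections, model):
    """Prepare one PUT request per collection, all carrying the same model."""
    body = json.dumps(model).encode("utf-8")
    return [
        Request(
            url=f"{solr_base}/{collection}/schema/mstore",
            data=body,
            method="PUT",
            headers={"Content-type": "application/json"},
        )
        for collection in collections
    ]


# Placeholder collection names and model body, for illustration only.
uploads = build_model_uploads(
    "http://localhost:8983/solr",
    ["products", "articles"],
    {"name": "sharedModel"},
)

for req in uploads:
    # urllib.request.urlopen(req) would send the request; it is left out
    # here so the sketch has no side effects.
    print(req.get_method(), req.full_url)
```

Since each collection keeps its own feature store, the features the model 
relies on would need to be present in every target collection as well.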



> Integrate Learning to Rank into Solr
> ------------------------------------
>
>                 Key: SOLR-8542
>                 URL: https://issues.apache.org/jira/browse/SOLR-8542
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Joshua Pantony
>            Assignee: Christine Poerschke
>            Priority: Minor
>         Attachments: README.md, README.md, SOLR-8542-branch_5x.patch, 
> SOLR-8542-trunk.patch
>
>
> This is a ticket to integrate learning to rank machine learning models into 
> Solr. Solr Learning to Rank (LTR) provides a way for you to extract features 
> directly inside Solr for use in training a machine learned model. You can 
> then deploy that model to Solr and use it to rerank your top X search 
> results. This concept was previously presented by the authors at Lucene/Solr 
> Revolution 2015 ( 
> http://www.slideshare.net/lucidworks/learning-to-rank-in-solr-presented-by-michael-nilsson-diego-ceccarelli-bloomberg-lp
>  ).
> The attached code was jointly worked on by Joshua Pantony, Michael Nilsson, 
> David Grohmann and Diego Ceccarelli.
> Any chance this could make it into a 5x release? We've also attached 
> documentation as a github MD file, but are happy to convert to a desired 
> format.
> h3. Test the plugin with solr/example/techproducts in 6 steps
> Solr provides some simple example indices. To test the plugin with the 
> techproducts example, please follow these steps:
> h4. 1. compile solr and the examples 
> cd solr
> ant dist
> ant example
> h4. 2. run the example
> ./bin/solr -e techproducts 
> h4. 3. stop it and install the plugin:
>    
> ./bin/solr stop
> mkdir example/techproducts/solr/techproducts/lib
> cp build/contrib/ltr/lucene-ltr-6.0.0-SNAPSHOT.jar example/techproducts/solr/techproducts/lib/
> cp contrib/ltr/example/solrconfig.xml example/techproducts/solr/techproducts/conf/
> h4. 4. run the example again
>     
> ./bin/solr -e techproducts
> h4. 5. index some features and a model
> curl -XPUT 'http://localhost:8983/solr/techproducts/schema/fstore' --data-binary "@./contrib/ltr/example/techproducts-features.json" -H 'Content-type:application/json'
> curl -XPUT 'http://localhost:8983/solr/techproducts/schema/mstore' --data-binary "@./contrib/ltr/example/techproducts-model.json" -H 'Content-type:application/json'
> h4. 6. have fun !
> *access to the default feature store*
> http://localhost:8983/solr/techproducts/schema/fstore/_DEFAULT_ 
> *access to the model store*
> http://localhost:8983/solr/techproducts/schema/mstore
> *perform a query using the model, and retrieve the features*
> http://localhost:8983/solr/techproducts/query?indent=on&q=test&wt=json&rq={!ltr%20model=svm%20reRankDocs=25%20efi.query=%27test%27}&fl=*,[features],price,score,name&fv=true



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
