[ https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15178779#comment-15178779 ]
Michael Nilsson commented on SOLR-8542:
---------------------------------------

Hey Christine, I've posted a response to most of your comments thus far below.

*doDeleteChild method makes no storeManagedData method call*
We have a ticket for this that we'll fix along with other improvements for our next commit.

*ManagedFeatureStore.doGet throws an exception when the childId concerned is not present*
We could return a response with no features if desired. We are currently using the error response to differentiate between a feature store that does not exist and one that exists but has no features added to it yet.

*ManagedResource.doPut addFeature could throw an exception when a name being updated/added already exists. Should repeats of the same name simply replace the existing entry for that name?*
Typically when you have models deployed using some features, you don't want to "update" an existing feature. You should instead add a new feature with your updates and deploy a newly trained model using it, because you don't want the meaning/value of the original feature used by historical models to change. This ensures reproducible results when testing an old model that used the old version of the feature. We use this error to prevent such updates from happening.

*LTRComponent state + use of state separation. Would feature store and model store changes still propagate through to ltr_ms*
If you deploy new features to your feature store, you would want to start extracting those features, which means we should propagate them down. To avoid this, we could make feature stores write-once, so that any new features would require a new feature store with all the old ones copied over, but that might be cumbersome for the user and leave lots of old feature stores around until the user cleans them up.

Question: The only reason we currently have the LTRComponent is so that it can register the Model and Feature stores as managed resources, because it can be SolrCore aware.
Is there a way we can do this without the use of a component?

*Branch/commit process*
Everything you said sounds doable. The only question I have is regarding "'git merge' and 'git rebase' and 'git --force push' will be avoided". Agreed about git force, but if at the end we're going to make a new master-ltr-plugin-rfc-march branch, and everything is going to be squashed and rebased, why not allow merges into master-ltr-plugin-rfc to keep up to date with master changes, instead of cherry-picking everything one by one into it?

*Feature engineering dummy model replacement*
Currently, as you said, you have to use a dummy model to reference the features you want extracted.
{code}fv=true&fl=*,score,[features]&rq={!ltr model=dummyModel reRankDocs=25}{code}
The only reason you need the model is that it has a FeatureStore, which has all the features you are looking to extract. Instead, we are planning on allowing you to specify which FeatureStore you want to use for feature extraction directly in the features Document Transformer. We will also remove the superfluous fv=true parameter, since the document transformer already identifies the fact that you want to extract features. The new sample request for feature extraction would probably look something like this instead:
{code}fl=*,score,[features featureStore=MyFeatures]{code}

*would the efi. parameters move out of the rq*
We will probably move efi out as well, since you need them for both feature extraction and reranking with a model.

*might it be useful to have optional version and/or comment string elements in the feature*
I think the comment section would be a good idea. The version touches on what I mentioned earlier about updates vs adds. We'll have to think about the best way to handle this, since you don't want to lose/replace versions 1 and 2 when you deploy version 3 of a feature.
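To make the "adds, not updates" point concrete, here is a minimal Python sketch of an add-only store. It is only an illustration of the behaviour described above, not the plugin's code: the class, the exception, the feature definitions and the _v1/_v2 naming convention are all hypothetical, and only the store name MyFeatures comes from the sample request.

```python
class FeatureAlreadyExists(Exception):
    """Raised when a feature name is re-used within one store."""


class AddOnlyFeatureStore:
    """Hypothetical in-memory stand-in for a feature store (not the plugin's class)."""

    def __init__(self, name):
        self.name = name
        self._features = {}  # feature name -> definition

    def add_feature(self, feature_name, definition):
        # Reject re-use of a name so that models trained earlier keep
        # seeing the definition they were trained against.
        if feature_name in self._features:
            raise FeatureAlreadyExists(
                f"{feature_name!r} already exists in store {self.name!r}"
            )
        self._features[feature_name] = definition

    def get(self, feature_name):
        return self._features[feature_name]

    def names(self):
        return sorted(self._features)


store = AddOnlyFeatureStore("MyFeatures")

# Assumed convention: a changed feature is added under a new name,
# so the older version stays available to historical models.
store.add_feature("titleMatch_v1", {"field": "title"})
store.add_feature("titleMatch_v2", {"field": "title", "boost": 2.0})

try:
    store.add_feature("titleMatch_v1", {"field": "name"})
    rejected = False
except FeatureAlreadyExists:
    rejected = True
```

With this shape, deploying version 3 of a feature is just another add, and versions 1 and 2 stay untouched. Whether the version ends up in the name or in a separate element is exactly the open question above.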
*Could you clarify/outline when/how the "store" element would be used?*
A FeatureStore is a list of features that you want to extract (and use for training, logging, or in a model for reranking). In the majority of cases, you will probably have just one feature store, and all iterations of your models will use that same feature store, with any new features added to it. A model cannot use features from other stores. It may be the case that a single collection serves many different applications. If each of those applications wants to rerank its results differently and only cares about a subset of features, then each could make its own FeatureStore with its, say, 100 features for extraction, instead of pulling out the thousands of other features that all the other teams made for that same collection.

*Are feature and model stores local to each solr config or can they be shared across configs?*
The feature and model stores are currently tied locally to each collection/config, like managed stopwords/synonyms. If you want comparable scores for searches across multiple collections for a unified search list, you have to deploy the same model to each of the collections.

> Integrate Learning to Rank into Solr
> ------------------------------------
>
>                 Key: SOLR-8542
>                 URL: https://issues.apache.org/jira/browse/SOLR-8542
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Joshua Pantony
>            Assignee: Christine Poerschke
>            Priority: Minor
>         Attachments: README.md, README.md, SOLR-8542-branch_5x.patch, SOLR-8542-trunk.patch
>
>
> This is a ticket to integrate learning to rank machine learning models into Solr. Solr Learning to Rank (LTR) provides a way for you to extract features directly inside Solr for use in training a machine learned model. You can then deploy that model to Solr and use it to rerank your top X search results.
> This concept was previously presented by the authors at Lucene/Solr Revolution 2015 (http://www.slideshare.net/lucidworks/learning-to-rank-in-solr-presented-by-michael-nilsson-diego-ceccarelli-bloomberg-lp).
> The attached code was jointly worked on by Joshua Pantony, Michael Nilsson, David Grohmann and Diego Ceccarelli.
> Any chance this could make it into a 5x release? We've also attached documentation as a github MD file, but are happy to convert to a desired format.
> h3. Test the plugin with solr/example/techproducts in 6 steps
> Solr provides some simple example indices. In order to test the plugin with the techproducts example, please follow these steps:
> h4. 1. compile solr and the examples
> cd solr
> ant dist
> ant example
> h4. 2. run the example
> ./bin/solr -e techproducts
> h4. 3. stop it and install the plugin:
>
> ./bin/solr stop
> mkdir example/techproducts/solr/techproducts/lib
> cp build/contrib/ltr/lucene-ltr-6.0.0-SNAPSHOT.jar example/techproducts/solr/techproducts/lib/
> cp contrib/ltr/example/solrconfig.xml example/techproducts/solr/techproducts/conf/
> h4. 4. run the example again
>
> ./bin/solr -e techproducts
> h4. 5. index some features and a model
> curl -XPUT 'http://localhost:8983/solr/techproducts/schema/fstore' --data-binary "@./contrib/ltr/example/techproducts-features.json" -H 'Content-type:application/json'
> curl -XPUT 'http://localhost:8983/solr/techproducts/schema/mstore' --data-binary "@./contrib/ltr/example/techproducts-model.json" -H 'Content-type:application/json'
> h4. 6. have fun !
> *access to the default feature store*
> http://localhost:8983/solr/techproducts/schema/fstore/_DEFAULT_
> *access to the model store*
> http://localhost:8983/solr/techproducts/schema/mstore
> *perform a query using the model, and retrieve the features*
> http://localhost:8983/solr/techproducts/query?indent=on&q=test&wt=json&rq={!ltr%20model=svm%20reRankDocs=25%20efi.query=%27test%27}&fl=*,[features],price,score,name&fv=true

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)