cpoerschke commented on a change in pull request #1571:
URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r518217755



##########
File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/response/transform/LTRFeatureLoggerTransformerFactory.java
##########
@@ -210,50 +216,59 @@ public void setContext(ResultContext context) {
       }
       
       // Setup LTRScoringQuery
-      scoringQuery = SolrQueryRequestContextUtils.getScoringQuery(req);
-      docsWereNotReranked = (scoringQuery == null);
-      String featureStoreName = SolrQueryRequestContextUtils.getFvStoreName(req);
-      if (docsWereNotReranked || (featureStoreName != null && (!featureStoreName.equals(scoringQuery.getScoringModel().getFeatureStoreName())))) {
-        // if store is set in the transformer we should overwrite the logger
-
-        final ManagedFeatureStore fr = ManagedFeatureStore.getManagedFeatureStore(req.getCore());
-
-        final FeatureStore store = fr.getFeatureStore(featureStoreName);
-        featureStoreName = store.getName(); // if featureStoreName was null before this gets actual name
-
-        try {
-          final LoggingModel lm = new LoggingModel(loggingModelName,
-              featureStoreName, store.getFeatures());
-
-          scoringQuery = new LTRScoringQuery(lm,
-              LTRQParserPlugin.extractEFIParams(localparams),
-              true,
-              threadManager); // request feature weights to be created for all features
-
-        }catch (final Exception e) {
-          throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
-              "retrieving the feature store "+featureStoreName, e);
-        }
-      }
+      rerankingQueries = SolrQueryRequestContextUtils.getScoringQueries(req);
 
-      if (scoringQuery.getOriginalQuery() == null) {
-        scoringQuery.setOriginalQuery(context.getQuery());
+      docsWereNotReranked = (rerankingQueries == null || rerankingQueries.length == 0);
+      if (docsWereNotReranked) {
+        rerankingQueries = new LTRScoringQuery[]{null};
       }
-      if (scoringQuery.getFeatureLogger() == null){
-        scoringQuery.setFeatureLogger( SolrQueryRequestContextUtils.getFeatureLogger(req) );
-      }
-      scoringQuery.setRequest(req);
-
-      featureLogger = scoringQuery.getFeatureLogger();
+      modelWeights = new LTRScoringQuery.ModelWeight[rerankingQueries.length];
+      String featureStoreName = SolrQueryRequestContextUtils.getFvStoreName(req);
+      for (int i = 0; i < rerankingQueries.length; i++) {
+        LTRScoringQuery scoringQuery = rerankingQueries[i];
+        if ((scoringQuery == null || !(scoringQuery instanceof OriginalRankingLTRScoringQuery)) && (docsWereNotReranked || (featureStoreName != null && !featureStoreName.equals(scoringQuery.getScoringModel().getFeatureStoreName())))) {

Review comment:
       12/n observations/thoughts/questions:
   
   Most tricky to articulate, hence left until last.
   
   Prior to interleaving, the existing logic is that if feature vectors are requested and there is no model (or the model is for a different feature store), then a logging model is created.
   
   So now, if we have two models:
   * if both models are for the requested feature store then that's great: each document would have been picked by one of the models, and so we use the feature vector already previously calculated by whichever model picked the document.
   * if neither model is for the requested feature store then we need to create a logging model. Is one logging model sufficient, or do we need two? Intuitively one would seem to be sufficient (a rough sketch of the single-logging-model idea follows this list), but that's based on partial analysis only so far.
   * if one of the two models (modelA) is for the requested feature store then for the documents picked by modelA we can use the feature vector already previously calculated by modelA. What about documents picked by modelB? It could be that modelA actually has the feature vector for that document but that modelB simply managed to pick the document first. Or, if modelA does not have the feature vector, then we could calculate it for modelA. Would a logging model help in this scenario? Intuitively it would seem that calculating the missing feature vector via modelA or via the logging model would be equally efficient, and hence no logging model would be needed, but again that's only based on partial analysis so far.
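   To make that "one logging model" intuition concrete, here is a rough sketch only, not this PR's code: it assumes the method's existing locals (`req`, `featureStoreName`, `loggingModelName`, `localparams`, `threadManager`, `rerankingQueries`), reuses only the constructors and getters already visible in the diff above, and creates at most one shared logging query for any reranking query whose model is not for the requested feature store. Exception handling as in the existing try/catch is omitted.

```java
// Hypothetical sketch: share a single LoggingModel across reranking queries
// whose models do not belong to the requested feature store.
final ManagedFeatureStore fr = ManagedFeatureStore.getManagedFeatureStore(req.getCore());
final FeatureStore store = fr.getFeatureStore(featureStoreName);

LTRScoringQuery sharedLoggingQuery = null; // created lazily, at most once
for (int i = 0; i < rerankingQueries.length; i++) {
  final LTRScoringQuery scoringQuery = rerankingQueries[i];
  final boolean storeMatches = (scoringQuery != null)
      && store.getName().equals(scoringQuery.getScoringModel().getFeatureStoreName());
  if (!storeMatches) {
    if (sharedLoggingQuery == null) {
      final LoggingModel lm = new LoggingModel(loggingModelName,
          store.getName(), store.getFeatures());
      sharedLoggingQuery = new LTRScoringQuery(lm,
          LTRQParserPlugin.extractEFIParams(localparams),
          true, // request feature weights to be created for all features
          threadManager);
    }
    // fall back to the shared logging model for feature vector extraction
    rerankingQueries[i] = sharedLoggingQuery;
  }
}
```

   Whether overwriting entries of `rerankingQueries` like this would interact correctly with reporting which model picked which document is exactly the kind of detail the partial analysis above still needs to confirm.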

##########
File path: solr/solr-ref-guide/src/learning-to-rank.adoc
##########
@@ -247,6 +254,81 @@ The output XML will include feature values as a comma-separated list, resembling
   }}
 ----
 
+=== Running a Rerank Query Interleaving Two Models
+
+To rerank the results of a query, interleaving two models (myModelA, myModelB), add the `rq` parameter to your search, passing two models as input, for example:
+
+[source,text]
+http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myModelA model=myModelB reRankDocs=100}&fl=id,score
+
+To obtain the model that interleaving picked for a search result, computed during reranking, add `[interleaving]` to the `fl` parameter, for example:
+
+[source,text]
+http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myModelA model=myModelB reRankDocs=100}&fl=id,score,[interleaving]
+
+The output will include the model picked for each search result, resembling the output shown here:
+
+[source,json]
+----
+{
+  "responseHeader":{
+    "status":0,
+    "QTime":0,
+    "params":{
+      "q":"test",
+      "fl":"id,score,[interleaving]",
+      "rq":"{!ltr model=myModelA model=myModelB reRankDocs=100}"}},
+  "response":{"numFound":2,"start":0,"maxScore":1.0005897,"docs":[
+      {
+        "id":"GB18030TEST",
+        "score":1.0005897,
+        "[interleaving]":"myModelB"},
+      {
+        "id":"UTF8TEST",
+        "score":0.79656565,
+        "[interleaving]":"myModelA"}]
+  }}
+----
+
+=== Running a Rerank Query Interleaving a Model with the Original Ranking
+When approaching Search Quality Evaluation with interleaving, it may be useful to compare a model with the original ranking.
+To rerank the results of a query, interleaving a model with the original ranking, add the `rq` parameter to your search, passing a model as input and activating the original ranking interleaving, for example:
+
+
+[source,text]
+http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myModel interleaveOriginalRanking=true reRankDocs=100}&fl=id,score

Review comment:
      13/n minor: `interleaveOriginalRanking=true` --> `model=_OriginalRanking_` assuming we're going with that special value
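   For illustration only, assuming `_OriginalRanking_` does end up being that special value, the ref-guide example above would then become:

```text
http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr model=myModel model=_OriginalRanking_ reRankDocs=100}&fl=id,score
```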



