[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank

GitBox Thu, 12 Nov 2020 15:56:03 -0800


cpoerschke commented on a change in pull request #1571:
URL: https://github.com/apache/lucene-solr/pull/1571#discussion_r522515743




##########
File path: 
solr/contrib/ltr/src/java/org/apache/solr/ltr/interleaving/TeamDraftInterleaving.java
##########
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.ltr.interleaving;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.LinkedHashSet;
+import java.util.Random;
+import java.util.Set;
+
+import org.apache.lucene.search.ScoreDoc;
+
+/**
+ * Interleaving was introduced the first time by Joachims in [1, 2].
+ * Team Draft Interleaving is among the most successful and used interleaving 
approaches[3].
+ * Here the authors implement a method similar to the way in which captains 
select their players in team-matches.
+ * Team Draft Interleaving produces a fair distribution of ranking models’ 
elements in the final interleaved list.
+ * It has also proved to overcome an issue of the previous implemented 
approach, Balanced interleaving, in determining the winning model[4].
+ * <p>
+ * [1] T. Joachims. Optimizing search engines using clickthrough data. KDD 
(2002)
+ * [2] 
T.Joachims.Evaluatingretrievalperformanceusingclickthroughdata.InJ.Franke, G. 
Nakhaeizadeh, and I. Renz, editors,
+ * Text Mining, pages 79–96. Physica/Springer (2003)
+ * [3] F. Radlinski, M. Kurup, and T. Joachims. How does clickthrough data 
reflect re-
+ * trieval quality? In CIKM, pages 43–52. ACM Press (2008)
+ * [4] O. Chapelle, T. Joachims, F. Radlinski, and Y. Yue.
+ * Large-scale validation and analysis of interleaved search evaluation. ACM 
TOIS, 30(1):1–41, Feb. (2012)
+ */
+public class TeamDraftInterleaving implements Interleaving{
+  public static Random RANDOM;
+
+  static {
+    // We try to make things reproducible in the context of our tests by 
initializing the random instance
+    // based on the current seed
+    String seed = System.getProperty("tests.seed");
+    if (seed == null) {
+      RANDOM = new Random();
+    } else {
+      RANDOM = new Random(seed.hashCode());
+    }
+  }
+
+  /**
+   * Team Draft Interleaving considers two ranking models: modelA and modelB.
+   * For a given query, each model returns its ranked list of documents La = 
(a1,a2,...) and Lb = (b1, b2, ...).
+   * The algorithm creates a unique ranked list I = (i1, i2, ...).
+   * This list is created by interleaving elements from the two lists la and 
lb as described by Chapelle et al.[1].
+   * Each element Ij is labelled TeamA if it is selected from La and TeamB if 
it is selected from Lb.
+   * <p>
+   * [1] O. Chapelle, T. Joachims, F. Radlinski, and Y. Yue.
+   * Large-scale validation and analysis of interleaved search evaluation. ACM 
TOIS, 30(1):1–41, Feb. (2012)
+   * <p>
+   * Assumptions:
+   * - rerankedA and rerankedB has the same length.
+   * They contains the same search results, ranked differently by two ranking 
models
+   * - each reranked list can not contain the same search result more than 
once.
+   *
+   * @param rerankedA a ranked list of search results produced by a ranking 
model A
+   * @param rerankedB a ranked list of search results produced by a ranking 
model B
+   * @return the interleaved ranking list
+   */
+  public InterleavingResult interleave(ScoreDoc[] rerankedA, ScoreDoc[] 
rerankedB) {
+    LinkedHashSet<ScoreDoc> interleavedResults = new LinkedHashSet<>();
+    ScoreDoc[] interleavedResultArray = new ScoreDoc[rerankedA.length];
+    ArrayList<Set<Integer>> interleavingPicks = new ArrayList<>(2);
+    Set<Integer> teamA = new HashSet<>();
+    Set<Integer> teamB = new HashSet<>();
+    int topN = rerankedA.length;
+    int indexA = 0, indexB = 0;
+
+    while (interleavedResults.size() < topN && indexA < rerankedA.length && 
indexB < rerankedB.length) {
+      if(teamA.size()<teamB.size() || (teamA.size()==teamB.size() && 
!RANDOM.nextBoolean())){
+        indexA = updateIndex(interleavedResults,indexA,rerankedA);
+        interleavedResults.add(rerankedA[indexA]);
+        teamA.add(rerankedA[indexA].doc);
+        indexA++;
+      } else{
+        indexB = updateIndex(interleavedResults,indexB,rerankedB);
+        interleavedResults.add(rerankedB[indexB]);
+        teamB.add(rerankedB[indexB].doc);
+        indexB++;
+      }
+    }
+    interleavingPicks.add(teamA);
+    interleavingPicks.add(teamB);
+    interleavedResultArray = 
interleavedResults.toArray(interleavedResultArray);
+
+    return new InterleavingResult(interleavedResultArray,interleavingPicks);
+  }
+
+  private int updateIndex(LinkedHashSet<ScoreDoc> interleaved, int index, 
ScoreDoc[] reranked) {
+    boolean foundElementToAdd = false;
+    while (index < reranked.length && !foundElementToAdd) {
+      ScoreDoc elementToCheck = reranked[index];
+      if (interleaved.contains(elementToCheck)) {

Review comment:
       > ... currently Interleaving doesn't support sharding ...
   
   Let's include that in the documentation somehow, e.g. 
https://github.com/cpoerschke/lucene-solr/commit/7edd6005d6daa29b5a80bfd2a0e1249c23bd9279
 say.
   
   > ... So I rolled back the ScoreDoc ... Let me know if it is better!
   
   Yes it is, thanks!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #1571: SOLR-14560: Interleaving for Learning To Rank

Reply via email to