[ 
https://issues.apache.org/jira/browse/SOLR-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913797#comment-13913797
 ] 

David commented on SOLR-5773:
-----------------------------

Actually I had to change my approach.

{code}
 public CollapsingScoreCollector(int maxDoc,
                                    int segments,
                                    SortedDocValues values,
                                    int nullPolicy,
                                    IntOpenHashSet boostDocs) {
      this.maxDoc = maxDoc;
      this.contexts = new AtomicReaderContext[segments];
      this.collapsedSet = new OpenBitSet(maxDoc);
      this.boostDocs = boostDocs;
      if(this.boostDocs != null) {
        //Set the elevated docs now.
        Iterator<IntCursor> it = this.boostDocs.iterator();
        while(it.hasNext()) {
          IntCursor cursor = it.next();
          this.collapsedSet.fastSet(cursor.value);
        }
      }
      this.values = values;
      int valueCount = values.getValueCount();
      this.ords = new int[valueCount];
      this.groupIsBoosted = new boolean[valueCount];
      Arrays.fill(this.ords, -1);
      this.scores = new float[valueCount];
      Arrays.fill(this.scores, -Float.MAX_VALUE);
      this.nullPolicy = nullPolicy;
      if(nullPolicy == CollapsingPostFilter.NULL_POLICY_EXPAND) {
        nullScores = new FloatArrayList();
      }
    }

    public boolean acceptsDocsOutOfOrder() {
      //Documents must be sent in order to this collector.
      return false;
    }

    public void setNextReader(AtomicReaderContext context) throws IOException {
      this.contexts[context.ord] = context;
      this.docBase = context.docBase;
    }

    public void collect(int docId) throws IOException {
      int globalDoc = docId+this.docBase;
      int ord = values.getOrd(globalDoc);
      
      
      if(ord > -1) {
        if (this.collapsedSet.fastGet(globalDoc)) {
          //If we have a document in the group that is potentially not 
          //the top scorer but also exists as an elevated document
          //set it as the globalDoc and it will be removed in
          //favor of the elevated document
          groupIsBoosted[ord] = true;
          ords[ord] = globalDoc;
        } else if (!groupIsBoosted[ord]) {
          float score = scorer.score();
          if(score > scores[ord]) {
            ords[ord] = globalDoc;
            scores[ord] = score;
          }
        }
      } if (this.collapsedSet.fastGet(globalDoc)) {
        //The doc is elevated so score does not matter
        //We just want to be sure it doesn't fall into the null policy
        ords[ord] = globalDoc;
      } else if(nullPolicy == CollapsingPostFilter.NULL_POLICY_COLLAPSE) {
        float score = scorer.score();
        if(score > nullScore) {
          nullScore = score;
          nullDoc = globalDoc;
        }
      } else if(nullPolicy == CollapsingPostFilter.NULL_POLICY_EXPAND) {
        collapsedSet.fastSet(globalDoc);
        nullScores.add(scorer.score());
      }
    }
{code}

This approach will work for default grouping. Will still have to implement 
fixes for min max grouping. I will probably also want to make this a 
toggle-able feature.

> CollapsingQParserPlugin problem with ElevateComponent
> -----------------------------------------------------
>
>                 Key: SOLR-5773
>                 URL: https://issues.apache.org/jira/browse/SOLR-5773
>             Project: Solr
>          Issue Type: Improvement
>          Components: query parsers
>    Affects Versions: 4.6.1
>            Reporter: David
>              Labels: collapse, solr
>             Fix For: 4.7
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> Hi Joel,
> I sent you an email but I'm not sure if you received it or not. I ran into a 
> bit of trouble using the CollapsingQParserPlugin with elevated documents. To 
> explain it simply, I want to exclude grouped documents when one of the 
> members of the group are contained in the elevated document set. I'm not sure 
> this is possible currently because as you explain above elevated documents 
> are added to the request context after the original query is constructed.
> To try to better illustrate the problem. If I have 2 documents docid=1 and 
> docid=2 and both have a groupid of 'a'. If a grouped query scores docid 2 
> first in the results but I have elevated docid 1 then both documents are 
> shown in the results when I really only want the elevated document to be 
> shown in the results.
> Is this something that would be difficult to implement? Any help is 
> appreciated.
> I think the solution would be to remove the documents from liveDocs that 
> share the same groupid in the getBoostDocs() function. Let me know if this 
> makes any sense. I'll continue working towards a solution in the meantime.
> {code}
> private IntOpenHashSet getBoostDocs(SolrIndexSearcher indexSearcher, 
> Set<String> boosted) throws IOException {
>       IntOpenHashSet boostDocs = null;
>       if(boosted != null) {
>         SchemaField idField = indexSearcher.getSchema().getUniqueKeyField();
>         String fieldName = idField.getName();
>         HashSet<BytesRef> localBoosts = new HashSet(boosted.size()*2);
>         Iterator<String> boostedIt = boosted.iterator();
>         while(boostedIt.hasNext()) {
>           localBoosts.add(new BytesRef(boostedIt.next()));
>         }
>         boostDocs = new IntOpenHashSet(boosted.size()*2);
>         List<AtomicReaderContext>leaves = 
> indexSearcher.getTopReaderContext().leaves();
>         TermsEnum termsEnum = null;
>         DocsEnum docsEnum = null;
>         for(AtomicReaderContext leaf : leaves) {
>           AtomicReader reader = leaf.reader();
>           int docBase = leaf.docBase;
>           Bits liveDocs = reader.getLiveDocs();
>           Terms terms = reader.terms(fieldName);
>           termsEnum = terms.iterator(termsEnum);
>           Iterator<BytesRef> it = localBoosts.iterator();
>           while(it.hasNext()) {
>             BytesRef ref = it.next();
>             if(termsEnum.seekExact(ref)) {
>               docsEnum = termsEnum.docs(liveDocs, docsEnum);
>               int doc = docsEnum.nextDoc();
>               if(doc != -1) {
>                 //Found the document.
>                 boostDocs.add(doc+docBase);
>                *// HERE REMOVE ANY DOCUMENTS THAT SHARE THE GROUPID NOT ONLY 
> THE DOCID //*
>                 it.remove();
>               }
>             }
>           }
>         }
>       }
>       return boostDocs;
>     }
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to