[ 
https://issues.apache.org/jira/browse/LUCENE-7828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16012600#comment-16012600
 ] 

Alan Woodward commented on LUCENE-7828:
---------------------------------------

Yes, this looks like it only affects range fields, so the various shortcuts can 
be implemented in RangeFieldQuery's IntersectVisitor rather than changing the 
core API.  This change gives me a 35% speedup on this particular dataset:

{code}

@@ -136,6 +136,11 @@ public void visit(int docID, byte[] leaf) throws IOException {
                 }
                 @Override
                 public Relation compare(byte[] minPackedValue, byte[] maxPackedValue) {
+                if (Arrays.equals(minPackedValue, maxPackedValue)) {
+                  if (queryType == QueryType.CONTAINS && comparator.isWithin(minPackedValue)) {
+                    return Relation.CELL_INSIDE_QUERY;
+                  }
+                }
                   byte[] node = getInternalRange(minPackedValue, maxPackedValue);
                   // compute range relation for BKD traversal
                   if (comparator.isDisjoint(node)) {
{code}
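
To make the effect of the shortcut concrete, here is a rough standalone sketch that does not use Lucene at all. It models each indexed range as a long[] {lo, hi} instead of a packed byte[], and every name in it (SingleValueCellSketch, containsQuery, countMatches, matchCalls) is made up for illustration. The fallback branch is also simplified: it always reports CELL_CROSSES_QUERY rather than computing a real relation from the cell bounds. It only shows the idea that a cell whose min and max are equal can be decided with one test instead of one test per document.

```java
import java.util.Arrays;

class SingleValueCellSketch {
    // Only the two relations needed for this illustration.
    enum Relation { CELL_INSIDE_QUERY, CELL_CROSSES_QUERY }

    // Counts match tests so the saving is observable.
    static int matchCalls = 0;

    // Hypothetical CONTAINS test: the indexed range {lo, hi} contains the query range.
    static boolean containsQuery(long[] range, long queryLo, long queryHi) {
        matchCalls++;
        return range[0] <= queryLo && range[1] >= queryHi;
    }

    // If min == max, every value in the cell is identical, so one test decides the cell.
    static Relation compare(long[] cellMin, long[] cellMax, long queryLo, long queryHi) {
        if (Arrays.equals(cellMin, cellMax) && containsQuery(cellMin, queryLo, queryHi)) {
            return Relation.CELL_INSIDE_QUERY;
        }
        // Simplification: anything else falls back to per-document checks.
        return Relation.CELL_CROSSES_QUERY;
    }

    // Counts matching values in one leaf; assumes the leaf is non-empty.
    static int countMatches(long[][] leaf, long queryLo, long queryHi) {
        long[] min = leaf[0].clone();
        long[] max = leaf[0].clone();
        for (long[] range : leaf) {
            for (int d = 0; d < 2; d++) {
                min[d] = Math.min(min[d], range[d]);
                max[d] = Math.max(max[d], range[d]);
            }
        }
        if (compare(min, max, queryLo, queryHi) == Relation.CELL_INSIDE_QUERY) {
            return leaf.length; // accept the whole leaf without visiting each value
        }
        int hits = 0;
        for (long[] range : leaf) {
            if (containsQuery(range, queryLo, queryHi)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // 1000 documents that all hold the same range [10, 50].
        long[][] leaf = new long[1000][];
        Arrays.fill(leaf, new long[] {10, 50});
        int hits = countMatches(leaf, 20, 30);
        // prints "1000 hits, 1 match test(s)"
        System.out.println(hits + " hits, " + matchCalls + " match test(s)");
    }
}
```

Without the equality check the same leaf would need 1000 match tests, one per document, which is the repeated work the issue describes.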

> Improve PointValues visitor calls when all docs in a leaf share a value
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-7828
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7828
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Alan Woodward
>            Assignee: Nicholas Knize
>
> When all the docs in a leaf node have the same value, range queries can waste 
> a lot of processing if the node itself returns CELL_CROSSES_QUERY when 
> compare() is called, in effect performing the same calculation in visit(int, 
> byte[]) over and over again.  In the case I'm looking at (very low 
> cardinality indexed LongRange fields), this causes something of a perfect 
> storm for performance.  PointValues can detect up front if a given node has a 
> single value (because its min value and max value will be equal), so this 
> case should be fairly simple to identify and shortcut.



