[jira] [Commented] (LUCENE-7254) DocIDSetBuilder is no good for points

Robert Muir (JIRA) Tue, 26 Apr 2016 02:04:47 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15257785#comment-15257785
 ]


Robert Muir commented on LUCENE-7254:
-------------------------------------

{quote}
+1 I don't like that this patch might create iterators over sparse FixedBitSet 
instances. I am fine with doing that temporarily for queries that are likely to 
match many docs (I see that you modified the ranges but not the point-in-set 
queries for instance) but in the longer term I think we should improve points 
so that we can know earlier how many docs are going to be added.
{quote}

No, it is the opposite way around. The sparse case is not the case to optimize 
because it is already fast.

Not doing point-in-set had nothing to do with that. I just don't have a good 
benchmark for it. I think we should always use the fastest bitset here for 
these queries.

Optimizations for esoteric/abuse/etc. cases (many values in a structured field, 
sparse fields) shouldn't drag down the hotspot of these searches in the common 
case.
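Below is a minimal Java sketch of "always use the fastest bitset" for points queries. It assumes a single segment with a known `maxDoc` and a points traversal that calls `add(doc)` once per matching value. `DenseDocCollector` is a hypothetical class written for illustration. It is not Lucene's `FixedBitSet` or the code in the attached patch. The bitset is allocated once up front, so `add()` is a single unconditional OR with no growth checks and no sparse/dense switching. The exact cardinality is computed only when a caller asks for it.

```java
// Hypothetical sketch (not Lucene code): collect matching doc IDs from a
// points traversal straight into a dense bitset sized to maxDoc.
final class DenseDocCollector {
    private final long[] words;
    private final int maxDoc;

    DenseDocCollector(int maxDoc) {
        this.maxDoc = maxDoc;
        // One bit per document in the segment, allocated once up front.
        this.words = new long[(maxDoc + 63) >>> 6];
    }

    // Hot path: one OR, no branches. Java masks a long shift distance to its
    // low 6 bits, so (1L << doc) selects bit (doc % 64) within the word.
    void add(int doc) {
        words[doc >>> 6] |= 1L << doc;
    }

    boolean get(int doc) {
        return (words[doc >>> 6] & (1L << doc)) != 0;
    }

    // Exact cardinality is computed only on demand, not maintained per add().
    int cardinality() {
        int count = 0;
        for (long w : words) {
            count += Long.bitCount(w);
        }
        return count;
    }

    // Next set bit at or after 'from', in increasing doc order; returns
    // maxDoc when exhausted (a stand-in for NO_MORE_DOCS in this sketch).
    int nextSetBit(int from) {
        if (from >= maxDoc) return maxDoc;
        int i = from >>> 6;
        long w = words[i] & (-1L << from); // clear bits below 'from'
        while (true) {
            if (w != 0) {
                int doc = (i << 6) + Long.numberOfTrailingZeros(w);
                return doc < maxDoc ? doc : maxDoc;
            }
            if (++i >= words.length) return maxDoc;
            w = words[i];
        }
    }

    public static void main(String[] args) {
        DenseDocCollector c = new DenseDocCollector(200);
        for (int doc : new int[] {3, 64, 65, 199}) {
            c.add(doc); // e.g. docs visited by a range query
        }
        System.out.println(c.cardinality()); // prints 4
        for (int d = c.nextSetBit(0); d < 200; d = c.nextSetBit(d + 1)) {
            System.out.print(d + " ");       // prints 3 64 65 199
        }
        System.out.println();
    }
}
```

Adding the same doc twice is harmless here, because setting a bit is idempotent. That matters for multi-valued point fields. The cost of this approach is a fixed `maxDoc / 8` bytes per query, whether the result is sparse or dense.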


> DocIDSetBuilder is no good for points
> -------------------------------------
>
>                 Key: LUCENE-7254
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7254
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>         Attachments: LUCENE-7254.patch, LUCENE-7254.patch
>
>
> For the postings lists, I think this approach works well in dense cases (e.g. 
> whole DISIs are added, things are coming in order, etc.).
> However, in the points case it holds back range performance significantly. 
> There are a couple of problems here:
> * Expensive cardinality computation (this is a 2% hit) when it's totally 
> unnecessary. We can use index statistics to help here.
> * Lots of conditional stuff in add(). This includes growing checks, bitset 
> switching checks and so on (which happen even if you are smart and call 
> grow(), but this stuff all adds up). 
> I don't think we should try to create a magical shared API that is efficient 
> both for postings lists of unstructured stuff and for point collection on 
> structured fields. Instead, we should just do things differently for points 
> and iterate from there.
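The Java model below makes the two problems in the description concrete. `AdaptiveDocIdBuilder` is a hypothetical, simplified stand-in for a general-purpose adaptive builder, written only for illustration. It is not Lucene's actual `DocIdSetBuilder`. The threshold (`maxDoc / 128`) and the `docsWithField` parameter are assumptions.

The model makes two points:

* **Per-add checks.** Every `add()` can pay for up to three checks (already dense? buffer full? upgrade instead of grow?) before doing the real work.
* **Cheap cost estimate.** A usable cost can be bounded without any cardinality pass: the number of distinct docs can exceed neither the number of `add()` calls nor a per-field document count taken from index statistics.

```java
import java.util.Arrays;

// Hypothetical, simplified model (not Lucene's DocIdSetBuilder): buffer doc
// IDs in an int[] while the set looks sparse, then switch to a bitset.
final class AdaptiveDocIdBuilder {
    private final int maxDoc;
    private final int threshold;   // assumed switch point: maxDoc / 128, min 16
    private int[] buffer = new int[16];
    private int size;
    private long adds;             // number of add() calls, for the cost bound
    private long[] bits;           // null until the builder upgrades

    AdaptiveDocIdBuilder(int maxDoc) {
        this.maxDoc = maxDoc;
        this.threshold = Math.max(16, maxDoc >>> 7);
    }

    // Every call pays for up to three checks before doing the actual work.
    void add(int doc) {
        adds++;
        if (bits != null) {                  // check 1: already dense?
            bits[doc >>> 6] |= 1L << doc;
            return;
        }
        if (size == buffer.length) {         // check 2: buffer full?
            if (size >= threshold) {         // check 3: upgrade instead of grow?
                upgrade();
                bits[doc >>> 6] |= 1L << doc;
                return;
            }
            buffer = Arrays.copyOf(buffer, size << 1);
        }
        buffer[size++] = doc;
    }

    // Copy the buffered doc IDs into a freshly allocated bitset.
    private void upgrade() {
        bits = new long[(maxDoc + 63) >>> 6];
        for (int i = 0; i < size; i++) {
            bits[buffer[i] >>> 6] |= 1L << buffer[i];
        }
        buffer = null;
    }

    boolean isDense() {
        return bits != null;
    }

    // Cost without a cardinality pass: distinct docs <= add() calls (duplicates
    // only lower the true count) and <= docs that have the field. The latter is
    // assumed to come from index statistics and is just a parameter here.
    long costUpperBound(int docsWithField) {
        return Math.min(adds, docsWithField);
    }

    public static void main(String[] args) {
        AdaptiveDocIdBuilder b = new AdaptiveDocIdBuilder(1 << 20);
        for (int doc = 0; doc < 100_000; doc += 3) {
            b.add(doc);
        }
        System.out.println(b.isDense());              // prints true
        System.out.println(b.costUpperBound(50_000)); // prints 33334
    }
}
```

The checks are cheap individually, but they sit on the hottest path of a range query, which calls `add()` once per matching point. A points-specific collector can drop them when it decides the representation up front.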



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

