[ 
https://issues.apache.org/jira/browse/LUCENE-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613403#comment-14613403
 ] 

Michael McCandless commented on LUCENE-6645:
--------------------------------------------

Thanks [~jpountz], this is impressive!  A radix sorter, an Adder abstraction to 
skip hot loop if statements... looks like you had fun with hotspot :)

Did the Adder make a substantial improvement over the more straightforward 
if-per-add?  Maybe we could just add a .grow which would pre-grow the int[] if 
necessary ... not sure.

Most of the changes are in DocIdSetBuilder, and less so in BKDTreeReader.  You 
might get a bit more speedup if instead of {{prepareAdd(1)}}, you did the 
{{prepareAdd}} up front outside the loop with the worst case {{count}} (number 
of docs in the leaf block)?  I.e. this would reserve for the worse case (all 
docs pass the filter).  These leaf blocks are smallish by default (up to 1024 
docs), and worst case is this upgrades to a bitset a bit early?  Not sure it 
will help that much, since most of the cells are visited via addAll...

In each cell of the BKD tree the docIDs are sorted ... but we don't take 
advantage of that here (not sure how we would).

Maybe we should try to contain the added hairiness to BKDTreeReader instead of 
DocIdSetBuilder if indeed this is the only user of this API that is so strange 
(or of tons of tiny sorted docID blocks)


> BKD tree queries should use BitDocIdSet.Builder
> -----------------------------------------------
>
>                 Key: LUCENE-6645
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6645
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: LUCENE-6645.patch, LUCENE-6645.patch
>
>
> When I was iterating on BKD tree originally I remember trying to use this 
> builder (which makes a sparse bit set at first and then upgrades to dense if 
> enough bits get set) and being disappointed with its performance.
> I wound up just making a FixedBitSet every time, but this is obviously 
> wasteful for small queries.
> It could be the perf was poor because I was always .or'ing in DISIs that had 
> 512 - 1024 hits each time (the size of each leaf cell in the BKD tree)?  I 
> also had to make my own DISI wrapper around each leaf cell... maybe that was 
> the source of the slowness, not sure.
> I also sort of wondered whether the SmallDocSet in spatial module (backed by 
> a SentinelIntSet) might be faster ... though it'd need to be sorted in the 
> and after building before returning to Lucene.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to