[jira] [Updated] (LUCENE-5938) New DocIdSet implementation with random write access

Adrien Grand (JIRA) Mon, 29 Sep 2014 15:59:59 -0700

     [ 
https://issues.apache.org/jira/browse/LUCENE-5938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Adrien Grand updated LUCENE-5938:
---------------------------------
    Attachment: LUCENE-5938.patch

Thanks for the review Mike, here is a new patch that tries to address your 
concerns.

bq. Maybe fix "word" to be "long" instead

Done.

bq. do we need to guard against zeroWords being 0?

I added a comment as well as a test as you suggested.

{quote}
Maybe add some more comments around tricky parts of SparseFixedBitSet.
E.g., the different branches inside set? And, it looks strange doing
1L << i, but in fact the JVM/processor make that 1L << (i % 64). And
Iterator.currentOrNextDoc is scary looking... do we have enough tests
here?
{quote}

I added more comments, hopefully they make sense. Please let me know if there 
are still things that are not clear. currentOrNextDoc is a bit complicated 
because of the different cases that need to be handled but the structure is 
actually quite simple so at least get and set should be easy to understand. I 
extracted the insertion of a new long to a separate method, this should make 
set easier to read?

Regarding the shift, indeed it relies on the fact that shifts are mod 64 
(FixedBitSet does the same). I added some comments about it.

Regarding the tests, BaseDocIdSetTestCase.testAgainstBitSet tests various 
densities and assertEquals checks nextDoc(), docId(), interleaved calls to 
nextDoc() and advance(), and that the oal.util.Bits representation is 
consistent with the iterator. I think that is good?

bq. I hit this test failure, which reproduces with the patch

I dug that one, and the reason is that the searcher is created with threads, so 
the exception is indeed wrapped into an ExecutionException, which is in turn 
wrapped into a RuntimeException to by-pass the fact that ExecutionException is 
checked. It doesn't reproduce on trunk because the default rewrite method reads 
data from the index in MultiTermQuery.rewrite (collectTerms) which does not run 
in a thread pool. You can reproduce the issue on trunk by setting the rewrite 
method of the term range query to {{CONSTANT_SCORE_FILTER_REWRITE}}. I fixed 
the test in the patch to walk down the causes of the exception that is thrown.

> New DocIdSet implementation with random write access
> ----------------------------------------------------
>
>                 Key: LUCENE-5938
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5938
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>         Attachments: LUCENE-5938.patch, LUCENE-5938.patch, LUCENE-5938.patch, 
> LUCENE-5938.patch, LUCENE-5938.patch, low_freq.tasks
>
>
> We have a great cost API that is supposed to help make decisions about how to 
> best execute queries. However, due to the fact that several of our filter 
> implementations (eg. TermsFilter and BooleanFilter) return FixedBitSets, 
> either we use the cost API and make bad decisions, or need to fall back to 
> heuristics which are not as good such as 
> RandomAccessFilterStrategy.useRandomAccess which decides that random access 
> should be used if the first doc in the set is less than 100.
> On the other hand, we also have some nice compressed and cacheable DocIdSet 
> implementation but we cannot make use of them because TermsFilter requires a 
> DocIdSet that has random write access, and FixedBitSet is the only DocIdSet 
> that we have that supports random access.
> I think it would be nice to replace FixedBitSet in those filters with another 
> DocIdSet that would also support random write access but would have a better 
> cost?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (LUCENE-5938) New DocIdSet implementation with random write access

Reply via email to