[ https://issues.apache.org/jira/browse/LUCENE-5938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adrien Grand updated LUCENE-5938: --------------------------------- Attachment: LUCENE-5938.patch Thanks for the review Mike, here is a new patch that tries to address your concerns. bq. Maybe fix "word" to be "long" instead Done. bq. do we need to guard against zeroWords being 0? I added a comment as well as a test as you suggested. {quote} Maybe add some more comments around tricky parts of SparseFixedBitSet. E.g., the different branches inside set? And, it looks strange doing 1L << i, but in fact the JVM/processor make that 1L << (i % 64). And Iterator.currentOrNextDoc is scary looking... do we have enough tests here? {quote} I added more comments, hopefully they make sense. Please let me know if there are still things that are not clear. currentOrNextDoc is a bit complicated because of the different cases that need to be handled but the structure is actually quite simple so at least get and set should be easy to understand. I extracted the insertion of a new long to a separate method, this should make set easier to read? Regarding the shift, indeed it relies on the fact that shifts are mod 64 (FixedBitSet does the same). I added some comments about it. Regarding the tests, BaseDocIdSetTestCase.testAgainstBitSet tests various densities and assertEquals checks nextDoc(), docId(), interleaved calls to nextDoc() and advance(), and that the oal.util.Bits representation is consistent with the iterator. I think that is good? bq. I hit this test failure, which reproduces with the patch I dug that one, and the reason is that the searcher is created with threads, so the exception is indeed wrapped into an ExecutionException, which is in turn wrapped into a RuntimeException to by-pass the fact that ExecutionException is checked. It doesn't reproduce on trunk because the default rewrite method reads data from the index in MultiTermQuery.rewrite (collectTerms) which does not run in a thread pool. You can reproduce the issue on trunk by setting the rewrite method of the term range query to {{CONSTANT_SCORE_FILTER_REWRITE}}. I fixed the test in the patch to walk down the causes of the exception that is thrown. > New DocIdSet implementation with random write access > ---------------------------------------------------- > > Key: LUCENE-5938 > URL: https://issues.apache.org/jira/browse/LUCENE-5938 > Project: Lucene - Core > Issue Type: Improvement > Reporter: Adrien Grand > Assignee: Adrien Grand > Attachments: LUCENE-5938.patch, LUCENE-5938.patch, LUCENE-5938.patch, > LUCENE-5938.patch, LUCENE-5938.patch, low_freq.tasks > > > We have a great cost API that is supposed to help make decisions about how to > best execute queries. However, due to the fact that several of our filter > implementations (eg. TermsFilter and BooleanFilter) return FixedBitSets, > either we use the cost API and make bad decisions, or need to fall back to > heuristics which are not as good such as > RandomAccessFilterStrategy.useRandomAccess which decides that random access > should be used if the first doc in the set is less than 100. > On the other hand, we also have some nice compressed and cacheable DocIdSet > implementation but we cannot make use of them because TermsFilter requires a > DocIdSet that has random write access, and FixedBitSet is the only DocIdSet > that we have that supports random access. > I think it would be nice to replace FixedBitSet in those filters with another > DocIdSet that would also support random write access but would have a better > cost? -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org