[ https://issues.apache.org/jira/browse/LUCENE-5101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708113#comment-13708113 ]

Adrien Grand commented on LUCENE-5101:
--------------------------------------

bq. Do WAH8 and PFOR already have an index?

They do, but the index is naive: it is a plain binary search over a subset of the (docID,position) pairs contained in the set. With the first versions of these DocIdSets, I just wanted to guarantee O(log(cardinality)) advance performance.

bq. Block decoding might still be added to EliasFano, which should improve its nextDoc() performance

The main use-case I see for these sets is to be used as filters. So I think advance() performance is more important?

bq. The Elias-Fano code is not tuned yet, so I'm surprised that the Elias-Fano time for nextDoc() is less than a factor two worse than PFOR.

Well, the PFOR doc ID set is not tuned either. :-) But I agree this is a good surprise for the Elias-Fano set. I mean, even the WAH8 doc ID set should be pretty fast, and it is still slower than the Elias-Fano set.

bq. Another surprise is that Elias-Fano is best at advance() among the compressed sets for some cases. That means that Long.bitCount() is doing well on the upper bits then.

I'm looking forward to the index. :-)

bq. For bit densities > 1/2 there is clear need for WAH8 and Elias-Fano to be able to encode the inverse set. Could that be done by a common wrapper?

I guess so.

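The "naive" index described above for WAH8 and PFOR (a binary search over a subset of the (docID,position) pairs) can be illustrated with a minimal, self-contained sketch. It is not the code of either DocIdSet: the class name SampledAdvanceIndex, the interval parameter, and the plain int[] that stands in for the compressed stream are all assumptions made for the example. NO_MORE_DOCS is defined locally and only mirrors the usual DocIdSetIterator convention.

```java
/**
 * Hypothetical illustration of a sampled advance() index.
 * The sorted int[] stands in for a compressed doc ID stream;
 * a real set would decode sequentially from the sampled position.
 */
final class SampledAdvanceIndex {
    /** Local sentinel for "no more documents" (assumption for this sketch). */
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docs;       // sorted doc IDs (uncompressed stand-in)
    private final int[] sampleDocs; // doc ID of every interval-th entry
    private final int[] samplePos;  // position of that entry in docs

    SampledAdvanceIndex(int[] sortedDocs, int interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.docs = sortedDocs.clone();
        int n = (docs.length + interval - 1) / interval;
        this.sampleDocs = new int[n];
        this.samplePos = new int[n];
        for (int i = 0; i < n; i++) {
            samplePos[i] = i * interval;
            sampleDocs[i] = docs[i * interval];
        }
    }

    /** Returns the first doc ID >= target, or NO_MORE_DOCS if there is none. */
    int advance(int target) {
        // Binary search for the last sampled pair whose doc ID is <= target.
        int lo = 0;
        int hi = sampleDocs.length - 1;
        int start = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (sampleDocs[mid] <= target) {
                start = samplePos[mid];
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        // Scan forward from the sampled position; at most `interval` steps
        // are needed before reaching the next sample or the answer.
        for (int i = start; i < docs.length; i++) {
            if (docs[i] >= target) {
                return docs[i];
            }
        }
        return NO_MORE_DOCS;
    }
}
```

With a fixed interval, the binary search costs O(log(cardinality / interval)) and the forward scan is bounded by the interval, which is one way to obtain the O(log(cardinality)) advance guarantee mentioned above.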
> make it easier to plugin different bitset implementations to CachingWrapperFilter
> ---------------------------------------------------------------------------------
>
>                 Key: LUCENE-5101
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5101
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Robert Muir
>         Attachments: LUCENE-5101.patch
>
>
> Currently this is possible, but it's not so friendly:
> {code}
>   protected DocIdSet docIdSetToCache(DocIdSet docIdSet, AtomicReader reader) throws IOException {
>     if (docIdSet == null) {
>       // this is better than returning null, as the nonnull result can be cached
>       return EMPTY_DOCIDSET;
>     } else if (docIdSet.isCacheable()) {
>       return docIdSet;
>     } else {
>       final DocIdSetIterator it = docIdSet.iterator();
>       // null is allowed to be returned by iterator(),
>       // in this case we wrap with the sentinel set,
>       // which is cacheable.
>       if (it == null) {
>         return EMPTY_DOCIDSET;
>       } else {
>         /* INTERESTING PART */
>         final FixedBitSet bits = new FixedBitSet(reader.maxDoc());
>         bits.or(it);
>         return bits;
>         /* END INTERESTING PART */
>       }
>     }
>   }
> {code}
> Is there any value to having all this other logic in the protected API? It seems like something that's not useful for a subclass... Maybe this stuff can become final, and "INTERESTING PART" calls a simpler method, something like:
> {code}
>   protected DocIdSet cacheImpl(DocIdSetIterator iterator, AtomicReader reader) throws IOException {
>     final FixedBitSet bits = new FixedBitSet(reader.maxDoc());
>     bits.or(iterator);
>     return bits;
>   }
> {code}