Lucene has gained some new compressed DocIdSet implementations that are technically very interesting and exciting: PForDeltaDocIdSet, WAH8DocIdSet, EliasFanoDocIdSet, … any more? Yet it's difficult (at least for me) to understand their pros and cons well enough to choose among them. They all seem great, so why do we have three? Only one is actually used by Lucene itself: WAH8DocIdSet, in CachingWrapperFilter. The javadocs are hit and miss; the JIRA issues contain lots of fascinating background, but distilling it is time-consuming. I think it would be very useful to summarize each implementation's key characteristics in its class-level javadocs: not so much implementation details as information that helps a user choose one over another. As a bonus, the package-level javadocs could include a table comparing their relative performance characteristics.
Related to this, I'm wondering whether it makes sense for a codec's postings (assuming no doc freqs and no positions?) to be implemented as a serialized version of one of these compressed doc ID sets. I think it would be really great, not just for compression but also because it might support Terms.advance(), since some of these compressed formats have indexes. ~ David
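To make the "compressed format with an index" idea concrete, here is a minimal sketch of how a small index over a delta-encoded doc ID list can speed up advance(). It is an illustration under stated assumptions only: IndexedDeltaDocIds, BLOCK, and blockFirstDoc are hypothetical names, not Lucene classes or fields, and the gaps are kept in a plain int[] rather than bit-packed as a real compressed set would do.

```java
/** Hypothetical illustration only; this is not a Lucene class or API. */
final class IndexedDeltaDocIds {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    // One index entry per BLOCK documents (kept tiny here for demonstration).
    private static final int BLOCK = 4;

    private final int[] deltas;        // gap to the previous doc ID; deltas[0] is the first doc ID
    private final int[] blockFirstDoc; // absolute doc ID at the start of each block

    private int pos = -1; // position of the current doc within deltas
    private int doc = -1; // current doc ID, -1 before iteration starts

    /** Expects strictly increasing, non-negative doc IDs. */
    IndexedDeltaDocIds(int[] sortedDocIds) {
        int n = sortedDocIds.length;
        deltas = new int[n];
        blockFirstDoc = new int[(n + BLOCK - 1) / BLOCK];
        int prev = 0;
        for (int i = 0; i < n; i++) {
            deltas[i] = sortedDocIds[i] - prev;
            prev = sortedDocIds[i];
            if (i % BLOCK == 0) {
                blockFirstDoc[i / BLOCK] = sortedDocIds[i];
            }
        }
    }

    /** Decodes the next doc ID by adding the next gap to the current doc ID. */
    int nextDoc() {
        if (++pos >= deltas.length) {
            pos = deltas.length;
            doc = NO_MORE_DOCS;
            return doc;
        }
        doc = (pos == 0 ? 0 : doc) + deltas[pos];
        return doc;
    }

    /** Returns the first doc ID >= target, using the index to skip whole blocks. */
    int advance(int target) {
        if (doc == NO_MORE_DOCS) {
            return doc;
        }
        // Binary search for the last block whose first doc ID is <= target.
        int lo = 0, hi = blockFirstDoc.length - 1, block = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (blockFirstDoc[mid] <= target) {
                block = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        // Jump only forward: the index stores absolute doc IDs, so no gaps need decoding.
        if (block >= 0 && block * BLOCK > pos) {
            pos = block * BLOCK;
            doc = blockFirstDoc[block];
        }
        // Finish with a short linear scan inside the block.
        while (doc < target) {
            nextDoc();
        }
        return doc;
    }
}
```

Without the index, advance() would have to decode every gap between the current position and the target; with it, at most one block's worth of gaps is decoded after the jump. A postings format built this way would trade a little extra space for the index against faster skipping, which is the kind of trade-off the proposed javadocs could spell out per implementation.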
