On Wednesday 18 May 2005 23:06, Yonik Seeley wrote: > > > > contains(docid) and exists(docid) cannot be efficiently implemented > > > > on a VInt based compact filter, so I'd prefer to leave these out. > > > > > > exists() on a BitSet is much faster than next() though... > > > > Yes, but the point is to iterate to the next document based in information > > from RAM and to be able to skipTo() on the index instead of reading it > > sequentially. > > Well, yes, that's one point to filters (and probably the main use). > Another that we are using is to enable fast intersection of two > filters you already have in memory. > > > > > I use a power-of-two hash table with a load factor of .75. So putting > > > 500 docs in my hashset would take up 1024 slots at 4 bytes per slot > > > (4k). > > > > So about 8 bytes per doc. A SortedVIntList normally has 1 byte per doc, > > and never more than 4 bytes per doc (as long as doc numbers are int). > > It depends on your platform and what tradeoffs you want too... our > production boxes all have 16G RAM. > > > > Hashes also have big speed advantage over BitSets for iterating over > > > all docs or taking intersection sizes. The hash also has fast random > > > access that docNrSkipper doesn't have. > > > > Can it take determine the intersection size faster than iterating over > > both sets with an intersecting merge? > > In many cases, yes. > Consider the example of taking the intersection since of a small set > with a large. You need fast random access (or a fast skip) on the > large set.
Well, a weak spot in the SortVIntList is that it does not have a fast skip. This forces the skipTo method in the posted filtered query to use a linear search. More inspiration from the Lucene index data structures could be used to build in the forwarding information to allow a faster skipTo on the compact filter itself. The linear search is here: http://issues.apache.org/bugzilla/show_bug.cgi?id=32921 in the nextDocNr(docNr) method of the DocNrSkipper returned from SortedVIntList.getDocNrSkipper() . Regards, Paul Elschot. --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]