On Wednesday 18 May 2005 23:06, Yonik Seeley wrote:
> > > > contains(docid) and exists(docid) cannot be efficiently implemented
> > > > on a VInt based compact filter, so I'd prefer to leave these out.
> > >
> > > exists() on a BitSet is much faster than next() though...
> > 
> > Yes, but the point is to iterate to the next document based in information
> > from RAM and to be able to skipTo() on the index instead of reading it
> > sequentially.
> 
> Well, yes, that's one point to filters (and probably the main use). 
> Another that we are using is to enable fast intersection of two
> filters you already have in memory.
> 
> 
> > > I use a power-of-two hash table with a load factor of .75.  So putting
> > > 500 docs in my hashset would take up 1024 slots at 4 bytes per slot
> > > (4k).
> > 
> > So about 8 bytes per doc. A SortedVIntList normally has 1 byte per doc,
> > and never more than 4 bytes per doc (as long as doc numbers are int).
> 
> It depends on your platform and what tradeoffs you want too... our
> production boxes all have 16G RAM.
> 
> > > Hashes also have big speed advantage over BitSets for iterating over
> > > all docs or taking intersection sizes.  The hash also has fast random
> > > access that docNrSkipper doesn't have.
> > 
> > Can it take determine the intersection size faster than iterating over
> > both sets with an intersecting merge?
> 
> In many cases, yes.
> Consider the example of taking the intersection since of a small set
> with a large.  You need fast random access (or a fast skip) on the
> large set.

Well, a weak spot in the SortVIntList is that it does not have a fast skip.
This forces the skipTo method in the posted filtered query to use
a linear search.
More inspiration from the Lucene index data structures could be used
to build in the forwarding information to allow a faster skipTo
on the compact filter itself. The linear search is here:
http://issues.apache.org/bugzilla/show_bug.cgi?id=32921
in the nextDocNr(docNr) method of the DocNrSkipper returned
from SortedVIntList.getDocNrSkipper() .

Regards,
Paul Elschot.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to