On Jul 17, 2008, at 1:57 PM, eks dev wrote:
> is there any solution to have pure postings lists without
> interleaved tf ... this eats a lot of CPU for VInt decoding on dense
> terms (also doubles IO...) in our case.
To decompress integers really quickly, we shouldn't even be using
VInts. We should be using PForDelta as described in
<http://www2008.org/papers/pdf/p387-zhangA.pdf>. Encoding postings data
with PForDelta is one of those rare opportunities to speed up searching
globally.
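
For anyone who hasn't read the paper, here is a rough sketch of the idea in Java. It is a simplified, patched frame-of-reference codec, not the exact layout from the paper and not anything that exists in Lucene: each value in a block is packed into a fixed number of bits, and the few values that don't fit are kept aside as exceptions and patched in after the main loop. The class name PForSketch, the Block holder, and the choice to store exception positions in a separate array are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class PForSketch {
    /** One encoded block: fixed-width slots plus a short list of exceptions. */
    static final class Block {
        final int bits;       // width of every packed slot (1..31)
        final int count;      // number of values in the block
        final long[] packed;  // slots stored back to back
        final int[] excPos;   // positions of values that did not fit in 'bits'
        final int[] excVal;   // full values for those positions

        Block(int bits, int count, long[] packed, int[] excPos, int[] excVal) {
            this.bits = bits;
            this.count = count;
            this.packed = packed;
            this.excPos = excPos;
            this.excVal = excVal;
        }
    }

    static Block encode(int[] values, int bits) {
        long[] packed = new long[(values.length * bits + 63) / 64];
        List<Integer> exceptions = new ArrayList<>();
        int mask = (1 << bits) - 1;
        for (int i = 0; i < values.length; i++) {
            if ((values[i] & ~mask) != 0) {
                exceptions.add(i);  // too wide: remember the position, patch later
            }
            long slot = values[i] & mask;
            int bitIndex = i * bits;
            int word = bitIndex >>> 6;
            int offset = bitIndex & 63;
            packed[word] |= slot << offset;
            if (offset + bits > 64) {
                // The slot straddles two longs: spill the high bits into the next one.
                packed[word + 1] |= slot >>> (64 - offset);
            }
        }
        int[] excPos = new int[exceptions.size()];
        int[] excVal = new int[exceptions.size()];
        for (int k = 0; k < excPos.length; k++) {
            excPos[k] = exceptions.get(k);
            excVal[k] = values[excPos[k]];
        }
        return new Block(bits, values.length, packed, excPos, excVal);
    }

    static int[] decode(Block b) {
        int[] out = new int[b.count];
        long mask = (1L << b.bits) - 1;
        // Main loop: every slot has the same width, so there is no
        // per-byte continuation test as there is with VInts.
        for (int i = 0; i < b.count; i++) {
            int bitIndex = i * b.bits;
            int word = bitIndex >>> 6;
            int offset = bitIndex & 63;
            long v = b.packed[word] >>> offset;
            if (offset + b.bits > 64) {
                v |= b.packed[word + 1] << (64 - offset);
            }
            out[i] = (int) (v & mask);
        }
        // Patch pass: overwrite the few slots that held exceptions.
        for (int k = 0; k < b.excPos.length; k++) {
            out[b.excPos[k]] = b.excVal[k];
        }
        return out;
    }
}
```

A real implementation would pick the bit width per block from the data and unroll the unpacking loop for each width; this version only shows why decoding avoids the byte-at-a-time branching that VInts need.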
It would be reasonably straightforward to integrate PForDelta if we
already had Flexible Indexing implemented. Maybe some damn-the-
spaghetti optimization junkie wants to try grafting it onto Lucene
before that, but it would be a hell of a lot easier to do it afterwards.
> I know about flexible indexing, but cannot wait (I guess it will
> take some time?).
Mike McCandless and I seem to be the only ones with a high level of
interest in Flexible Indexing. However, I find it difficult to shave
volunteer hours from KinoSearch/Lucy to work on Lucene, and Mike has
lots going on himself.
> Does it make sense to start working on it? Can this somehow be
> incorporated into Flexible Indexing later...
Maybe. In the same sense that the current payloads implementation can.
FWIW, doing as you suggest in KinoSearch, which has a form of Flexible
Indexing already implemented, would be a piece of cake. It would
involve duping a plugin file, changing 5-10 lines of code, and writing
some tests. Eventually, it should be just as easy to make that kind
of change in Lucene.
> Simply asking for help if somebody accidentally happens to have some
> Quick 'n Dirty solution/idea.
If you never care about anything other than pure term matching, you
could track down all the places where tf is encoded/decoded in Lucene
and just hack stuff out, I guess. I dunno, though, because code that
touches the file format is spread out all over Lucene, so it's hard to
know if you've accounted for every last chunk.
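
To make the target of such a hack concrete, here is a small standalone sketch in Java of the interleaving as I understand it from the file format docs: each posting is written as a VInt holding the doc delta shifted left by one, with the low bit set when tf is 1, and otherwise followed by a second VInt holding tf. The "docs only" variant shows what you would be left with after stripping tf out. The class name FreqInterleaveSketch and all of its methods are made up for illustration; none of this is Lucene's actual code.

```java
import java.io.ByteArrayOutputStream;

public class FreqInterleaveSketch {
    /** Write a non-negative int as a VInt: 7 bits per byte, high bit means "more". */
    static void writeVInt(ByteArrayOutputStream out, int i) {
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);
            i >>>= 7;
        }
        out.write(i);
    }

    /** Minimal cursor over a byte array. */
    static final class Cursor {
        final byte[] buf;
        int pos;

        Cursor(byte[] buf) {
            this.buf = buf;
        }

        int readVInt() {
            byte b = buf[pos++];
            int i = b & 0x7F;
            for (int shift = 7; (b & 0x80) != 0; shift += 7) {
                b = buf[pos++];
                i |= (b & 0x7F) << shift;
            }
            return i;
        }
    }

    /** Doc deltas with tf interleaved: the low bit of the first VInt flags tf == 1. */
    static byte[] encodeWithFreqs(int[] docs, int[] freqs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int last = 0;
        for (int i = 0; i < docs.length; i++) {
            int delta = docs[i] - last;
            last = docs[i];
            if (freqs[i] == 1) {
                writeVInt(out, (delta << 1) | 1);
            } else {
                writeVInt(out, delta << 1);
                writeVInt(out, freqs[i]);
            }
        }
        return out.toByteArray();
    }

    /** Returns { docs, freqs }. */
    static int[][] decodeWithFreqs(byte[] data, int count) {
        Cursor in = new Cursor(data);
        int[] docs = new int[count];
        int[] freqs = new int[count];
        int doc = 0;
        for (int i = 0; i < count; i++) {
            int code = in.readVInt();
            doc += code >>> 1;
            docs[i] = doc;
            // This branch and the extra VInt are what a tf-free format would drop.
            freqs[i] = (code & 1) != 0 ? 1 : in.readVInt();
        }
        return new int[][] { docs, freqs };
    }

    /** The "pure postings" variant: plain doc deltas, no tf at all. */
    static byte[] encodeDocsOnly(int[] docs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int last = 0;
        for (int doc : docs) {
            writeVInt(out, doc - last);
            last = doc;
        }
        return out.toByteArray();
    }

    static int[] decodeDocsOnly(byte[] data, int count) {
        Cursor in = new Cursor(data);
        int[] docs = new int[count];
        int doc = 0;
        for (int i = 0; i < count; i++) {
            doc += in.readVInt();
            docs[i] = doc;
        }
        return docs;
    }
}
```

Every reader and writer of the postings would have to agree on whichever of the two layouts you pick, which is why missing even one spot is a problem.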
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/