On Jul 17, 2008, at 1:57 PM, eks dev wrote:

Is there any way to have pure postings lists without interleaved tf? In our case this eats a lot of CPU for VInt decoding on dense terms (and also doubles IO...).

To decompress integers really quickly, we shouldn't even be using VInts. We should be using PForDelta as described in <http://www2008.org/papers/pdf/p387-zhangA.pdf>. Encoding postings data with PForDelta is one of those rare opportunities to speed up searching globally.
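A minimal sketch of the patched frame-of-reference idea behind PForDelta, assuming a block of non-negative d-gaps: pick a bit width b that about 90% of the values fit in, pack every slot in b bits, and record the values that don't fit as exceptions to patch in after unpacking. This is a simplification, not the paper's layout: the paper threads exceptions through the unused slots and uses specialized unrolled decoders, while here they sit in side arrays. The class name `PForSketch`, the 0.9 threshold and the bit layout are all hypothetical choices for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified PFor sketch (hypothetical class, not from Lucene or the paper).
// Assumes non-negative values, e.g. the d-gaps of one postings block.
public class PForSketch {
    final int bitWidth;         // b: bits per packed slot
    final int count;            // number of values in the block
    final long[] packed;        // count slots of bitWidth bits each, low bits first
    final int[] exceptionPos;   // slots whose value needed more than b bits
    final int[] exceptionVal;   // full values for those slots

    PForSketch(int bitWidth, int count, long[] packed, int[] exceptionPos, int[] exceptionVal) {
        this.bitWidth = bitWidth;
        this.count = count;
        this.packed = packed;
        this.exceptionPos = exceptionPos;
        this.exceptionVal = exceptionVal;
    }

    // Smallest b (at least 1) such that roughly `fraction` of the values fit in b bits.
    static int chooseBitWidth(int[] values, double fraction) {
        if (values.length == 0) return 1;
        int[] sorted = values.clone();
        Arrays.sort(sorted);
        int idx = (int) Math.ceil(fraction * sorted.length) - 1;
        idx = Math.max(0, Math.min(idx, sorted.length - 1));
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(sorted[idx]));
    }

    static void writeBits(long[] words, int bitPos, int width, long v) {
        int word = bitPos >>> 6, offset = bitPos & 63;
        words[word] |= v << offset;
        if (offset + width > 64) {          // slot straddles two longs
            words[word + 1] |= v >>> (64 - offset);
        }
    }

    static long readBits(long[] words, int bitPos, int width) {
        int word = bitPos >>> 6, offset = bitPos & 63;
        long v = words[word] >>> offset;
        if (offset + width > 64) {
            v |= words[word + 1] << (64 - offset);
        }
        return v & ((1L << width) - 1);
    }

    static PForSketch encode(int[] values) {
        int b = chooseBitWidth(values, 0.9);
        long max = (1L << b) - 1;
        long[] packed = new long[(int) (((long) values.length * b + 63) / 64)];
        List<Integer> pos = new ArrayList<>();
        List<Integer> val = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            long v = values[i];
            if (v > max) {                  // exception: keep full value aside, pack a 0 placeholder
                pos.add(i);
                val.add(values[i]);
                v = 0;
            }
            writeBits(packed, i * b, b, v);
        }
        int[] p = pos.stream().mapToInt(Integer::intValue).toArray();
        int[] x = val.stream().mapToInt(Integer::intValue).toArray();
        return new PForSketch(b, values.length, packed, p, x);
    }

    int[] decode() {
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {   // fixed-width unpack, same work per slot
            out[i] = (int) readBits(packed, i * bitWidth, bitWidth);
        }
        for (int k = 0; k < exceptionPos.length; k++) {   // patch the exceptions back in
            out[exceptionPos[k]] = exceptionVal[k];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] gaps = {1, 2, 1, 3, 1, 1, 2, 1, 1, 500};
        PForSketch block = encode(gaps);
        System.out.println("b=" + block.bitWidth + ", exceptions=" + block.exceptionPos.length
                + ", roundtrip=" + Arrays.equals(block.decode(), gaps));
    }
}
```

Running `main` prints `b=2, exceptions=1, roundtrip=true`: nine of the ten gaps fit in 2 bits and the 500 is patched in afterwards. In a tuned implementation the appeal is that the unpack loop does identical fixed-width work per slot, rather than testing a continuation bit on every byte the way VInt decoding does.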

It would be reasonably straightforward to integrate PForDelta if we already had Flexible Indexing implemented. Maybe some damn-the-spaghetti optimization junkie wants to try grafting it onto Lucene before that, but it would be a hell of a lot easier to do it afterwards.

I know about flexible indexing, but cannot wait (I guess it will take some time?).

Mike McCandless and I seem to be the only ones with a high level of interest in Flexible Indexing. However, I find it difficult to shave volunteer hours from KinoSearch/Lucy to work on Lucene, and Mike has lots going on himself.

Does it make sense to start working on it? Could this somehow be incorporated into Flexible Indexing later?

Maybe. In the same sense that the current payloads implementation can.

FWIW, doing as you suggest in KinoSearch, which has a form of Flexible Indexing already implemented, would be a piece of cake. It would involve duping a plugin file, changing 5-10 lines of code, and writing some tests. Eventually, it should be just as easy to make that kind of change in Lucene.

Simply asking for help in case somebody happens to have some Quick 'n Dirty solution/idea.

If you never care about anything other than pure term matching, you could track down all the places where tf is encoded/decoded in Lucene and just hack stuff out, I guess. I dunno, though, because code that touches the file format is spread out all over Lucene, so it's hard to know whether you've accounted for every last chunk.
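For context on what "hacking tf out" changes on disk: in the Lucene 2.x file format, each .frq entry is a VInt DocDelta where DocDelta/2 is the gap to the previous document; an odd DocDelta means freq is 1, otherwise a second VInt carries the freq. The standalone Java sketch below (not Lucene code; `PostingsSketch` and its methods are hypothetical names) encodes the same postings that way and as a pure doc-delta list, then decodes the tf-free version, which needs one VInt per document and nothing to skip.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical illustration of interleaved doc/tf postings vs. doc-only postings.
public class PostingsSketch {

    // VInt: 7 payload bits per byte, high bit set means "more bytes follow".
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Reads a VInt starting at pos[0] and advances pos[0] past it.
    static int readVInt(byte[] buf, int[] pos) {
        byte b = buf[pos[0]++];
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buf[pos[0]++];
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    // Interleaved layout modeled on the 2.x .frq format: odd DocDelta => freq == 1.
    static byte[] encodeWithTf(int[] docs, int[] freqs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int last = 0;
        for (int i = 0; i < docs.length; i++) {
            int delta = docs[i] - last;
            last = docs[i];
            if (freqs[i] == 1) {
                writeVInt(out, (delta << 1) | 1);
            } else {
                writeVInt(out, delta << 1);   // even: freq follows as its own VInt
                writeVInt(out, freqs[i]);
            }
        }
        return out.toByteArray();
    }

    // Pure postings: doc gaps only, no tf at all.
    static byte[] encodeDocsOnly(int[] docs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int last = 0;
        for (int doc : docs) {
            writeVInt(out, doc - last);
            last = doc;
        }
        return out.toByteArray();
    }

    static int[] decodeDocsOnly(byte[] buf, int count) {
        int[] docs = new int[count];
        int[] pos = {0};
        int last = 0;
        for (int i = 0; i < count; i++) {
            last += readVInt(buf, pos);       // accumulate gaps back into doc numbers
            docs[i] = last;
        }
        return docs;
    }

    public static void main(String[] args) {
        int[] docs = {3, 4, 5, 9, 10, 200};
        int[] freqs = {2, 1, 3, 4, 1, 5};
        byte[] withTf = encodeWithTf(docs, freqs);
        byte[] docsOnly = encodeDocsOnly(docs);
        System.out.println(withTf.length + " bytes with tf, " + docsOnly.length + " bytes without");
        System.out.println(Arrays.toString(decodeDocsOnly(docsOnly, docs.length)));
    }
}
```

On this toy input `main` prints `11 bytes with tf, 7 bytes without` followed by `[3, 4, 5, 9, 10, 200]`. The exact savings depend on how many postings have freq 1, since those already get tf folded into the DocDelta's low bit.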

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

