Re: Using Lucene for searching tokens, not storing them.

karl wettin Wed, 19 Apr 2006 22:28:15 -0700


On 18 Apr 2006, at 22:08, karl wettin wrote:

After adding a couple of binary searches in well-needed places (and a couple of new bugs that in a few cases affect the results) I'm now down at 1/8th of the time compared to RAMDirectory. That is really fast if you ask me.

After fixing the bugs, it's now 4.5 -> 5 times the speed. This is true at both index and query time. Sorry if I got your hopes up too much. There are still things to be done, though. I might not have time to do anything with this until next month, so here is the code if anyone wants a peek.

Not good enough for Jira yet, but if someone wants to fool around with it, here it is. The implementation passes a TermEnum -> TermDocs -> Fields -> TermVector comparison against the same data in a Directory.

When it comes to features, offsets don't exist, and positions are stored in an ugly way and have bugs.

You might notice that norms are float[] and not byte[]. That is me who refactored it to see if it would do any good. Bit shifting doesn't take many ticks, so I might just revert that.
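For anyone wondering what the byte[] form costs: here is a minimal sketch of packing a float norm into one byte with nothing but shifts, similar in spirit to the single-byte norm encoding in Lucene's Similarity. It is not the actual Lucene code. The class name NormCodec, its method names and its constants are made up for illustration, and the exact bit layout (top two mantissa bits plus a 6-bit window of the exponent) is an assumption of this sketch.

```java
// Hypothetical helper, NOT the actual Lucene code: packs a float norm into
// one byte by keeping the top two mantissa bits and a 6-bit exponent window.
final class NormCodec {
    // Drop the low 21 bits of the IEEE 754 mantissa.
    private static final int SHIFT = 21;
    // Raw exponent 96 (i.e. 2^-31) is the bottom of the exponent window.
    private static final int BIAS = 96 << 2;

    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> SHIFT;
        if (small <= BIAS) {
            // Zero and negatives map to 0; tiny positives use the smallest non-zero code.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= BIAS + 0x100) {
            return (byte) -1; // too large: clamp to the largest code (0xff)
        }
        return (byte) (small - BIAS);
    }

    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        // Undo the shift and the bias; the dropped mantissa bits come back as zeros.
        int bits = ((b & 0xff) + BIAS) << SHIFT;
        return Float.intBitsToFloat(bits);
    }
}
```

The round trip is lossy: NormCodec.decode(NormCodec.encode(1.3f)) gives 1.25f, while 1.0f and 0.5f survive unchanged. With float[] there is no such rounding, at four times the memory per norm.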


I believe the code is quite self-explanatory.

InstanciatedIndex ii = ..
ii.new InstanciatedIndexReader();   // the reader is an inner class, created from the index instance
ii.addDocument(s)..                 // replaces IndexWriter for now

