On 7/11/06, Chuck Williams <[EMAIL PROTECTED]> wrote:
> David Balmain wrote on 07/10/2006 01:04 AM:
> > The only problem I could find with this solution is that
> > fields are no longer in alphabetical order in the term dictionary but
> > I couldn't think of a use-case where this is necessary although I'm
> > sure there probably is one.
>
> So presumably fields are still contiguous, you keep a pointer to where
> each field starts, and terms within the field remain in alphabetical order?

Actually yes, that is how I did it, although I'm not sure it's the best
way now. I was hoping that having a pointer to the start of each
field would give some good performance gains in searching, but it
turned out not to be the case. You really only save a couple of
iterations in the getIndexOffset method.
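
A minimal sketch of that layout, in case it helps to see it. The class
FieldTermIndex and its arrays are hypothetical and in-memory, not the real
TermInfosReader; only the method name getIndexOffset is borrowed from it.
The binary search is restricted to one field's range, which is why it only
saves a few iterations compared with searching the whole dictionary.

```java
// Hypothetical in-memory term dictionary, for illustration only.
// Fields are contiguous and numbered in the order they were added,
// and terms are sorted alphabetically within each field.
class FieldTermIndex {
    private final String[] terms;   // term texts grouped by field number
    private final int[] fieldStart; // fieldStart[f] = index of field f's first term;
                                    // the final entry equals terms.length

    FieldTermIndex(String[] terms, int[] fieldStart) {
        this.terms = terms;
        this.fieldStart = fieldStart;
    }

    // Index of the largest term <= target inside the given field, or -1 if
    // every term in that field sorts after target.
    int getIndexOffset(int fieldNum, String target) {
        int first = fieldStart[fieldNum];
        int lo = first;
        int hi = fieldStart[fieldNum + 1] - 1;
        while (hi >= lo) {
            int mid = (lo + hi) >>> 1;
            int cmp = target.compareTo(terms[mid]);
            if (cmp < 0) {
                hi = mid - 1;
            } else if (cmp > 0) {
                lo = mid + 1;
            } else {
                return mid;
            }
        }
        return hi >= first ? hi : -1;
    }

    public static void main(String[] args) {
        // Field 0 is "title" and field 1 is "body": not alphabetical by name.
        String[] terms = {"apache", "lucene", "index", "search", "term"};
        int[] fieldStart = {0, 2, 5};
        FieldTermIndex index = new FieldTermIndex(terms, fieldStart);
        System.out.println(index.getIndexOffset(1, "search"));
    }
}
```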

To make things easier, though, you can leave the
TermInfosWriter/Reader almost as they are. The only difference
is that you store field numbers in the index rather than field names,
and when you compare terms while scanning the index, you compare
field numbers rather than field names.
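
Roughly, the comparison could look like the sketch below. NumberedTerm is a
hypothetical class made up for this example, not something in Lucene; the
assumption is that each term carries its field number and its text, and that
field numbers are assigned in the order fields were first seen.

```java
// Hypothetical term key for illustration: a field number plus the term text.
class NumberedTerm implements Comparable<NumberedTerm> {
    final int fieldNum; // assumed to be assigned in order of first appearance
    final String text;

    NumberedTerm(int fieldNum, String text) {
        this.fieldNum = fieldNum;
        this.text = text;
    }

    // Order by field number first, then alphabetically by text within a field.
    // No field-name string comparison is needed.
    @Override
    public int compareTo(NumberedTerm other) {
        if (fieldNum != other.fieldNum) {
            return Integer.compare(fieldNum, other.fieldNum);
        }
        return text.compareTo(other.text);
    }

    public static void main(String[] args) {
        NumberedTerm a = new NumberedTerm(0, "zebra"); // e.g. field "title"
        NumberedTerm b = new NumberedTerm(1, "apple"); // e.g. field "body"
        System.out.println(a.compareTo(b) < 0);
    }
}
```

With this ordering the fields sort by number, so they are no longer
alphabetical by name, while terms inside each field stay alphabetical.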

I don't know if I've described it very well but I hope that makes sense.

Cheers,

Dave

PS. By the way, I don't know if I made this clear, but the 5x speedup
I was talking about comes during indexing. The performance improvement
as far as search is concerned wasn't what I had hoped. It is a little
faster, but the bottleneck really comes from reading the documents
from the index. So to alleviate that I've added lazy field loading,
which seems to work well. Actually, I've set it up so that I can read
excerpts from fields without even loading the whole field, so
highlighting is super fast.
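
To give an idea of what lazy loading with excerpts can look like, here is a
small sketch. LazyField is a hypothetical class, not the actual
implementation; a byte array stands in for the stored-fields file, and a
single-byte encoding (ISO-8859-1) is assumed so that byte offsets and
character offsets coincide. The bytesRead counter exists only to make the
difference in work visible.

```java
// Hypothetical lazily loaded stored field, for illustration only.
class LazyField {
    private final byte[] store; // stands in for the stored-fields file
    private final int start;    // byte offset of this field's data in the store
    private final int length;   // byte length of this field's data
    private String cached;      // full value, decoded on first access only
    int bytesRead = 0;          // instrumentation: bytes decoded so far

    LazyField(byte[] store, int start, int length) {
        this.store = store;
        this.start = start;
        this.length = length;
    }

    // Decodes the whole field the first time it is asked for, then reuses it.
    String value() {
        if (cached == null) {
            cached = new String(store, start, length,
                    java.nio.charset.StandardCharsets.ISO_8859_1);
            bytesRead += length;
        }
        return cached;
    }

    // Decodes only the requested slice of the field, leaving the rest untouched.
    String excerpt(int from, int len) {
        if (from < 0 || len < 0 || from + len > length) {
            throw new IndexOutOfBoundsException("excerpt outside field");
        }
        bytesRead += len;
        return new String(store, start + from, len,
                java.nio.charset.StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] store = "AAAAthe quick brown fox"
                .getBytes(java.nio.charset.StandardCharsets.ISO_8859_1);
        LazyField body = new LazyField(store, 4, 19);
        System.out.println(body.excerpt(4, 5));
        System.out.println(body.bytesRead);
    }
}
```

A highlighter would call excerpt with the offsets of the matching text, so
the cost depends on the excerpt length rather than the field length.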
