My low-memory sorting/faceting hack requires terms to be accessed by ordinal. With Lucene 4.0 I cannot rely on a TermsEnum supporting ord() and seek(long), so when those are not implemented the code falls back to a cache that holds every Xth term. When the term for an ordinal is requested, it seeks to the nearest preceding cached term and calls next() from there until the ordinal matches. So far so good.
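To make the fallback concrete, here is a simplified sketch of the idea (not my actual code; I am using the seekCeil/iterator(null) names from the current TermsEnum API, and the cache interval of 128 is just an example value for X):

  import java.io.IOException;
  import java.util.ArrayList;
  import java.util.List;

  import org.apache.lucene.index.Terms;
  import org.apache.lucene.index.TermsEnum;
  import org.apache.lucene.util.BytesRef;

  public class SparseOrdinalLookup {
    private static final int CACHE_INTERVAL = 128; // the "X" above; tunable
    private final List<BytesRef> cachedTerms = new ArrayList<BytesRef>();
    private final Terms terms;

    public SparseOrdinalLookup(Terms terms) throws IOException {
      this.terms = terms;
      TermsEnum te = terms.iterator(null);
      BytesRef term;
      long ord = 0;
      while ((term = te.next()) != null) {
        if (ord % CACHE_INTERVAL == 0) {
          cachedTerms.add(BytesRef.deepCopyOf(term)); // next() reuses its BytesRef, so copy
        }
        ord++;
      }
    }

    // Resolve an ordinal by seeking to the nearest cached term at or below it
    // and scanning forward with next() until the ordinal matches.
    public BytesRef getTerm(long ordinal) throws IOException {
      TermsEnum te = terms.iterator(null);
      int cacheIndex = (int) (ordinal / CACHE_INTERVAL);
      te.seekCeil(cachedTerms.get(cacheIndex));
      BytesRef term = te.term();
      for (long ord = (long) cacheIndex * CACHE_INTERVAL; ord < ordinal; ord++) {
        term = te.next();
      }
      return term;
    }
  }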
Two methods for seeking terms are seek(BytesRef text) and seek(BytesRef term, TermState state). The JavaDoc indicates that the seek with TermState is (potentially) the fastest in this scenario, since implementations can seek very efficiently using a custom TermState. My problem is that I am aiming for low memory use, and it seems I need to keep both the BytesRef term and the TermState state around in order to use this method, which is quite a burden memory-wise.

I tried calling the method with an empty BytesRef term. The call itself gave me an empty result back, but subsequent calls to next() returned the correct terms, which works perfectly for my scenario (a small sketch of the experiment is below). However, that was just an experiment with the default variable-gap codec, so I am unsure whether I can count on this behavior for any given codec.

Any thoughts on how to reduce the memory needed for ordinal-based lookup, without killing performance, would be appreciated.
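For reference, the experiment looked roughly like this (again simplified; I write seekExact/termState here for the seek(term, state) method mentioned above, and the behavior of next() after the empty-term seek is exactly what I am unsure is guaranteed across codecs):

  import java.io.IOException;

  import org.apache.lucene.index.TermState;
  import org.apache.lucene.index.Terms;
  import org.apache.lucene.index.TermsEnum;
  import org.apache.lucene.util.BytesRef;

  public class EmptyTermSeekExperiment {
    public static void demonstrate(Terms terms) throws IOException {
      TermsEnum te = terms.iterator(null);
      te.next();                            // position the enum on some term
      TermState state = te.termState();     // keep only the state, not the term bytes

      TermsEnum te2 = terms.iterator(null);
      te2.seekExact(new BytesRef(), state); // empty term + cached state: term() right
                                            // after this call is not meaningful
      BytesRef following = te2.next();      // ...but next() returned the correct
                                            // following term in my test
      System.out.println(following == null ? "null" : following.utf8ToString());
    }
  }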
