Re: why would a Field vanish from a Document?

Michael McCandless Sun, 25 Jan 2009 09:33:40 -0800


rolaren...@earthlink.net wrote:

Now I notice (from googling) that I can also downcast TermFreqVector
to TermPositionVector, which contains the offsets (which I will need).
So -- under what conditions would that cast fail?
The cast fails if you had indexed the field with Field.TermVector.YES, which stores neither positions nor offsets. If you always
index the field with TermVector.WITH_OFFSETS, WITH_POSITIONS or
WITH_POSITIONS_OFFSETS, the cast will always succeed.
OK, cool.
I see in the javadocs for TermPositionVector that it "not necessarily contains both positions and offsets, but at least one of these arrays exists"; does it work like this, I think:
TermVector.WITH_OFFSETS => TermVectorOffsetInfo[] always exists (so far, works for me)
TermVector.WITH_POSITIONS => positions int[] always exists
TermVector.WITH_POSITIONS_OFFSETS => both arrays always exist


Right.
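
To make the cast and the "at least one of these arrays exists" rule concrete, here is a minimal sketch. It does not use Lucene itself: FreqVector and PositionVector are hypothetical stand-ins for TermFreqVector and TermPositionVector, and the sketch assumes an array that was not stored comes back as null, so check the TermPositionVector javadocs before relying on that.

```java
public class VectorCastSketch {
    /** Hypothetical stand-in for a frequency-only term vector (not a Lucene type). */
    interface FreqVector {
        String[] terms();
    }

    /** Hypothetical stand-in for a vector that may also carry positions and/or offsets. */
    interface PositionVector extends FreqVector {
        int[] positions(int termIndex);  // assumed null when positions were not stored
        int[][] offsets(int termIndex);  // assumed null when offsets were not stored; entries are {start, end}
    }

    /** Guards the downcast with instanceof, then checks each array before using it. */
    static String describe(FreqVector vector, int termIndex) {
        if (!(vector instanceof PositionVector)) {
            return "frequencies only";   // the plain TermVector.YES case
        }
        PositionVector pv = (PositionVector) vector;
        boolean hasPositions = pv.positions(termIndex) != null;
        boolean hasOffsets = pv.offsets(termIndex) != null;
        if (hasPositions && hasOffsets) {
            return "positions and offsets";
        }
        if (hasPositions) {
            return "positions only";
        }
        return hasOffsets ? "offsets only" : "neither";
    }
}
```

Checking with instanceof first means a field that was indexed with plain term vectors degrades gracefully instead of throwing a ClassCastException.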

Right? And I guess the reason for using TermVector.WITH_POSITIONS => positions int[] is that it has a smaller memory footprint?

Well, first: it's storing something different. Position is (by default) the term count, i.e. the first term is position 0, the next is position 1, etc. Whereas start/end offset are normally the character locations where each term started and ended. These are computed during analysis and stored into the index.
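
As a rough illustration of the difference between a position and an offset, the sketch below runs a hypothetical whitespace tokenizer (not a Lucene Analyzer) over a string and records both values for each term. It assumes the end offset is exclusive, i.e. one past the term's last character.

```java
import java.util.ArrayList;
import java.util.List;

public class PositionVsOffset {
    /** One term with its position (term count) and its character offsets. */
    static final class Token {
        final String text;
        final int position;     // 0 for the first term, 1 for the next, ...
        final int startOffset;  // index of the term's first character
        final int endOffset;    // index one past the term's last character (assumption)

        Token(String text, int position, int startOffset, int endOffset) {
            this.text = text;
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** Splits on whitespace and records position and offsets for each term. */
    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int position = 0;
        int i = 0;
        while (i < input.length()) {
            // Skip whitespace between terms.
            while (i < input.length() && Character.isWhitespace(input.charAt(i))) {
                i++;
            }
            if (i >= input.length()) {
                break;
            }
            int start = i;
            // Consume the term itself.
            while (i < input.length() && !Character.isWhitespace(input.charAt(i))) {
                i++;
            }
            tokens.add(new Token(input.substring(start, i), position++, start, i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("quick brown  fox")) {
            System.out.println(t.text + " position=" + t.position
                    + " offsets=[" + t.startOffset + "," + t.endOffset + ")");
        }
    }
}
```

Note that the extra space before "fox" shifts its offsets but not its position: positions count terms, offsets count characters.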

Storing only positions gives a smaller index size than only offsets or positions plus offsets.

The memory difference is typically a non-issue since an app normally doesn't keep these instances around for a long time. I.e. normally you pull them from the index, do something interesting, and let them go, all during a search request.


Mike

