rolaren...@earthlink.net wrote:
Hey Mike --
Thanks for prompt & clear reply!
This (the sneaky "difference" between an indexed Document and a the
newly-created-at-search-time Document) is a frequent confusion with
Lucene.
The field needs to be marked as stored (Field.Store.YES) in order for
it to appear in the retrieved document at search time.
But, TokenStream fields cannot be stored since Lucene can't
regenerate
the original string for that field.
OK, so the way I was trying could never work, I guess? No surprise
really that the TokenStream cannot be re-accessed. I just had no
clue what else to try ...
Right.
Since you are storing the term vector, you could retrieve that using
IndexReader.getTermFreqVector.
OK, didn't see that coming, but glad it did -- I have tried that,
and indeed I can get the TermFreqVector for the Field in which I am
interested, and it contains the same sort of data as were once in
the TokenStream, all fine.
Now I notice (from googling) that I can also downcast TermFreqVector
to TermPositionVector, which contains the offsets (which I will need).
So -- under what conditions would that cast fail?
The cast fails if you had indexed the field with Field.TermVector.YES,
which does not store positions nor offsets information. If you always
index the field with TermVector.WITH_OFFSET, WITH_POSITIONS or
WITH_POSITIONS_OFFSETS, the cast will always succeed.
Mike
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org