rolaren...@earthlink.net wrote:

Hey Mike --

Thanks for prompt & clear reply!

This (the sneaky "difference" between an indexed Document and a the
newly-created-at-search-time Document) is a frequent confusion with
Lucene.

The field needs to be marked as stored (Field.Store.YES) in order for
it to appear in the retrieved document at search time.

But, TokenStream fields cannot be stored since Lucene can't regenerate
the original string for that field.

OK, so the way I was trying could never work, I guess? No surprise really that the TokenStream cannot be re-accessed. I just had no clue what else to try ...

Right.

Since you are storing the term vector, you could retrieve that using
IndexReader.getTermFreqVector.

OK, didn't see that coming, but glad it did -- I have tried that, and indeed I can get the TermFreqVector for the Field in which I am interested, and it contains the same sort of data as were once in the TokenStream, all fine.

Now I notice (from googling) that I can also downcast TermFreqVector to TermPositionVector, which contains the offsets (which I will need).

So -- under what conditions would that cast fail?

The cast fails if you had indexed the field with Field.TermVector.YES, which does not store positions nor offsets information. If you always index the field with TermVector.WITH_OFFSET, WITH_POSITIONS or WITH_POSITIONS_OFFSETS, the cast will always succeed.

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to