Yes, Lucene does store position information and uses it when running
phrase queries. It is stored separately from term vectors.
However, the positions are "inverted": for a given term you can find
all documents that contain that term, as well as the positions where
the term occurs in each of those documents. Because of this inversion,
it is not easy to reconstruct all the terms and their positions that
occurred in a single document. It is feasible, but you would have to
visit every term in the index and check whether it occurs in that
document, and the computation and I/O this takes makes it unrealistic
in most situations. This is why term vectors, which are not inverted,
are used when you want to retrieve all terms, positions, and offsets
for a single document.
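To make the inversion concrete, here is a rough sketch in plain Java.
It is a toy model, not Lucene's actual data structures or API: the
class ToyPositionalIndex and its methods addDocument, phraseMatch, and
reconstruct are made up for illustration, and tokenization is a simple
whitespace split. It shows why a two-term phrase query only needs two
postings lists, while rebuilding one document means scanning every
term. (In the Lucene releases of this period, the per-term access path
is, as far as I know, IndexReader.termPositions(Term), which iterates
documents and positions for one term at a time.)

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy inverted index with positions (hypothetical, for illustration only).
class ToyPositionalIndex {
    // term -> (docId -> positions of that term inside the document)
    private final Map<String, Map<Integer, List<Integer>>> postings = new TreeMap<>();

    // Index a document by splitting on whitespace; position = token number.
    void addDocument(int docId, String text) {
        String[] tokens = text.toLowerCase().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            postings.computeIfAbsent(tokens[pos], t -> new TreeMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(pos);
        }
    }

    // Two-term phrase query: reads only the postings of the two terms and
    // keeps documents where `second` appears right after `first`.
    List<Integer> phraseMatch(String first, String second) {
        Map<Integer, List<Integer>> a =
                postings.getOrDefault(first, Collections.emptyMap());
        Map<Integer, List<Integer>> b =
                postings.getOrDefault(second, Collections.emptyMap());
        List<Integer> hits = new ArrayList<>();
        for (Map.Entry<Integer, List<Integer>> e : a.entrySet()) {
            List<Integer> secondPositions = b.get(e.getKey());
            if (secondPositions == null) {
                continue; // the second term never occurs in this document
            }
            for (int p : e.getValue()) {
                if (secondPositions.contains(p + 1)) {
                    hits.add(e.getKey());
                    break;
                }
            }
        }
        return hits;
    }

    // Rebuild position -> term for one document. Every term in the index
    // has to be visited, which is the expensive part on a real index.
    TreeMap<Integer, String> reconstruct(int docId) {
        TreeMap<Integer, String> byPosition = new TreeMap<>();
        for (Map.Entry<String, Map<Integer, List<Integer>>> e : postings.entrySet()) {
            List<Integer> positions = e.getValue().get(docId);
            if (positions == null) {
                continue; // term is not in this document
            }
            for (int p : positions) {
                byPosition.put(p, e.getKey());
            }
        }
        return byPosition;
    }

    public static void main(String[] args) {
        ToyPositionalIndex index = new ToyPositionalIndex();
        index.addDocument(0, "the red dog barked");
        index.addDocument(1, "the dog saw a red car");
        System.out.println(index.phraseMatch("red", "dog"));
        System.out.println(index.reconstruct(1));
    }
}
```

Note that phraseMatch touches two map entries no matter how many terms
the index holds, whereas the loop in reconstruct grows with the number
of distinct terms. A term vector is, roughly speaking, the result of
reconstruct saved per document at index time.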
Mike
PS -- it's better to use the java-user mailing list for this sort of
question.
syga wrote:
Dear all,
Am I correct to believe that a quoted (phrase) search, like "red dog",
returns documents containing the consecutive words "red" and "dog" in
that order, even without storing the term vector
(Field.TermVector.NO)?
If the inverted index (with Field.TermVector.NO and Field.Store.NO) is
able to check whether the words are consecutive and in the right
order, then I suppose that the inverted index must somehow contain the
positional information of the words in the documents.
If my supposition is correct, then is it possible to access this
positional information via the Lucene API? Of course, I am not
speaking about indexReader.getTermFreqVector(doc, field), which
returns null if we use Field.TermVector.NO.
If my supposition is incorrect, could you please explain how the
inverted index is able to deal with quoted searches without having
this positional information?
Thank you so much,
SG.
--
View this message in context:
http://www.nabble.com/Retrieving-term-positions-without-storing-the-term-vectors-tp18359432p18359432.html
Sent from the Lucene - General mailing list archive at Nabble.com.