Re: Retrieving term positions without storing the term vectors

Michael McCandless Wed, 09 Jul 2008 07:43:22 -0700

Indeed, Lucene stores position information and uses it when doing phrase queries. It is stored separately from term vectors.
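To make the idea concrete, here is a minimal sketch of how positions in an inverted index are enough to answer a two-word phrase query. It is a toy in-memory model, not Lucene's on-disk format or its API. The class `PhraseDemo`, its methods, and the sample documents are hypothetical, chosen only for illustration. Each term maps to the documents containing it, and each of those documents maps to the positions where the term occurred. A phrase "red dog" matches a document when some position of "red" is immediately followed by a position of "dog".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy positional inverted index (illustration only, not Lucene's format or API):
// term -> (docId -> positions of that term within the document).
public class PhraseDemo {
    private final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();

    // Tokenizes on whitespace and records the position of every token.
    public void addDocument(int docId, String text) {
        String[] tokens = text.toLowerCase().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            postings.computeIfAbsent(tokens[pos], t -> new TreeMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(pos);
        }
    }

    // Returns the ids of documents in which `first` is immediately followed by
    // `second`, using only the inverted postings (no per-document term vector).
    public List<Integer> twoTermPhrase(String first, String second) {
        Map<Integer, List<Integer>> firstDocs = postings.getOrDefault(first, Map.of());
        Map<Integer, List<Integer>> secondDocs = postings.getOrDefault(second, Map.of());
        List<Integer> hits = new ArrayList<>();
        for (Map.Entry<Integer, List<Integer>> entry : firstDocs.entrySet()) {
            List<Integer> secondPositions = secondDocs.get(entry.getKey());
            if (secondPositions == null) {
                continue; // this document does not contain the second term at all
            }
            for (int p : entry.getValue()) {
                if (secondPositions.contains(p + 1)) { // consecutive and in order
                    hits.add(entry.getKey());
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        PhraseDemo index = new PhraseDemo();
        index.addDocument(0, "the red dog barked");
        index.addDocument(1, "the dog was red");
        index.addDocument(2, "a big red dog");
        System.out.println(index.twoTermPhrase("red", "dog")); // prints [0, 2]
    }
}
```

Document 1 contains both words but not adjacent and in order, so it is rejected. The lookup starts from the term, never from the document, which is exactly the "inverted" direction described next.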

However, the positions are "inverted", meaning that for a given term you can find all documents that had that term, as well as the positions where that term occurred in those documents. Because of this inversion, the positions cannot easily be reassembled into the full list of terms and their positions for a single document. It is feasible to do so, but the amount of computation/IO makes it unrealistic in most situations. This is why term vectors (which are not inverted) are used when you want to retrieve all terms/positions/offsets for a single document.
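The cost of going the other way can be seen in a similar toy sketch (again a hypothetical in-memory model, not Lucene's API; `ReconstructDemo` and its methods are illustrative names). Rebuilding one document from the inverted postings means visiting every term in the dictionary, since nothing says in advance which terms occur in that document. A term vector is, in effect, the per-document structure this loop has to rebuild from scratch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy positional inverted index (illustration only, not Lucene's format or API):
// term -> (docId -> positions of that term within the document).
public class ReconstructDemo {
    private final Map<String, Map<Integer, List<Integer>>> postings = new TreeMap<>();
    private int termsScanned = 0;

    // Tokenizes on whitespace and records the position of every token.
    public void addDocument(int docId, String text) {
        String[] tokens = text.toLowerCase().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            postings.computeIfAbsent(tokens[pos], t -> new TreeMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(pos);
        }
    }

    // Rebuilds the token sequence of one document from the inverted postings.
    // Every term in the dictionary is visited, so the number of terms examined
    // grows with the size of the whole index, not with the size of the document.
    public List<String> reconstruct(int docId) {
        termsScanned = 0;
        SortedMap<Integer, String> byPosition = new TreeMap<>();
        for (Map.Entry<String, Map<Integer, List<Integer>>> entry : postings.entrySet()) {
            termsScanned++;
            List<Integer> positions = entry.getValue().get(docId);
            if (positions == null) {
                continue; // term does not occur in this document
            }
            for (int p : positions) {
                byPosition.put(p, entry.getKey());
            }
        }
        return new ArrayList<>(byPosition.values());
    }

    public int termsScannedByLastCall() {
        return termsScanned;
    }

    public static void main(String[] args) {
        ReconstructDemo index = new ReconstructDemo();
        index.addDocument(0, "the red dog barked");
        index.addDocument(1, "the dog was red");
        index.addDocument(2, "a big red dog");
        System.out.println(index.reconstruct(1));           // prints [the, dog, was, red]
        System.out.println(index.termsScannedByLastCall()); // prints 7
    }
}
```

Only 4 of the 7 terms belong to document 1, yet all 7 had to be examined; on a real index with millions of terms, that per-document scan is what makes the approach impractical.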


Mike

PS -- it's better to use the java-user mailing list for this sort of question.


syga wrote:

Dear all,
Am I correct to believe that a quoted (phrase) search, like "red dog", returns documents containing the consecutive words "red" and "dog" in that order, even without storing the term vector (Field.TermVector.NO)?
If the inverted index (with Field.TermVector.NO and Field.Store.NO) is able to check whether the words are consecutive and in the right order, then I suppose that the inverted index must somehow contain the positional information of the words in the documents.

If my supposition is correct, is it possible to access this positional information via the Lucene API? Of course, I am not speaking about indexReader.getTermFreqVector(doc, field), which returns null if we use Field.TermVector.NO.
If my supposition is incorrect, could you please explain how the inverted index is able to handle quoted searches without this positional information?

Thank you so much,
SG.
--
View this message in context: 
http://www.nabble.com/Retrieving-term-positions-without-storing-the-term-vectors-tp18359432p18359432.html
Sent from the Lucene - General mailing list archive at Nabble.com.
