On Tuesday 08 January 2008 22:49:18 Doron Cohen wrote: > This is done by Lucene's scorers. You should however start > in http://lucene.apache.org/java/docs/scoring.html, - scorers > are described in the "Algorithm" section. "Offsets" are used > by Phrase Scorers and by Span Scorer.
That is for the case that offsets were meant to be positions within a document. It is also possible that offsets were meant in the sense of using skipTo(doc) instead of next() on a Scorer. This is done during query search when at least one term is required. Regards, Paul Elschot > > Doron > > On Jan 8, 2008 11:24 PM, Marjan Celikik < [EMAIL PROTECTED]> wrote: > > > Doron Cohen wrote: > > > Hi Marjan, > > > > > > Lucene process the query in what can be called > > > one-doc-at-a-time. > > > > > > For the example query - x y - (not the phrase query "x y") - all > > > documents containing either x or y are considered a match. > > > > > > When processing the query - x y - the posting lists of these two > > > index terms are traversed, and for each document met on the way, > > > a score is computed (taking into account both terms), and "collected". > > > At the end of the traversal, usually best N collected docs are returned > > as > > > search result. So, this is an exhaustive computation creating a union of > > > the two posting. For the query - +x +y - in intersection rather than > > > union is required, and the way Lucene does it is again to traverse > > > the two posting lists, just that only documents seen in both lists > > > are scored and collected. This allows to optimize the search, > > > skipping large chunks of the posting lists, especially when > > > one term is rarer than the other. > > > > > Thank you for your answer. > > > > I am having trouble finding the function which traverses the documents > > such that they get scored. Can you > > please tell me where the posting lists (for a +x +y query) get > > intersected after they get read (by next() I guess) > > from the index? > > > > In particular, I am interested in how does Lucene get the new positions > > (offsets) of the documents seen > > in both posting lists, i.e. positions (in a document) for the query word > > x, and positions for the query word y. > > > > Thank you in advance! > > > > Marjan. > > > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]