Re: Indexing of virtual "made up" documents

Morus Walter Tue, 26 Apr 2005 23:07:13 -0700

Erik Hatcher writes:
> >
> > There are some information retrieval settings which tend to say that 
> > things that appear early in the document should be considered with 
> > greater score... is there nothing such in Lucene's scoring ?
> 
> No, Lucene doesn't have that feature, at least not explicitly....  it 
> could be hacked, sort of, by injecting multiple of the same term in the 
> same position (to get a higher term frequency) for the earlier terms.  
> Back to the original question - the position information will not 
> adversely affect scoring.
> 
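Erik's hack can be modeled without the Lucene API. In Lucene, a token with a position increment of 0 lands on the same position as the token before it (the mechanism used for injecting synonyms). Repeating early tokens with increment 0 therefore raises their term frequency while leaving every position, and so phrase and proximity matching, unchanged. The sketch below is a minimal, self-contained model under that assumption. `Tok`, `boostEarly`, `window` and `copies` are hypothetical names, not Lucene classes; in a real index the same logic would live in a custom TokenFilter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal model of an analyzed token: term text plus position increment
// (1 = advance to the next position, 0 = stay on the previous position).
final class Tok {
    final String term;
    final int posIncr;

    Tok(String term, int posIncr) {
        this.term = term;
        this.posIncr = posIncr;
    }
}

final class EarlyTermBoost {
    // Hypothetical helper: emit each of the first `window` words `copies`
    // times; the extra copies use increment 0, so no position shifts.
    static List<Tok> boostEarly(List<String> words, int window, int copies) {
        List<Tok> out = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            out.add(new Tok(words.get(i), 1));
            if (i < window) {
                for (int c = 1; c < copies; c++) {
                    out.add(new Tok(words.get(i), 0));
                }
            }
        }
        return out;
    }

    // Raw term frequency per term, the input to the tf() scoring factor.
    static Map<String, Integer> termFreqs(List<Tok> toks) {
        Map<String, Integer> tf = new LinkedHashMap<>();
        for (Tok t : toks) {
            tf.merge(t.term, 1, Integer::sum);
        }
        return tf;
    }

    // Absolute positions: start at -1 and add each token's increment,
    // so the first token lands on position 0.
    static List<Integer> positions(List<Tok> toks) {
        List<Integer> pos = new ArrayList<>();
        int p = -1;
        for (Tok t : toks) {
            p += t.posIncr;
            pos.add(p);
        }
        return pos;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("some", "text", "other", "text");
        List<Tok> toks = boostEarly(words, 1, 3);
        System.out.println(termFreqs(toks));
        System.out.println(positions(toks));
    }
}
```

For the example, 'some' ends up with a frequency of 3 and 'other' with 1. The positions are 0, 0, 0, 1, 2, 3, so the last word still sits at position 3. Lucene's default similarity takes the square root of the term frequency, so three copies raise the tf factor by roughly 1.7, not 3.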
Wouldn't it be easier to fake that by using a proximity query and a document
start marker?
E.g. index `xxxstartxxx some text other text'
and search for "xxxstartxxx some"~10000000 or "xxxstartxxx other"~10000000
If I understand proximity queries correctly, the latter should get a lower
score, since 'other' is further away from the marker (given that 'some' and
'other' otherwise score equally). Untested, though.
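Here is a small model of the sloppy-phrase part of the score, to show why the second query should score lower. It assumes the 1 / (distance + 1) sloppy frequency of Lucene's DefaultSimilarity. For a two-term phrase "marker term", it also assumes the match distance is how many positions the term sits past the slot directly after the marker. It only looks at the first occurrence of each word, whereas Lucene sums over all matches. `markerProximity` is a hypothetical helper, not a Lucene method.

```java
import java.util.Arrays;
import java.util.List;

final class StartMarkerProximity {
    // Sloppy-frequency formula of Lucene's DefaultSimilarity: closer matches
    // contribute more, an exact phrase (distance 0) contributes 1.
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // Hypothetical helper modeling "marker term"~slop: the distance is how far
    // `term` sits from the slot directly after the marker. Simplified to the
    // first occurrence of each word; Lucene sums over all matches.
    static float markerProximity(List<String> doc, String marker, String term, int slop) {
        int m = doc.indexOf(marker);
        int t = doc.indexOf(term);
        if (m < 0 || t < 0) {
            return 0.0f; // one of the words is missing: no match
        }
        int distance = Math.abs(t - (m + 1));
        if (distance > slop) {
            return 0.0f; // too far apart for the given slop: no match
        }
        return sloppyFreq(distance);
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("xxxstartxxx", "some", "text", "other", "text");
        int slop = 10000000;
        System.out.println(markerProximity(doc, "xxxstartxxx", "some", slop));
        System.out.println(markerProximity(doc, "xxxstartxxx", "other", slop));
    }
}
```

With the example document, 'some' sits right after the marker (distance 0) and 'other' two slots later (distance 2), so the second query's proximity factor is a third of the first's. Idf and the other scoring factors are left out, which is why the comparison only holds when 'some' and 'other' otherwise score equally.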


Alternatively, it should be possible to write a query that does such scoring
directly (without the document start anchor), by the same means the proximity
query uses. The proximity query relies on positional information, so it should
also be possible to use that information for scoring based on the position
within the document.
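A direct position-based score could look roughly like the following. This is a hypothetical scoring rule, not something Lucene ships. Every occurrence of the term at position p contributes 1 / (p + 1), so early occurrences dominate while repeated ones still add up. In a real query, the positions would come from the index's positional data, the same data the proximity query reads. Here a plain word list stands in for them.

```java
import java.util.Arrays;
import java.util.List;

final class PositionDecayScore {
    // Hypothetical position-aware term score: an occurrence at position p
    // (0-based) adds 1 / (p + 1), i.e. a position-discounted term frequency.
    static double score(List<String> doc, String term) {
        double s = 0.0;
        for (int p = 0; p < doc.size(); p++) {
            if (doc.get(p).equals(term)) {
                s += 1.0 / (p + 1);
            }
        }
        return s;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("some", "text", "other", "text");
        for (String term : Arrays.asList("some", "text", "other")) {
            System.out.println(term + " " + score(doc, term));
        }
    }
}
```

For "some text other text", this gives 1.0 for 'some', 0.75 for 'text' (positions 1 and 3) and about 0.33 for 'other'. The 1 / (p + 1) decay is an arbitrary choice; a hard cut-off after the first n positions or a slower decay would express the same idea.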

Morus
