If this is really about adjusting the score based on field length (I didn't follow the thread closely), then this sounds like a job for a custom Similarity with its own implementation of the lengthNorm method.
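To make that concrete, here is a rough, self-contained sketch (plain Java, no Lucene dependency) that redoes the arithmetic from the explain output quoted below and shows how a steeper length norm would flip the ranking. The class name LengthNormSketch and the method steeperLengthNorm are made up for illustration. The tf, idf and default norm formulas are what I believe DefaultSimilarity used in Lucene 2.x, and the sketch ignores the one-byte encoding of norms (which is why the explain output shows 0.375 rather than 1/sqrt(6)). In a real index you would put the formula into a Similarity subclass by overriding lengthNorm(String fieldName, int numTokens), set it on both the IndexWriter and the Searcher, and reindex, since norms are computed at index time.

```java
public class LengthNormSketch {

    // tf as shown in the explain output: sqrt(termFreq)
    static double tf(int termFreq) {
        return Math.sqrt(termFreq);
    }

    // idf as (I believe) DefaultSimilarity computed it: ln(numDocs / (docFreq + 1)) + 1
    static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    // Default length norm: 1 / sqrt(numTokens), before byte encoding
    static double defaultLengthNorm(int numTokens) {
        return 1.0 / Math.sqrt(numTokens);
    }

    // Hypothetical steeper norm that penalizes long fields harder: 1 / numTokens
    static double steeperLengthNorm(int numTokens) {
        return 1.0 / numTokens;
    }

    // fieldWeight = tf * idf * fieldNorm, as in the explain output
    static double fieldWeight(int termFreq, double idf, double norm) {
        return tf(termFreq) * idf * norm;
    }

    public static void main(String[] args) {
        // 4 documents in the index, 2 of them contain "fifa"
        double idf = idf(2, 4);

        // Doc 1: "fifa 08 - playstation 3"           -> 4 tokens, "fifa" once
        // Doc 2: "fifa 2003 fifa 03 - playstation 3" -> 6 tokens, "fifa" twice
        System.out.println("default doc1 = " + fieldWeight(1, idf, defaultLengthNorm(4)));
        System.out.println("default doc2 = " + fieldWeight(2, idf, defaultLengthNorm(6)));
        System.out.println("steeper doc1 = " + fieldWeight(1, idf, steeperLengthNorm(4)));
        System.out.println("steeper doc2 = " + fieldWeight(2, idf, steeperLengthNorm(6)));
    }
}
```

With the default norm the longer "fifa 2003 fifa 03" document scores higher. With the steeper norm the shorter "fifa 08" document comes out on top. Whether 1/numTokens is the right curve for the whole collection is something to check against a set of test queries, not just this one.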
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Anshum <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Saturday, April 26, 2008 12:32:56 AM
> Subject: Re: boosting relevance of certain documents
>
> Hi Daniel,
>
> Just a suggestion, how bout storing an extra field while indexing that has
> the "length" of the document. You could just divide the score of the
> document (change the lucene code) with the length of the document (or
> something on the same lines) while calculating the score. In this manner,
> among 2 docs, the smaller doc with the same score would get preference.
> Do you think that would somehow solve your problem?
> Though again, it involves changing the algo but this would be useful in case
> you have documents that keep on getting updated and you can not afford to
> hard-code the doc preference.
>
> --
> Anshum Gupta
>
>
> On Sat, Apr 26, 2008 at 4:12 AM, Grant Ingersoll wrote:
>
> > It really depends. Hand tuning scoring algs for a specific query is very
> > prone to local maxima problems. In other words, you fix one query and break
> > 50 others. Sometimes, a good old "configurable" hard code is the way to go.
> > If you want a specific doc to be #1, make it number one. You will pull
> > your hair out otherwise. In Solr, this is handled via the Query Elevation
> > Component, but isn't all that difficult to implement.
> >
> > Likewise, if you have a priori knowledge that a particular document is
> > more important, then give it a relatively large boost during indexing, being
> > aware that Lucene does not offer much granularity in terms of boosts. In
> > other words, boost it something like 5 or 10 times, instead of 1.1 vs. 1.2.
> >
> > On the other hand, if you are truly seeing broad problems, then you need
> > to build up a set of queries and judgments (ala TREC) or the
> > contrib/benchmark Quality packages. You might also look at Lucene's
> > Similarity class.
> > Lucene's length normalization is often less than optimal
> > for certain types of documents (see the IBM Haifa's assessment for the
> > "Million Query" track of TREC on the Lucene Wiki).
> >
> > Cheers,
> > Grant
> >
> >
> > On Apr 25, 2008, at 3:50 PM, Daniel Freudenberger wrote:
> >
> > > Thanks for your response. I already knew that the relevance is based on the
> > > term frequency but in some cases it's just not what the user expects.
> > > As I already mentioned, "fifa 2003 fifa 03" vs. "fifa 08" is such a case -
> > > searching for "fifa" would return the "fifa 2003 fifa 03" document first but
> > > the "fifa 08" document is more important (from the user's point of view).
> > >
> > > Any suggestions?
> > >
> > > Best regards,
> > > Daniel
> > >
> > > -----Original Message-----
> > > From: Jonathan Ariel [mailto:[EMAIL PROTECTED]
> > > Sent: Friday, April 25, 2008 8:11 PM
> > > To: java-user@lucene.apache.org
> > > Subject: Re: boosting relevance of certain documents
> > >
> > > Ok. So I'm not an expert of the scoring algorithm, but based on tf*idf you
> > > can tell that the returned document is more relevant because it has more
> > > term frequency.
> > >
> > > Using the explain you can see the following:
> > >
> > > Doc 1
> > > 0.643841 = (MATCH) fieldWeight(searchable:fifa in 0), product of:
> > >   1.0 = tf(termFreq(searchable:fifa)=1)
> > >   1.287682 = idf(docFreq=2)
> > >   0.5 = fieldNorm(field=searchable, doc=0)
> > >
> > > Doc2
> > > 0.68289655 = (MATCH) fieldWeight(searchable:fifa in 1), product of:
> > >   1.4142135 = tf(termFreq(searchable:fifa)=2)
> > >   1.287682 = idf(docFreq=2)
> > >   0.375 = fieldNorm(field=searchable, doc=1)
> > >
> > > On Fri, Apr 25, 2008 at 2:30 PM, Daniel Freudenberger <[EMAIL PROTECTED]> wrote:
> > >
> > > > I'm using the StandardAnalyzer - hope this answers your question (I'm quite
> > > > new to the lucene thing)
> > > >
> > > > -----Original Message-----
> > > > From: Jonathan Ariel [mailto:[EMAIL PROTECTED]
> > > > Sent: Friday, April 25, 2008 6:59 PM
> > > > To: java-user@lucene.apache.org
> > > > Subject: Re: boosting relevance of certain documents
> > > >
> > > > How are you analyzing the searchable field?
> > > >
> > > > On Fri, Apr 25, 2008 at 12:49 PM, Daniel Freudenberger <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > I'm using lucene within a new project and I'm not sure about how to solve
> > > > > the following problem: My index consists of the two attributes "id" and
> > > > > "searchable". "id" is the id of a product and "searchable" is a combination
> > > > > of the product name and its category name.
> > > > >
> > > > > example:
> > > > >
> > > > > id  searchable
> > > > > 1   fifa 08 - playstation 3
> > > > > 2   fifa 2003 fifa 03 - playstation 3
> > > > > 3   playstation 60gb hdd - playstation 3
> > > > > 4   playstation i like you - playstation 3
> > > > >
> > > > > When searching for "fifa", lucene returns the product with id 2 at first,
> > > > > whereas id 1 ("fifa 08") would be the much more relevant result (from the
> > > > > user side of view). the same problem arises when searching for "playstation"
> > > > > - the customer expects products having "playstation" in their names at
> > > > > first, ideally the console itself. in reality however, he gets all possible
> > > > > products which are in the "playstation" category as well.
> > > > >
> > > > > my idea was to introduce another attribute relevance, which may increase the
> > > > > relevance of an entry. the actual relevance shouldn't be suppressed
> > > > > completely though, but should only be taken into account with products that
> > > > > are similarly relevant for a specific search term.
> > > > >
> > > > > Does anybody have an idea on how to solve this problem?
> > > > >
> > > > > Thank you in advance,
> > > > >
> > > > > Daniel