I read all the relevant references I could find in the Users (not Developers) list, and I still don't exactly know what to do.
Let me explain a bit more. The documents I index are all news stories. The typical document body ranges in size from 200 to 2000 words. The document is structured into a couple of dozen indexed fields, but nearly all searching is done in two: the headline and the body. What I'd like to do is get a relevancy-based order in which (a) longer documents tend to get more weight than shorter ones, (b) a document body with 'X' instances of a query term gets a higher ranking than one with fewer than 'X' instances. and (c) a term found in the headline (usually in addition to finding the same term in the body) is more highly ranked than one with the term only in the body. But that's not what happens with the default scoring, and I'd like to change that. I'm guessing, but maybe if I check the document length at indexing time and boost longer documents, that will help. Maybe I could also (at index time) give an extra boost to the headline field. Would that be the most I could do without changing the Lucene core source? Regards, Terry PS: I'm also wondering if the fact that I have so many other fields, this may affect the ranking in a way that diminishes the relevance of the headline and/or body fields? PSS: I'd just like to clarify another point. Much of the background information on the scoring algorithms is beyond me and I have no interest whatsoever in pushing the boundaries of this part of the technology. All I want to do is use it so it comes out in a way that seems reasonable (without having to become an expert in the complex theory behind this). ----- Original Message ----- From: "Otis Gospodnetic" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Saturday, January 25, 2003 2:09 AM Subject: Re: Computing Relevancy Differently > Check the lucene-user archives, search for subject "custom scoring api > questions" > I think that may give you the answer.... > > Otis > > > --- Terry Steichen <[EMAIL PROTECTED]> wrote: > > How would one go about altering the formula for relevancy? (That is, > > which modules and which code?) I'm certain that the current > > algorithm is well founded in logic and probably works well in many > > environments. > > > > However, I find that, as I index news stories, the current algorithm > > frequently doesn't produce meaningful rankings. In previous > > discussions in this list about relevancy, the algorithm seemed to be > > very complex, possibly too complex for my poor brain to fully grasp. > > But I'd like to try some other options and see if they result in > > rankings more in line with what my average viewer would expect. > > > > Regards, > > > > Terry > > > > > > > __________________________________________________ > Do you Yahoo!? > Yahoo! Mail Plus - Powerful. Affordable. Sign up now. > http://mailplus.yahoo.com > > -- > To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]> > For additional commands, e-mail: <mailto:[EMAIL PROTECTED]> > > -- To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
