RE: Result Relevance (was: Handling Duplicates)

Digy Sat, 19 May 2007 15:32:29 -0700

Hi Patrick,

I also think that doing a DB query for each result can degrade
performance dramatically, so storing the relevance factor within the
index is a better idea. But then, as you say, the cost of sorting arises.
To minimize that cost, the number of hits returned can be limited (the
nDocs param of the Search method of IndexSearcher). But in that case,
Lucene's own ranking may cut off documents that are more relevant by the
stored factor before they are ever sorted.

So, I think
        1- running the search without an "nDocs" limit
        2- passing over the result set once and collecting the most relevant N
results (say 100 or 1000)
        3- then sorting these results
can be a better solution.
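
A rough sketch of steps 2 and 3, independent of Lucene: it assumes each hit has already been reduced to a (doc_id, relevance) pair, where relevance is the factor stored in the index. The function name and the tuple layout are hypothetical, chosen only for illustration. A bounded min-heap keeps memory at N entries no matter how many hits the search returns, and only those N entries are sorted at the end.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical hit shape: (doc_id, relevance). This is not a Lucene type.
Hit = Tuple[int, float]


def top_n_by_relevance(hits: Iterable[Hit], n: int) -> List[Hit]:
    """Return the n most relevant hits, most relevant first."""
    if n <= 0:
        return []

    # Step 2: a single pass over every hit, keeping at most n of them.
    # The heap is a min-heap on relevance, so heap[0] is always the
    # weakest hit kept so far and is the one to evict.
    heap: List[Tuple[float, int]] = []
    for doc_id, relevance in hits:
        if len(heap) < n:
            heapq.heappush(heap, (relevance, doc_id))
        elif relevance > heap[0][0]:
            heapq.heapreplace(heap, (relevance, doc_id))

    # Step 3: sort only the n survivors, highest relevance first.
    ranked = sorted(heap, reverse=True)
    return [(doc_id, relevance) for relevance, doc_id in ranked]


if __name__ == "__main__":
    sample = [(1, 0.2), (2, 0.9), (3, 0.5), (4, 0.7), (5, 0.1)]
    print(top_n_by_relevance(sample, 3))
```

With M hits in total, the pass costs roughly O(M log N) and the final sort O(N log N), instead of sorting all M hits.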

DIGY

-----Original Message-----
From: Patrick Burrows [mailto:[EMAIL PROTECTED] 
Sent: Saturday, May 19, 2007 6:34 PM
To: [email protected]
Subject: Result Relevance (was: Handling Duplicates)

Thinking about this more, I don't think doing a second DB lookup for each
result is going to scale well. It is possible that a single search returns
tens of thousands of results, the very last one might be the most relevant.
I am going to have to store the relevancy factors (it is more than just
popularity) within the index itself.

I think I will write something to update the relevancy rating once a week or
so for each indexed document. After all, I don't think Google updates their
PageRank more than once a month or so.

After that it is just a matter of sorting by that relevancy rating. Though,
I read on the forums that sorting is a bit of an expensive procedure.
Someone mentioned 100 searches / sec going down to 10 / sec. Not sure of the
details or the hardware. But that is an order of magnitude difference, if
those results can be believed.

Gonna experiment, I guess.

On 5/18/07, Michael Garski <[EMAIL PROTECTED]> wrote:
>
> Patrick,
>
> I've had to do something very similar, and you have a couple of options:
>
> 1. If the 'popularity' value is stored in a database, you can look up
> those values after performing your search against the index and then
> sort.
>
> 2. Continually update the index to reflect the most recent
> 'popularity' value and then perform a custom sort during your search.
>
> For my application, #2 is what we found to be most efficient.
>
> Michael
>
>
> On May 18, 2007, at 4:48 AM, Patrick Burrows wrote:
>
> > Thanks guys. I'll try it out.
> >
> > My next question is going to be about ranking the results of my
> > searches
> > based on information that is not in the index (popularity, for
> > instance,
> > which might change hourly). Is there some reading I can do on the
> > subject
> > before I start asking questions?
> >
> >
>
> --
> -
> P
