Thinking about this more, I don't think doing a second DB lookup for each result is going to scale well. A single search can return tens of thousands of results, and the very last one might be the most relevant, so I am going to have to store the relevancy factors (it is more than just popularity) within the index itself.
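A minimal sketch of the idea in plain Python (not Lucene's API; the factor names and weights here are made up): each indexed document carries a precomputed relevancy score, so ranking becomes a sort on a stored field instead of a per-result database lookup, and pulling only the top k hits with a heap avoids paying for a full sort.

```python
import heapq

def combined_relevancy(popularity, freshness, link_count):
    # Hypothetical factors and weights -- the real relevancy formula is
    # whatever mix of signals the application actually needs.
    return 0.5 * popularity + 0.3 * freshness + 0.2 * link_count

# Each document stores its precomputed score inside the index itself.
index = [
    {"id": 1, "text": "lucene custom sorting", "relevancy": combined_relevancy(0.9, 0.4, 0.2)},
    {"id": 2, "text": "lucene ranking basics", "relevancy": combined_relevancy(0.1, 0.9, 0.8)},
    {"id": 3, "text": "unrelated document",    "relevancy": combined_relevancy(0.5, 0.5, 0.5)},
]

def search(index, term, k=10):
    hits = [doc for doc in index if term in doc["text"]]
    # nlargest is O(n log k), much cheaper than a full O(n log n) sort
    # when a query matches tens of thousands of documents.
    return heapq.nlargest(k, hits, key=lambda d: d["relevancy"])
```

A weekly batch job would then just recompute `combined_relevancy` for each document and rewrite the stored field, rather than touching the index on every popularity change.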
I think I will write something to update the relevancy rating for each indexed document once a week or so. After all, I don't think Google updates their PageRank more than once a month or so. After that it is just a matter of sorting by that relevancy rating. Though, I read on the forums that sorting is a fairly expensive operation: someone mentioned throughput dropping from 100 searches/sec to 10/sec. I'm not sure of the details or the hardware, but that is an order-of-magnitude difference, if those numbers can be believed. Gonna experiment, I guess.

On 5/18/07, Michael Garski <[EMAIL PROTECTED]> wrote:
Patrick,

I've had to do something very similar, and you have a couple of options:

1. If the 'popularity' value is stored in a database, you can look up those values after performing your search against the index and then sort.
2. Continually update the index to reflect the most recent 'popularity' value and then perform a custom sort during your search.

For my application, #2 is what we found to be most efficient.

Michael

On May 18, 2007, at 4:48 AM, Patrick Burrows wrote:

> Thanks guys. I'll try it out.
>
> My next question is going to be about ranking the results of my searches based on information that is not in the index (popularity, for instance, which might change hourly). Is there some reading I can do on the subject before I start asking questions?
>
> --
> P
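Michael's option #1 can be sketched the same way (plain Python, with a dict standing in for the real database): run the search first, then fetch the current popularity for just the hits that came back and sort on that. It keeps the index static, at the price of one batch of lookups per query.

```python
# Hypothetical popularity store -- a dict standing in for the database.
popularity_db = {1: 0.2, 2: 0.9, 3: 0.5}

def rank_by_db_popularity(hits):
    # Fetch the up-to-the-hour popularity only for the returned hits,
    # then sort on it; the index itself never has to be rewritten.
    return sorted(hits, key=lambda d: popularity_db.get(d["id"], 0.0), reverse=True)
```

As Patrick notes above, this is the variant that stops scaling once a query can match tens of thousands of documents, since every hit costs a lookup before sorting can even begin.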
