On Mon, 2005-03-14 at 10:24 -0800, David Spencer wrote:
> Yes, in theory the "similarity" package in the sandbox can help.
> The code generates a query for a source document to find documents that
> are similar to it - the MoreLikeThis class uses the heuristic that 2
> docs are similar if they share "interesting" words. "Interesting" words
> are words that are common in a source doc but not too common in the
> corpus. If you were to do this you'd do something like this:
>
> [1] Do your normal query
> [2] As you loop thru the results, for every doc
> [2a] generate a similarity query
> [2b] requery the index for similar docs
> [2c] then, maybe, for every doc from [2b] with a score above some
> threshold, if it's also high up in the results from [2] then "hide" the
> doc a la google et al.
>
> Could be tricky coding. Another way is to only show 1 doc from any given
> domain. Note that instead of 1 query you'll have "1+n" queries for the
> display of "n" search results.
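
For reference, the steps above can be sketched roughly as follows. This is not Lucene code and does not use the sandbox MoreLikeThis class: the SimilarResultHiding class, its method names, the toy in-memory "index" and the MIN_TF / MAX_DF_RATIO cut-offs are all made up for illustration. It only shows the shape of the loop and counts how many queries it ends up running.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy in-memory stand-in for an index; hypothetical, none of this is Lucene API. */
class SimilarResultHiding {
    static final int MIN_TF = 1;            // assumed cut-off: min occurrences in the source doc
    static final double MAX_DF_RATIO = 0.4; // assumed cut-off: max fraction of docs containing the term

    final List<Map<String, Integer>> docTf = new ArrayList<>(); // term counts per doc
    final Map<String, Integer> docFreq = new HashMap<>();       // number of docs containing each term
    int queriesRun = 0;

    SimilarResultHiding(List<String> docs) {
        for (String text : docs) {
            Map<String, Integer> tf = new HashMap<>();
            for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
                if (!t.isEmpty()) tf.merge(t, 1, Integer::sum);
            }
            docTf.add(tf);
            for (String t : tf.keySet()) docFreq.merge(t, 1, Integer::sum);
        }
    }

    /** [1] The "normal" query: ids of docs containing the term, most occurrences first. */
    List<Integer> search(String term) {
        queriesRun++;
        List<Integer> hits = new ArrayList<>();
        for (int d = 0; d < docTf.size(); d++) {
            if (docTf.get(d).containsKey(term)) hits.add(d);
        }
        hits.sort((a, b) -> docTf.get(b).get(term) - docTf.get(a).get(term));
        return hits;
    }

    /** [2a] "Interesting" terms: common enough in the doc, not too common in the corpus. */
    Set<String> interestingTerms(int doc) {
        Set<String> terms = new HashSet<>();
        for (Map.Entry<String, Integer> e : docTf.get(doc).entrySet()) {
            double dfRatio = docFreq.get(e.getKey()) / (double) docTf.size();
            if (e.getValue() >= MIN_TF && dfRatio <= MAX_DF_RATIO) terms.add(e.getKey());
        }
        return terms;
    }

    /** [2b] Requery: score each other doc by the fraction of interesting terms it shares. */
    Map<Integer, Double> similarTo(int doc, Set<String> terms) {
        queriesRun++;
        Map<Integer, Double> scores = new HashMap<>();
        for (int d = 0; d < docTf.size() && !terms.isEmpty(); d++) {
            if (d == doc) continue;
            int shared = 0;
            for (String t : terms) {
                if (docTf.get(d).containsKey(t)) shared++;
            }
            if (shared > 0) scores.put(d, shared / (double) terms.size());
        }
        return scores;
    }

    /** [2c] Walk the hits in rank order; hide any hit too similar to one already shown. */
    List<Integer> searchHidingSimilar(String term, double threshold) {
        List<Integer> hits = search(term);
        Set<Integer> hidden = new HashSet<>();
        List<Integer> shown = new ArrayList<>();
        for (int doc : hits) {
            if (hidden.contains(doc)) continue; // already collapsed under a higher-ranked doc
            shown.add(doc);
            Map<Integer, Double> similar = similarTo(doc, interestingTerms(doc));
            for (int other : hits) {
                if (similar.getOrDefault(other, 0.0) >= threshold) hidden.add(other);
            }
        }
        return shown;
    }

    public static void main(String[] args) {
        SimilarResultHiding index = new SimilarResultHiding(Arrays.asList(
            "lucene search engine tutorial lucene indexing basics",
            "lucene search engine tutorial lucene indexing basics mirror",
            "lucene release notes changelog",
            "cooking pasta recipes", "gardening tips roses", "travel guide lisbon"));
        // Doc 1 is a near-copy of doc 0, so it is hidden from the results.
        System.out.println("shown: " + index.searchHidingSimilar("lucene", 0.8));
        System.out.println("queries run: " + index.queriesRun);
    }
}
```

Because hidden docs are skipped, the sketch runs one similarity query per doc actually shown, so at most 1+n queries for n hits. Each of those is still a full pass over the toy index.
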

That sounds like an interesting approach. But I'll probably wait until
Chuck's patch is included. I'm also a bit worried about the performance
of this approach. It might add too much time to each query.

-- 
Miles Barr <[EMAIL PROTECTED]>
Runtime Collective Ltd.