So as Venu pointed out, sorting doesn't seem to solve the problem. If we have
to walk the result set, access each doc, and dedupe by brute force, we're
better off with the standard order by relevance.
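
For reference, here is roughly what I mean by the brute-force walk. This is
only a sketch in plain Java, not Lucene code: UrlHit and DomainDedupe are
made-up names, and it assumes the hits arrive already ordered by relevance
with the domain parsed ahead of time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a search hit; this is not a Lucene class.
final class UrlHit {
    final String url;
    final String domain; // assumed to be parsed and stored at index time
    final float score;

    UrlHit(String url, String domain, float score) {
        this.url = url;
        this.domain = domain;
        this.score = score;
    }
}

final class DomainDedupe {
    /**
     * Walks hits that are already ordered by descending relevance, keeps the
     * first hit seen for each domain, and stops once `limit` hits are kept.
     */
    static List<UrlHit> topPerDomain(List<UrlHit> rankedHits, int limit) {
        Set<String> seenDomains = new HashSet<>();
        List<UrlHit> kept = new ArrayList<>();
        for (UrlHit hit : rankedHits) {
            if (kept.size() >= limit) {
                break; // enough unique domains; no need to touch more hits
            }
            // Set.add returns false when the domain was already seen.
            if (seenDomains.add(hit.domain)) {
                kept.add(hit);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<UrlHit> ranked = Arrays.asList(
            new UrlHit("http://a.example/1", "a.example", 0.9f),
            new UrlHit("http://a.example/2", "a.example", 0.8f),
            new UrlHit("http://b.example/1", "b.example", 0.7f),
            new UrlHit("http://c.example/1", "c.example", 0.6f));
        for (UrlHit hit : topPerDomain(ranked, 2)) {
            System.out.println(hit.domain + " " + hit.url);
        }
    }
}
```

The walk stops as soon as enough unique domains are found, but in the worst
case it still touches every hit, which is the cost I'm trying to avoid.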

If you've got an example of this type of clustering done in a more efficient
way, that'd be great.

Any other ideas?


----- Original Message ----- 
From: "Erik Hatcher" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Saturday, April 10, 2004 12:35 AM
Subject: Re: clustering results


> On Apr 9, 2004, at 8:16 PM, Michael A. Schoen wrote:
> > I have an index of urls, and need to display the top 10 results for a
> > given query, but want to display only 1 result per domain. It seems
> > that using either Hits or a HitCollector, I'll need to access the doc,
> > grab the domain field (I'll have it parsed ahead of time) and only
> > take/display documents that are unique.
> >
> > A significant percentage of the time I expect I may have to access
> > thousands of results before I find 10 in unique domains. Is there a
> > faster approach that won't require accessing thousands of documents?
>
> I have examples of this that I can post when I have more time, but a
> quick pointer... check out the overloaded IndexSearcher.search()
> methods which accept a Sort.  You can do really really interesting
> slicing and dicing, I think, using it.  Try this one on for size:
>
>      example.displayHits(allBooks,
>          new Sort(new SortField[]{
>            new SortField("category"),
>            SortField.FIELD_SCORE,
>            new SortField("pubmonth", SortField.INT, true)
>          }));
>
> Be clever indexing the piece you want to group on - I think you may
> find this the solution you're looking for.
>
> Erik
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>


