> From: Winton Davies [mailto:[EMAIL PROTECTED]]
> 
>   Not really, all documents have an accountID, but I need to search 
> all the documents
> first, and each document that is returned has an accountID, but I 
> just want one document
> per accountID.
> 
> so:
> 
>   doc1 acc1
>   doc2 acc1
>   doc3 acc1
>   doc4 acc2
>   doc5 acc2
>   doc6 acc2
> 
> Lets say the query "X" returns hits in this order:
> 
>   doc1
>   doc2
>   doc3
>   doc4
>   doc5
> 
> what I want returned is:
> 
>   doc1  (best of acc1)
>   doc4  (best of acc2)

You might try something like:
  construct an int[] array that has an "account number" for each document
  construct a BitSet to keep track of whether you've seen a hit for each
account
  use a HitCollector that uses these as follows:
    if (!seen.get(accounts[doc])) {
      seen.set(accounts[doc]);
      collect(doc);
    }
That should be quite fast.

You'll want to construct the accounts array once when you open the index,
perhaps by enumerating some terms that store this information.  You'll need
to construct a new "seen" BitSet for each query, or clear a cached one.

Doug

--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>

Reply via email to