Hm, removing duplicates (as determined by a value of a specified document
field) from the results would be nice.
How would your addition affect performance, considering it has to check the PQ
for a previous value for every candidate hit?
Otis
. . . . . . . . . . . . . . . . . . . . . . . . . .
The duplicate check would just be on the doc ID. I'm using a TreeSet to detect
duplicates with no noticeable effect on performance. The PQ only has to be
checked for a previous value IFF the element about to be inserted is
actually inserted and not dropped because it's less than the least value
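
A minimal sketch of the check order described above, assuming a plain java.util.PriorityQueue as a stand-in for FieldSortedHitQueue and a single float sort value per hit. The names Hit and DedupTopN are hypothetical and not part of Lucene. The point is only that the TreeSet lookup runs after the cheap comparison against the least element, so most rejected candidates never touch the set.

```java
import java.util.PriorityQueue;
import java.util.TreeSet;

/** Hypothetical stand-in for a hit: a doc ID plus one sort value. */
final class Hit {
    final int doc;
    final float value;
    Hit(int doc, float value) { this.doc = doc; this.value = value; }
}

/** Bounded top-N queue that ignores a doc ID already in the queue ("first wins"). */
final class DedupTopN {
    private final int maxSize;
    // Min-heap: the head is the least hit currently kept.
    private final PriorityQueue<Hit> heap =
        new PriorityQueue<>((a, b) -> Float.compare(a.value, b.value));
    // Doc IDs of the hits currently in the heap.
    private final TreeSet<Integer> docsInQueue = new TreeSet<>();

    DedupTopN(int maxSize) {
        if (maxSize <= 0) throw new IllegalArgumentException("maxSize must be positive");
        this.maxSize = maxSize;
    }

    /** Returns true if the hit was kept. */
    boolean insert(Hit hit) {
        boolean full = heap.size() >= maxSize;
        // Cheap rejection first: a hit that does not beat the least element is dropped.
        if (full && hit.value <= heap.peek().value) return false;
        // Only a hit that would actually be inserted pays for the duplicate lookup.
        if (docsInQueue.contains(hit.doc)) return false;
        if (full) {
            Hit evicted = heap.poll();
            docsInQueue.remove(evicted.doc);
        }
        heap.add(hit);
        docsInQueue.add(hit.doc);
        return true;
    }

    int size() { return heap.size(); }
}
```

The set is kept in step with the heap on eviction, so a doc that has fallen out of the queue can be inserted again later.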
On 3/29/07, Otis Gospodnetic [EMAIL PROTECTED] wrote:
Ah, I see. This is less attractive to me personally, but maybe it helps
others. One thing I don't understand is why/how you'd get duplicate documents
with the same doc ID in there. Isn't insert(FieldDoc fdoc) called only once
for each
Yes, my custom query processor can sometimes make 2 Lucene search calls
which may result in duplicate docs being inserted on the same PQ. The
simplest solution is to make lessThan public. I'm curious to know if anyone
else is performing multiple searches under the covers.
Peter
On 3/29/07,
I've got a similar duplicate case, but my duplicates are based on an external ID
rather than the doc ID, so they can occur within a single Query. I'm using a custom
HitCollector, but it is score-based, not field-sorted.
If a duplicate has a higher score than the one on the PQ, I need to update the
stored score.
Peter, how did you achieve 'last wins', given that you must presumably remove the
earlier entry from the PQ first?
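
One way the "replace the stored entry when a duplicate scores higher" idea could look, as a rough sketch only. It assumes a String external ID and uses java.util.PriorityQueue rather than Lucene's queue. ScoredHit, BestScorePerId, collect and scoreFor are hypothetical names, not a Lucene HitCollector API. Note that PriorityQueue.remove(Object) is a linear scan, so this is only cheap when duplicates are rare or the queue is small.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical queue entry: an external ID and its score. */
final class ScoredHit {
    final String externalId;
    final float score;
    ScoredHit(String externalId, float score) {
        this.externalId = externalId;
        this.score = score;
    }
}

/** Keeps the top-N entries by score, with at most one entry per external ID. */
final class BestScorePerId {
    private final int maxSize;
    // Min-heap on score: the head is the lowest-scoring entry kept.
    private final PriorityQueue<ScoredHit> heap =
        new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
    // External ID -> the entry currently in the heap for that ID.
    private final Map<String, ScoredHit> byId = new HashMap<>();

    BestScorePerId(int maxSize) {
        if (maxSize <= 0) throw new IllegalArgumentException("maxSize must be positive");
        this.maxSize = maxSize;
    }

    void collect(String externalId, float score) {
        ScoredHit previous = byId.get(externalId);
        if (previous != null) {
            // Duplicate: keep whichever copy has the higher score.
            if (score <= previous.score) return;
            heap.remove(previous);      // O(n) scan; identity match, equals() is not overridden
            byId.remove(externalId);
        } else if (heap.size() >= maxSize) {
            // New ID and the queue is full: it must beat the lowest entry.
            if (score <= heap.peek().score) return;
            ScoredHit evicted = heap.poll();
            byId.remove(evicted.externalId);
        }
        ScoredHit hit = new ScoredHit(externalId, score);
        heap.add(hit);
        byId.put(externalId, hit);
    }

    /** Score currently stored for the ID, or null if the ID is not in the queue. */
    Float scoreFor(String externalId) {
        ScoredHit hit = byId.get(externalId);
        return hit == null ? null : hit.score;
    }

    int size() { return heap.size(); }
}
```

Removing and re-adding, rather than mutating the score in place, keeps the heap ordering valid without needing a separate re-heapify step.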
I implemented 'first wins' because the score is less important than other
fields (distance, in our case), but you make a good point since score may be
more important. How did you implement remove()?
Peter
Peter Keegan wrote:
I implemented 'first wins' because the score is less important than other
fields (distance, in our case), but you make a good point since score may be
more important. How did you implement remove()?
I've got my own PriorityQueue
public boolean remove(E o)
{
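
The body of that remove() is cut off in the archive. As an illustration only, and not the poster's actual code, here is how remove(E o) is commonly written for an array-backed binary min-heap: find the element by linear scan, move the last element into its slot, then restore heap order in whichever direction is needed. SimpleHeap and its helper methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Hypothetical minimal binary min-heap with removal of an arbitrary element. */
final class SimpleHeap<E> {
    private final List<E> items = new ArrayList<>();
    private final Comparator<? super E> order;

    SimpleHeap(Comparator<? super E> order) { this.order = order; }

    void add(E e) {
        items.add(e);
        siftUp(items.size() - 1);
    }

    /** Removes and returns the least element, or null if the heap is empty. */
    E poll() {
        if (items.isEmpty()) return null;
        E top = items.get(0);
        removeAt(0);
        return top;
    }

    /** Removes one occurrence of o (matched with equals); returns false if absent. */
    public boolean remove(E o) {
        int i = items.indexOf(o);   // linear scan: a heap has no fast lookup
        if (i < 0) return false;
        removeAt(i);
        return true;
    }

    int size() { return items.size(); }

    private void removeAt(int i) {
        int last = items.size() - 1;
        E moved = items.remove(last);
        if (i == last) return;      // the removed element was the last slot
        items.set(i, moved);
        // The moved element may violate heap order in either direction.
        siftDown(i);
        siftUp(i);
    }

    private void siftUp(int i) {
        while (i > 0) {
            int parent = (i - 1) / 2;
            if (order.compare(items.get(i), items.get(parent)) >= 0) break;
            Collections.swap(items, i, parent);
            i = parent;
        }
    }

    private void siftDown(int i) {
        int n = items.size();
        while (true) {
            int left = 2 * i + 1, right = left + 1, smallest = i;
            if (left < n && order.compare(items.get(left), items.get(smallest)) < 0) smallest = left;
            if (right < n && order.compare(items.get(right), items.get(smallest)) < 0) smallest = right;
            if (smallest == i) return;
            Collections.swap(items, i, smallest);
            i = smallest;
        }
    }
}
```

The scan makes remove() O(n). Keeping a map from element to heap index would make it O(log n), at the cost of updating the map on every swap.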
----- Original Message -----
From: Peter Keegan [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Sent: Thursday, March 29, 2007 11:00:24 AM
Subject: Re: FieldSortedHitQueue enhancement
The duplicate check would just be on the doc ID