Re: FieldSortedHitQueue enhancement

2007-03-29 Thread Otis Gospodnetic
Hm, removing duplicates (as determined by a value of a specified document field) from the results would be nice. How would your addition affect performance, considering it has to check the PQ for a previous value for every candidate hit? Otis

Re: FieldSortedHitQueue enhancement

2007-03-29 Thread Peter Keegan
The duplicate check would just be on the doc ID. I'm using a TreeSet to detect duplicates, with no noticeable effect on performance. The PQ has to be checked for a previous value only if the element about to be inserted is actually inserted and not dropped because it's less than the least value
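To make the idea above concrete, here is a minimal sketch of a bounded top-N queue that skips repeated doc IDs ("first wins"). It is not the actual FieldSortedHitQueue patch: `DedupTopN`, `Hit`, and `docsBestFirst` are hypothetical names, and `java.util.PriorityQueue` stands in for Lucene's own queue class. The point it illustrates is the ordering of the checks: a hit that loses to the least kept element is dropped before the TreeSet is consulted at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.TreeSet;

/** Hypothetical top-N collector that ignores repeated doc IDs ("first wins"). */
class DedupTopN {
    /** Minimal stand-in for a hit; this is NOT Lucene's FieldDoc. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final int maxSize;
    // Min-heap on score: the head is the least hit currently kept.
    private final PriorityQueue<Hit> queue =
        new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
    // Doc IDs of the hits currently held in the queue.
    private final TreeSet<Integer> seen = new TreeSet<>();

    DedupTopN(int maxSize) { this.maxSize = maxSize; }

    /** Returns true if the hit was kept, false if it was dropped. */
    boolean insert(Hit hit) {
        boolean full = queue.size() >= maxSize;
        // Cheap rejection first: a hit that does not beat the least kept
        // hit is dropped without touching the duplicate set.
        if (full && hit.score <= queue.peek().score) return false;
        // Only a hit that would really be inserted pays for the lookup.
        if (seen.contains(hit.doc)) return false; // duplicate, first wins
        if (full) {
            Hit evicted = queue.poll();
            seen.remove(evicted.doc);
        }
        queue.add(hit);
        seen.add(hit.doc);
        return true;
    }

    /** Doc IDs of the kept hits, highest score first. */
    List<Integer> docsBestFirst() {
        List<Hit> hits = new ArrayList<>(queue);
        hits.sort((a, b) -> Float.compare(b.score, a.score));
        List<Integer> docs = new ArrayList<>();
        for (Hit h : hits) docs.add(h.doc);
        return docs;
    }
}
```

A HashSet would work equally well here; TreeSet is used only because it is what the message mentions.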

Re: FieldSortedHitQueue enhancement

2007-03-29 Thread Yonik Seeley
On 3/29/07, Otis Gospodnetic [EMAIL PROTECTED] wrote: Ah, I see. This is less attractive to me personally, but maybe it helps others. One thing I don't understand is why/how you'd get duplicate documents with the same doc ID in there. Isn't insert(FieldDoc fdoc) called only once for each

Re: FieldSortedHitQueue enhancement

2007-03-29 Thread Peter Keegan
Yes, my custom query processor can sometimes make 2 Lucene search calls, which may result in duplicate docs being inserted into the same PQ. The simplest solution is to make lessThan public. I'm curious to know if anyone else is performing multiple searches under the covers. Peter

Re: FieldSortedHitQueue enhancement

2007-03-29 Thread Antony Bowesman
I've got a similar duplicate case, but my duplicates are based on an external ID rather than doc ID, so they can occur within a single Query. It's using a custom HitCollector, but score-based, not field-sorted. If my duplicate has a higher score than the one on the PQ I need to update the stored score
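One way the external-ID case above could be handled is sketched below, under stated assumptions: `BestScorePerId`, `Entry`, `collect`, and `scoreOf` are hypothetical names, the external ID is assumed to be a String, and `java.util.PriorityQueue` is used instead of a Lucene queue or HitCollector. A map from external ID to the queued entry lets a later, higher-scoring duplicate replace the stored one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical top-N collector keeping the best score per external ID. */
class BestScorePerId {
    static final class Entry {
        final String externalId;
        final float score;
        Entry(String externalId, float score) {
            this.externalId = externalId;
            this.score = score;
        }
    }

    private final int maxSize;
    // Min-heap on score: the head is the weakest entry currently kept.
    private final PriorityQueue<Entry> queue =
        new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
    // External ID -> the entry currently in the queue for that ID.
    private final Map<String, Entry> byId = new HashMap<>();

    BestScorePerId(int maxSize) { this.maxSize = maxSize; }

    void collect(String externalId, float score) {
        Entry existing = byId.get(externalId);
        if (existing != null) {
            if (score <= existing.score) return; // stored score is already best
            // PriorityQueue.remove(Object) is a linear scan; acceptable for
            // small queues, but a custom heap could do better.
            queue.remove(existing);
            byId.remove(externalId);
        } else if (queue.size() >= maxSize) {
            if (score <= queue.peek().score) return; // loses to the weakest
            Entry evicted = queue.poll();
            byId.remove(evicted.externalId);
        }
        Entry entry = new Entry(externalId, score);
        queue.add(entry);
        byId.put(externalId, entry);
    }

    /** Stored score for an ID, or null if the ID is not in the queue. */
    Float scoreOf(String externalId) {
        Entry e = byId.get(externalId);
        return e == null ? null : e.score;
    }

    int size() { return queue.size(); }
}
```

Removing and re-adding, rather than mutating the score in place, keeps the heap order valid without needing access to the queue's internals.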

Re: FieldSortedHitQueue enhancement

2007-03-29 Thread Peter Keegan
Peter, how did you achieve 'last wins' as you must presumably remove first from the PQ? I implemented 'first wins' because the score is less important than other fields (distance, in our case), but you make a good point since score may be more important. How did you implement remove()? Peter

Re: FieldSortedHitQueue enhancement

2007-03-29 Thread Antony Bowesman
Peter Keegan wrote: I implemented 'first wins' because the score is less important than other fields (distance, in our case), but you make a good point since score may be more important. How did you implement remove()? I've got my own PriorityQueue public boolean remove(E o) {
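The body of that remove() is cut off in the archive, so the following is not Antony's code. It is only a sketch of how a remove() on a home-grown binary min-heap might look, assuming an array-backed heap and equals()-based lookup; `SimpleHeap` and its helper methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical array-backed binary min-heap with element removal. */
class SimpleHeap<E> {
    private final List<E> items = new ArrayList<>();
    private final Comparator<? super E> order;

    SimpleHeap(Comparator<? super E> order) { this.order = order; }

    void add(E e) {
        items.add(e);
        siftUp(items.size() - 1);
    }

    /** Least element, or null if the heap is empty. */
    E peek() { return items.isEmpty() ? null : items.get(0); }

    int size() { return items.size(); }

    /** Removes one occurrence of o; returns false if it is not present. */
    public boolean remove(E o) {
        int i = items.indexOf(o); // linear scan, uses equals()
        if (i < 0) return false;
        int last = items.size() - 1;
        E moved = items.remove(last); // detach the last leaf
        if (i < last) {
            // Fill the hole with the detached leaf, then restore heap
            // order; at most one of the two sifts actually moves it.
            items.set(i, moved);
            siftDown(i);
            siftUp(i);
        }
        return true;
    }

    private void siftUp(int i) {
        while (i > 0) {
            int parent = (i - 1) / 2;
            if (order.compare(items.get(i), items.get(parent)) >= 0) break;
            swap(i, parent);
            i = parent;
        }
    }

    private void siftDown(int i) {
        int n = items.size();
        while (true) {
            int left = 2 * i + 1, right = left + 1, smallest = i;
            if (left < n && order.compare(items.get(left), items.get(smallest)) < 0) smallest = left;
            if (right < n && order.compare(items.get(right), items.get(smallest)) < 0) smallest = right;
            if (smallest == i) break;
            swap(i, smallest);
            i = smallest;
        }
    }

    private void swap(int i, int j) {
        E tmp = items.get(i);
        items.set(i, items.get(j));
        items.set(j, tmp);
    }
}
```

With this, 'last wins' amounts to calling remove() on the old entry and then add() on the new one. The lookup is O(n); keeping a map from element to heap index would avoid the scan at the cost of extra bookkeeping.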

Re: FieldSortedHitQueue enhancement

2007-03-29 Thread Tom Hill
On 3/29/07, Otis Gospodnetic [EMAIL PROTECTED] wrote: Hm, removing duplicates (as determined by a value of a specified document field) from the results would be nice. How would your addition affect performance, considering it has to check the PQ for a previous value for every candidate hit?

Re: FieldSortedHitQueue enhancement

2007-03-29 Thread Tom Hill
Simpy -- http://www.simpy.com/ - Tag - Search - Share - Original Message From: Peter Keegan [EMAIL PROTECTED] To: java-user@lucene.apache.org Sent: Thursday, March 29, 2007 11:00:24 AM Subject: Re: FieldSortedHitQueue enhancement The duplicate check would just be on the doc ID