[ 
https://issues.apache.org/jira/browse/LUCENE-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220089#comment-14220089
 ] 

Mark Harwood commented on LUCENE-6066:
--------------------------------------

bq. But how will you track the min element for each key in the PQ (to know 
which element to remove, when a more competitive hit with that key arrives)?

I was thinking of this as a foundation: (pseudo code) 

{code:title=DiversifyingPriorityQueue.java|borderStyle=solid}
   abstract class KeyedElement {
       int pqPos;
       abstract Object getKey();
   }
   class DiversifyingPriorityQueue<T extends KeyedElement> extends 
PriorityQueue<T> {
        FastRemovablePriorityQueue<T> mainPQ;
        Map<Object, PriorityQueue> perKeyQueues;
  }
{code}

You can probably guess at the logic but it is based around: 
* making sure each key has a max of n entries using an entry in perKeyQueues.
* Evictions from the mainPQ will require removal from the related perKeyQueue
* Emptied perKeyQueues can be recycled for use with other keys
* Evictions from the perKeyQueue will require removal from the mainPQ

bq. This seems promising, maybe as a separate dedicated (forked) PQ impl?

Yes, introducing a linear-cost remove by marking elements with a position is an 
added cost that not all PQs will require so forking seems necessary. In this 
case a common abstraction for these different PQs would be useful for the 
places where results are consumed e.g. TopDocsCollector


> New "remove" method in PriorityQueue
> ------------------------------------
>
>                 Key: LUCENE-6066
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6066
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/query/scoring
>            Reporter: Mark Harwood
>            Priority: Minor
>             Fix For: 5.0
>
>         Attachments: LUCENE-PQRemoveV1.patch
>
>
> It would be useful to be able to remove existing elements from a 
> PriorityQueue. 
> The proposal is that a linear scan is performed to find the element being 
> removed and then the end element in heap[size] is swapped into this position 
> to perform the delete. The method downHeap() is then called to shuffle the 
> replacement element back down the array but the existing downHeap method must 
> be modified to allow picking up an entry from any point in the array rather 
> than always assuming the first element (which is its only current mode of 
> operation).
> A working javascript model of the proposal with animation is available here: 
> http://jsfiddle.net/grcmquf2/22/ 
> In tests the modified version of "downHeap" produces the same results as the 
> existing impl but adds the ability to push down from any point.
> An example use case that requires remove is where a client doesn't want more 
> than N matches for any given key (e.g. no more than 5 products from any one 
> retailer in a marketplace). In these circumstances a document that was 
> previously thought of as competitive has to be removed from the final PQ and 
> replaced with another doc (eg a retailer who already has 5 matches in the PQ 
> receives a 6th match which is better than his previous ones). This particular 
> process is managed by a special "DiversifyingPriorityQueue" which wraps the 
> main PriorityQueue and could be contributed as part of another issue if there 
> is interest in that. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to