[
https://issues.apache.org/jira/browse/LUCENE-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220089#comment-14220089
]
Mark Harwood commented on LUCENE-6066:
--------------------------------------
bq. But how will you track the min element for each key in the PQ (to know
which element to remove, when a more competitive hit with that key arrives)?
I was thinking of this as a foundation: (pseudo code)
{code:title=DiversifyingPriorityQueue.java|borderStyle=solid}
abstract class KeyedElement {
int pqPos;
abstract Object getKey();
}
class DiversifyingPriorityQueue<T extends KeyedElement> extends
PriorityQueue<T> {
FastRemovablePriorityQueue<T> mainPQ;
Map<Object, PriorityQueue> perKeyQueues;
}
{code}
You can probably guess at the logic but it is based around:
* making sure each key has a max of n entries using an entry in perKeyQueues.
* Evictions from the mainPQ will require removal from the related perKeyQueue
* Emptied perKeyQueues can be recycled for use with other keys
* Evictions from the perKeyQueue will require removal from the mainPQ
bq. This seems promising, maybe as a separate dedicated (forked) PQ impl?
Yes, introducing a linear-cost remove by marking elements with a position is an
added cost that not all PQs will require so forking seems necessary. In this
case a common abstraction for these different PQs would be useful for the
places where results are consumed e.g. TopDocsCollector
> New "remove" method in PriorityQueue
> ------------------------------------
>
> Key: LUCENE-6066
> URL: https://issues.apache.org/jira/browse/LUCENE-6066
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/query/scoring
> Reporter: Mark Harwood
> Priority: Minor
> Fix For: 5.0
>
> Attachments: LUCENE-PQRemoveV1.patch
>
>
> It would be useful to be able to remove existing elements from a
> PriorityQueue.
> The proposal is that a linear scan is performed to find the element being
> removed and then the end element in heap[size] is swapped into this position
> to perform the delete. The method downHeap() is then called to shuffle the
> replacement element back down the array but the existing downHeap method must
> be modified to allow picking up an entry from any point in the array rather
> than always assuming the first element (which is its only current mode of
> operation).
> A working javascript model of the proposal with animation is available here:
> http://jsfiddle.net/grcmquf2/22/
> In tests the modified version of "downHeap" produces the same results as the
> existing impl but adds the ability to push down from any point.
> An example use case that requires remove is where a client doesn't want more
> than N matches for any given key (e.g. no more than 5 products from any one
> retailer in a marketplace). In these circumstances a document that was
> previously thought of as competitive has to be removed from the final PQ and
> replaced with another doc (eg a retailer who already has 5 matches in the PQ
> receives a 6th match which is better than his previous ones). This particular
> process is managed by a special "DiversifyingPriorityQueue" which wraps the
> main PriorityQueue and could be contributed as part of another issue if there
> is interest in that.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]