[ 
https://issues.apache.org/jira/browse/LUCENE-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14239155#comment-14239155
 ] 

Michael McCandless commented on LUCENE-6066:
--------------------------------------------

This looks great!

Maybe advertise in the javadocs the runtime cost of the collector? (The linear 
remove method).

In TestPriorityQueue, instead of {{int numDocsInPQ = (int) (1 + 
(random.nextFloat() * 100));}} you could use {{TestUtil.nextInt(random(), 1, 
100)}}.

Why couldn't you just pass your custom queue instead of null to super() in 
DiversifiedTopDocsCollector ctor?  Scary to have to pass null to super ... 
something is wrong.

Can you fix diversifying vs diversified naming (pick one)?

Maybe add a random test, in addition to the fixed scores/MAX_HITS_PER_KEY?  The 
random test could do the "slow and stupid but probably correct" diversified 
collection to get truth, and assert it equals the result using the new 
collector.

The javadocs are nice and clear (diversify by "band"), but then the abstract 
method returns NumericDocValues, which is confusing: how does "beatles" become 
a number?  Why not e.g. SortedDVs?  Maybe, the test case can show how the nice 
Beatles example would actually work?

I saw a test about paging; how does/should paging work with such a collector?

I think the changes to PQ are OK.  Scary to add a linear-cost method, but it 
advertises this.  If this is too-scary to anyone we could always fully fork to 
a private (for this collector) impl.


> New "remove" method in PriorityQueue
> ------------------------------------
>
>                 Key: LUCENE-6066
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6066
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/query/scoring
>            Reporter: Mark Harwood
>            Priority: Minor
>             Fix For: 5.0
>
>         Attachments: LUCENE-PQRemoveV3.patch
>
>
> It would be useful to be able to remove existing elements from a 
> PriorityQueue. 
> The proposal is that a linear scan is performed to find the element being 
> removed and then the end element in heap[size] is swapped into this position 
> to perform the delete. The method downHeap() is then called to shuffle the 
> replacement element back down the array but the existing downHeap method must 
> be modified to allow picking up an entry from any point in the array rather 
> than always assuming the first element (which is its only current mode of 
> operation).
> A working javascript model of the proposal with animation is available here: 
> http://jsfiddle.net/grcmquf2/22/ 
> In tests the modified version of "downHeap" produces the same results as the 
> existing impl but adds the ability to push down from any point.
> An example use case that requires remove is where a client doesn't want more 
> than N matches for any given key (e.g. no more than 5 products from any one 
> retailer in a marketplace). In these circumstances a document that was 
> previously thought of as competitive has to be removed from the final PQ and 
> replaced with another doc (eg a retailer who already has 5 matches in the PQ 
> receives a 6th match which is better than his previous ones). This particular 
> process is managed by a special "DiversifyingPriorityQueue" which wraps the 
> main PriorityQueue and could be contributed as part of another issue if there 
> is interest in that. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to