[ https://issues.apache.org/jira/browse/LUCENE-7686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-7686:
---------------------------------------
Attachment: LUCENE-7686.patch
Another iteration, this time filtering duplicates earlier in the top N
search. I added a new method, {{acceptPartialPath}}, to the
{{Util.TopNSearcher}} class so that subclasses can prune a path that
is still in progress, not just a completed path. This should be
quite efficient even when the number of duplicates is very high,
because the top N search will quickly reach the highest-scoring
document for the suggestion that is neither deleted nor filtered out,
record its surface form, and then prune subsequent intermediate paths
that share the same surface form.
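To make the pruning idea concrete, here is a minimal, self-contained Java sketch. It is NOT Lucene's {{Util.TopNSearcher}} (which searches an FST); the trie, the {{Node}}/{{Path}} types, the separator character and the {{acceptPartialPath}}/{{topN}} helpers below are all hypothetical. It assumes each suggestion is keyed as surface form + separator + doc id + separator, that lower cost means better, and that every trie node stores the lowest cost of any completion beneath it, so a best-first search pops completed paths in cost order.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch of duplicate pruning during a best-first top N search (not Lucene code). */
public class DedupTopNSketch {
  // Assumed separator: terminates both the surface form and the doc id in each key.
  static final char SEP = '\u001F';

  static final class Node {
    final TreeMap<Character, Node> children = new TreeMap<>();
    long minCost = Long.MAX_VALUE; // lowest cost of any completion below this node
    int docId = -1;                // set only on the leaf that completes a key
  }

  // A path: the node reached so far plus the labels consumed to get there.
  record Path(Node node, String prefix) {}

  static void add(Node root, String surface, int docId, long cost) {
    Node n = root;
    n.minCost = Math.min(n.minCost, cost);
    for (char c : (surface + SEP + docId + SEP).toCharArray()) {
      n = n.children.computeIfAbsent(c, k -> new Node());
      n.minCost = Math.min(n.minCost, cost);
    }
    n.docId = docId;
  }

  // Hypothetical analogue of acceptPartialPath: once the surface form is fully known
  // (the first SEP has been consumed), reject the path if that form was already accepted.
  static boolean acceptPartialPath(String prefix, Set<String> seen) {
    int sep = prefix.indexOf(SEP);
    return sep < 0 || !seen.contains(prefix.substring(0, sep));
  }

  static List<String> topN(Node root, int n, Set<Integer> deleted) {
    PriorityQueue<Path> queue =
        new PriorityQueue<>((a, b) -> Long.compare(a.node().minCost, b.node().minCost));
    queue.add(new Path(root, ""));
    Set<String> seen = new HashSet<>();
    List<String> results = new ArrayList<>();
    while (!queue.isEmpty() && results.size() < n) {
      Path p = queue.poll();
      if (!acceptPartialPath(p.prefix(), seen)) {
        continue; // prune: a better live doc with this surface form was already accepted
      }
      if (p.node().docId >= 0) { // completed path
        if (deleted.contains(p.node().docId)) {
          continue; // deleted doc: skip it WITHOUT recording its surface form
        }
        String surface = p.prefix().substring(0, p.prefix().indexOf(SEP));
        seen.add(surface);
        results.add(surface + " (doc " + p.node().docId + ")");
        continue;
      }
      for (var e : p.node().children.entrySet()) {
        queue.add(new Path(e.getValue(), p.prefix() + e.getKey()));
      }
    }
    return results;
  }

  static List<String> demo() {
    Node root = new Node();
    add(root, "lucene", 1, 5);
    add(root, "lucene", 2, 3); // best "lucene" doc, but deleted below
    add(root, "lucene", 5, 6); // duplicate, pruned before its path completes
    add(root, "lucid", 3, 4);
    add(root, "solr", 4, 10);
    return topN(root, 3, Set.of(2));
  }

  public static void main(String[] args) {
    System.out.println(demo()); // [lucid (doc 3), lucene (doc 1), solr (doc 4)]
  }
}
```

Because a deleted document's completion is skipped without recording its surface form, the search continues to the next-best live document for that form (doc 1 here); only after that is accepted do the remaining "lucene" paths get pruned mid-search.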
I also added another "extreme" dedup test case to exercise the logic that
computes the necessary queue size; it passes, and the new
{{TestSuggestField.testRandom}} seems to survive moderate beasting.
I think it's ready.
> NRT suggester should have option to filter out duplicates
> ---------------------------------------------------------
>
> Key: LUCENE-7686
> URL: https://issues.apache.org/jira/browse/LUCENE-7686
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: master (7.0), 6.5
>
> Attachments: LUCENE-7686.patch, LUCENE-7686.patch, LUCENE-7686.patch
>
>
> Some of the other suggesters have this ability, and it's quite simple to add
> it to the NRT suggester as long as the thing we are filtering on is the
> suggest key itself, not e.g. another stored field from the document.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)