[
https://issues.apache.org/jira/browse/LUCENE-6365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613274#comment-14613274
]
Michael McCandless commented on LUCENE-6365:
--------------------------------------------
bq. I adapted my patch to the latest changes in trunk.
Thanks.
bq. I think the reuse of the iterator is one core part of this whole patch. I
tried to rework the api of the iterator so that the reuse case and the no-reuse
case are handled in a similar way. I hope you like it now (at least a bit).
Alas, I still don't think this is an appropriate place for object reuse.
bq. Lucene does this kind of reuse already, e.g. see Transition.
That's true: Lucene does reuse objects in many low-level places, but this is
ugly, cancerous, dangerous (it can easily cause bugs, e.g. accidentally
reusing one iterator across threads) and anti-Java. It really should be used
only sparingly: the exception, not the rule. I don't think this API
qualifies, i.e. it's a bad tradeoff to pollute the API to eke out a bit of GC
perf gain that would be negligible in real usage: the cost of building an
automaton, plus the cost of consuming each string that's iterated, would
normally dwarf the small GC cost of creating a new iterator per automaton.
APIs are hard enough to "get right" as it is ...
bq. FuzzyCompletionQuery has been added lately and relies on the old big set of
finite strings. I am not sure how to rework it. Currently it still uses the
set, maybe it is better to use the iterator inside of FuzzyCompletionWeight,
but this means recomputing the finite strings over and over again. What do you
think?
It's fine to leave this as the full {{Set<String>}} for now. It's no worse :)
Progress not perfection...
bq. BTW topoSortStates() is implemented by AnalyzingSuggester and
CompletionTokenStream identically. Maybe it should be moved to one place, maybe
to Operations?
Whoops, I'll go move that to Operations now. Good idea, thank you!
> Optimized iteration of finite strings
> -------------------------------------
>
> Key: LUCENE-6365
> URL: https://issues.apache.org/jira/browse/LUCENE-6365
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/other
> Affects Versions: 5.0
> Reporter: Markus Heiden
> Priority: Minor
> Labels: patch, performance
> Attachments: FiniteStrings_reuse.patch
>
>
> Replaced Operations.getFiniteStrings() by an optimized FiniteStringIterator.
> Benefits:
> Avoid huge hash set of finite strings.
> Avoid massive object/array creation during processing.
> "Downside":
> Iteration order changed, so when iterating with a limit, the result may
> differ slightly. Old: emit current node, if accept / recurse. New: recurse /
> emit current node, if accept.
> The old method Operations.getFiniteStrings() still exists, because it eases
> the tests. It is now implemented by use of the new FiniteStringIterator.
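
The change in iteration order described in the issue can be illustrated with a
small sketch. This is a toy under stated assumptions: it uses a hand-built
three-state acyclic automaton stored in plain arrays, not Lucene's Automaton
class or the patch's FiniteStringIterator, and every name in it is
hypothetical. It only shows why "emit, then recurse" and "recurse, then emit"
can return different strings once a limit cuts the iteration short.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only; not Lucene code. All names here are hypothetical.
public class FiniteStringOrderDemo {

    // Assumed automaton: 0 --a--> 1 --b--> 2, where states 1 and 2 accept.
    // It accepts exactly the strings "a" and "ab".
    static final char[][] LABELS = { {'a'}, {'b'}, {} };
    static final int[][] TARGETS = { {1}, {2}, {} };
    static final boolean[] ACCEPT = { false, true, true };

    // Old order: emit the current node if it accepts, then recurse.
    static void emitThenRecurse(int state, String prefix, int limit, List<String> out) {
        if (out.size() >= limit) {
            return;
        }
        if (ACCEPT[state]) {
            out.add(prefix);
        }
        for (int i = 0; i < TARGETS[state].length; i++) {
            emitThenRecurse(TARGETS[state][i], prefix + LABELS[state][i], limit, out);
        }
    }

    // New order: recurse first, then emit the current node if it accepts.
    static void recurseThenEmit(int state, String prefix, int limit, List<String> out) {
        for (int i = 0; i < TARGETS[state].length; i++) {
            recurseThenEmit(TARGETS[state][i], prefix + LABELS[state][i], limit, out);
        }
        if (out.size() < limit && ACCEPT[state]) {
            out.add(prefix);
        }
    }

    static List<String> oldOrder(int limit) {
        List<String> out = new ArrayList<>();
        emitThenRecurse(0, "", limit, out);
        return out;
    }

    static List<String> newOrder(int limit) {
        List<String> out = new ArrayList<>();
        recurseThenEmit(0, "", limit, out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(oldOrder(10)); // prints [a, ab]
        System.out.println(newOrder(10)); // prints [ab, a]
        System.out.println(oldOrder(1));  // prints [a]
        System.out.println(newOrder(1));  // prints [ab]
    }
}
```

Without a limit both orders yield the same set of strings; with a limit of 1
the old order returns the shorter string and the new order the longer one,
which is the "result may differ slightly" caveat from the description.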