[
https://issues.apache.org/jira/browse/LUCENE-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16352779#comment-16352779
]
David Smiley commented on LUCENE-8158:
--------------------------------------
This looks interesting Alan. I looked at both patches. Lets consider the more
invasive one. I like that UnifiedHighlighter is shorter as it was getting
quite long, and this looks like a reasonable way to break it up.
* These methods were on UnifiedHighlighter (and not other internal things)
deliberately: requiresRewrite, preSpanQueryRewrite, preMultiTermQueryRewrite,
shouldHandleMultiTermQuery, shouldHighlightPhrasesStrictly,
shouldPreferPassageRelevancyOverSpeed. By having them on the UH, we can get a
method reference and use them internally where needed without exposing the
internals to the user. PhraseHelper is an internal thing, and I believe the
OffsetProducer abstraction you are introducing here is too. That said, we
could have fewer such methods:
** I'd prefer that we do away with those "should" methods that merely examine
defaultFieldFlags because by creating a second method to customize the settings
(separate from getFlags(field) it's both extra/unnecessary plus if you override
one of those should methods it may in fact do nothing if you have overridden
getFlags as well. So it's error-prone. I'm okay with these disappearing in
7.x; users merely need to switch to overriding {{getFlags(String field)}}.
** I think preSpanQueryRewrite and preMultiTermQueryRewrite could be combined
in purpose to simply be customRewrite or perhaps worded as customizeQuery
avoiding the overloaded word "rewrite". Since such hooks are advanced, I'm
okay with the API changing in 7.x.
* OffsetProducer.getFieldOffsets(field,text) is unused and has no javadocs; do
you want to add it to the visibility test? Also, it's implementation appears
to duplicate that of getFieldOffsetStrategy
BTW I've been thinking of a refactoring similar to your proposal here. It
would be nice if there was a class that held the Query and the things we
extract from it: position-insensitive terms (or all terms), MTQ automata,
position-sensitive queries. It's job would be to extract all this information
from the Query, and that's it. Today it has these aforementioned things but it
would be a nice option to extract the term->boost as well so that we can
customize the relevancy based on boosts in the query (like the original
Highlighter & FVH can). Eventually it would be nice to "visit" the query tree
once directly in this proposed class instead of several times
(weight.extractTerms, MultiTermHighlighting, PhraseHelper). That would be the
extent of the purpose of this thingamajig -- perhaps called QueryExtractor. An
instance of this class would go on each FieldOffsetStrategy and would obviate
the need to pass so many arguments to these classes and for them to have as
many fields as they do (not that there's _that_ much but still).
OffsetProducer here slightly overlaps with this idea but the ideas could be
combined. OP could accept a QueryExtractor and would then have some less stuff
to deal with internally since QueryExtractor extracts terms, automata,
PhraseHelper, etc. not OP.
> Expose OffsetsEnum directly from UnifiedHighlighter
> ---------------------------------------------------
>
> Key: LUCENE-8158
> URL: https://issues.apache.org/jira/browse/LUCENE-8158
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/highlighter
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Major
> Attachments: LUCENE-8158-alt.patch, LUCENE-8158.patch
>
>
> Now that OffestsEnum is unitary, we can expose it directly from the
> UnifiedHighlighter, allowing clients to use them in place of snippets.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]