[ 
https://issues.apache.org/jira/browse/LUCENE-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16352779#comment-16352779
 ] 

David Smiley commented on LUCENE-8158:
--------------------------------------

This looks interesting Alan.  I looked at both patches.  Lets consider the more 
invasive one.  I like that UnifiedHighlighter is shorter as it was getting 
quite long, and this looks like a reasonable way to break it up.
* These methods were on UnifiedHighlighter (and not other internal things) 
deliberately: requiresRewrite, preSpanQueryRewrite, preMultiTermQueryRewrite, 
shouldHandleMultiTermQuery, shouldHighlightPhrasesStrictly, 
shouldPreferPassageRelevancyOverSpeed.  By having them on the UH, we can get a 
method reference and use them internally where needed without exposing the 
internals to the user.  PhraseHelper is an internal thing, and I believe the 
OffsetProducer abstraction you are introducing here is too.  That said, we 
could have fewer such methods:
** I'd prefer that we do away with those "should" methods that merely examine 
defaultFieldFlags because by creating a second method to customize the settings 
(separate from getFlags(field) it's both extra/unnecessary plus if you override 
one of those should methods it may in fact do nothing if you have overridden 
getFlags as well.  So it's error-prone.  I'm okay with these disappearing in 
7.x; users merely need to switch to overriding {{getFlags(String field)}}.
** I think preSpanQueryRewrite and preMultiTermQueryRewrite could be combined 
in purpose to simply be customRewrite or perhaps worded as customizeQuery 
avoiding the overloaded word "rewrite".  Since such hooks are advanced, I'm 
okay with the API changing in 7.x.
* OffsetProducer.getFieldOffsets(field,text) is unused and has no javadocs; do 
you want to add it to the visibility test?  Also, it's implementation appears 
to duplicate that of getFieldOffsetStrategy

BTW I've been thinking of a refactoring similar to your proposal here.  It 
would be nice if there was a class that held the Query and the things we 
extract from it: position-insensitive terms (or all terms), MTQ automata, 
position-sensitive queries.  It's job would be to extract all this information 
from the Query, and that's it.  Today it has these aforementioned things but it 
would be a nice option to extract the term->boost as well so that we can 
customize the relevancy based on boosts in the query (like the original 
Highlighter & FVH can).  Eventually it would be nice to "visit" the query tree 
once directly in this proposed class instead of several times 
(weight.extractTerms, MultiTermHighlighting, PhraseHelper).  That would be the 
extent of the purpose of this thingamajig -- perhaps called QueryExtractor.  An 
instance of this class would go on each FieldOffsetStrategy and would obviate 
the need to pass so many arguments to these classes and for them to have as 
many fields as they do (not that there's _that_ much but still).  
OffsetProducer here slightly overlaps with this idea but the ideas could be 
combined.  OP could accept a QueryExtractor and would then have some less stuff 
to deal with internally since QueryExtractor extracts terms, automata, 
PhraseHelper, etc. not OP.

> Expose OffsetsEnum directly from UnifiedHighlighter
> ---------------------------------------------------
>
>                 Key: LUCENE-8158
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8158
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/highlighter
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Major
>         Attachments: LUCENE-8158-alt.patch, LUCENE-8158.patch
>
>
> Now that OffestsEnum is unitary, we can expose it directly from the 
> UnifiedHighlighter, allowing clients to use them in place of snippets.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to