[
https://issues.apache.org/jira/browse/LUCENE-7565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15674263#comment-15674263
]
David Smiley commented on LUCENE-7565:
--------------------------------------
The UH breaks passages solely according to a java.text.BreakIterator. Perhaps
the most straightforward way to do this is to add a new BreakIterator. Other ways
would probably require a larger refactoring, esp. considering how multi-valued
fields are highlighted with SplittingBreakIterator. The B.I. abstraction isn't
great but it suffices for the highlighter, and it can suffice for this use-case
provided this B.I. impl makes some assumptions as to how the UH calls its
methods. A new BI could wrap a target BI (_that_ BI would typically be a
standard "word" impl but needn't be). When bi.following(offset) is invoked
(which the UH calls at the start of the passage to find the end of the
passage), it can examine the current position (the start), consider the
configured target character length, and then use the underlying BreakIterator,
likely calling following() then previous().
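
A minimal sketch of that wrapping idea, not a committed design: the class name
TargetLengthBreakIterator and the targetLength parameter are hypothetical, and
picking the delegate boundary closest to the target is only one possible
policy. It assumes the UH only relies on following(offset) for the passage end,
so every other method simply delegates.

```java
import java.text.BreakIterator;
import java.text.CharacterIterator;
import java.util.Locale;

/** Hypothetical wrapper: following() aims for a target passage length. */
class TargetLengthBreakIterator extends BreakIterator {
    private final BreakIterator delegate; // typically a "word" BreakIterator
    private final int targetLength;       // desired passage length in chars

    TargetLengthBreakIterator(BreakIterator delegate, int targetLength) {
        if (targetLength <= 0) {
            throw new IllegalArgumentException("targetLength must be positive");
        }
        this.delegate = delegate;
        this.targetLength = targetLength;
    }

    /** Treats startOffset as the passage start and returns the passage end. */
    @Override
    public int following(int startOffset) {
        int textEnd = delegate.getText().getEndIndex();
        int target = startOffset + targetLength;
        if (target >= textEnd) {
            return delegate.last(); // not enough text left; end at the text end
        }
        int after = delegate.following(target); // first boundary past the target
        if (after == DONE) {
            return delegate.last();
        }
        int before = delegate.previous();       // boundary at or before the target
        if (before > startOffset && target - before < after - target) {
            return before;                      // delegate is positioned here
        }
        return delegate.next();                 // step forward again to 'after'
    }

    // Everything else passes straight through to the wrapped iterator.
    @Override public int preceding(int offset) { return delegate.preceding(offset); }
    @Override public int first() { return delegate.first(); }
    @Override public int last() { return delegate.last(); }
    @Override public int next(int n) { return delegate.next(n); }
    @Override public int next() { return delegate.next(); }
    @Override public int previous() { return delegate.previous(); }
    @Override public int current() { return delegate.current(); }
    @Override public CharacterIterator getText() { return delegate.getText(); }
    @Override public void setText(CharacterIterator text) { delegate.setText(text); }

    @Override
    public Object clone() {
        // Clone the delegate too so copies do not share iteration state.
        return new TargetLengthBreakIterator((BreakIterator) delegate.clone(), targetLength);
    }
}
```

With a word instance as the delegate, following(start) lands on a word boundary
near start + targetLength rather than on a sentence boundary. Whether the UH's
other calls (e.g. preceding()) are safe to pass through unchanged is exactly
the kind of assumption mentioned above and would need checking.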
I was just thinking... an alternative way to think about delineating passages
is by having the highlighted words not exceed X words in-between in a given
passage. That would be an interesting approach. Quite separate from this
issue though!
> UnifiedHighlighter: add ability to delineate passages by max char size
> ----------------------------------------------------------------------
>
> Key: LUCENE-7565
> URL: https://issues.apache.org/jira/browse/LUCENE-7565
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/highlighter
> Reporter: David Smiley
>
> The Highlighter and FastVectorHighlighter can be configured to delineate
> passages using a target character length that is then typically adjusted to
> the nearest word boundary. This would be a good option to add to the
> UnifiedHighlighter (UH) in its own right, as well as for better backward
> compatibility for those migrating from those highlighters.