[ https://issues.apache.org/jira/browse/LUCENE-9093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16998660#comment-16998660 ]
Nándor Mátravölgyi commented on LUCENE-9093:
--------------------------------------------

I'm back with a patch! [^LUCENE-9093.patch]

This adds a `hl.fragalign` parameter to the Unified Highlighter. I've added a description of how it works to the docs, and I've updated the related tests. I've opted to keep the new feature backward-compatible.

From the new docs:
{noformat}
Fragment alignment can influence where the match in a passage is positioned. This floating point value is used to break the remaining `hl.fragsize` of the passage around the match. The default value of `0.0` aligns the match to the left; this is the backward-compatible setting. A value of `0.5` means that an equal amount of text should be around the match on both sides, while `1.0` aligns the match to the right.

Note: there are situations where the requested alignment cannot be achieved. This depends on the length of the match, the BreakIterator used and the text content around the match. Before the introduction of this parameter, all passages had left-aligned matches. Changing `hl.bs.type` to `WORD` and `hl.fragalign` to `0.5` will produce results that closely resemble what the other highlighters produce by default.
{noformat}

I must say that I've changed my mind about the abstraction. A proper one instead of the chained BreakIterators would be much nicer. The LengthGoalBreakIterator already had a few behavioral differences from how a generic BreakIterator works, and this change makes it work even less like a BreakIterator. It should be totally fine in its specifically crafted universe. However, a better abstraction/structure would be required if we want style points as well. The difficulty is that the chaining of the BreakIterators would need a refactor, which has a far greater scope than this issue.

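To illustrate the alignment rule in the quoted docs, here is a minimal sketch of the arithmetic only. It is not the code from the patch: the class `FragAlignSketch` and its methods are hypothetical, and it assumes that "the remaining `hl.fragsize`" means the fragment size minus the match length and that fractional budgets are rounded to the nearest character. In the highlighter itself, the final passage boundaries are then chosen by the underlying BreakIterator, so the real context lengths can differ.

```java
/** Hypothetical helper, not part of Lucene or the patch: splits the spare fragment size around a match. */
final class FragAlignSketch {

    /** Spare characters left once the match itself is accounted for (assumed interpretation). */
    static int remaining(int fragSize, int matchLength) {
        return Math.max(0, fragSize - matchLength);
    }

    /** Target amount of context before the match; fragAlign is 0.0 (left) to 1.0 (right). */
    static int leftBudget(int fragSize, int matchLength, float fragAlign) {
        if (fragAlign < 0f || fragAlign > 1f) {
            throw new IllegalArgumentException("fragAlign must be between 0.0 and 1.0");
        }
        // Rounding to the nearest character is an assumption of this sketch.
        return Math.round(remaining(fragSize, matchLength) * fragAlign);
    }

    /** Target amount of context after the match: whatever the left side did not use. */
    static int rightBudget(int fragSize, int matchLength, float fragAlign) {
        return remaining(fragSize, matchLength) - leftBudget(fragSize, matchLength, fragAlign);
    }

    public static void main(String[] args) {
        int fragSize = 30;   // hl.fragsize from the example query in the issue
        int matchLength = 5; // length of "Apple"
        for (float align : new float[] {0.0f, 0.5f, 1.0f}) {
            System.out.println("fragalign=" + align
                + " left=" + leftBudget(fragSize, matchLength, align)
                + " right=" + rightBudget(fragSize, matchLength, align));
        }
    }
}
```

With `fragAlign` at `0.0` the whole budget falls to the right of the match, which corresponds to the left-anchored snippets described in the issue; `1.0` moves the whole budget to the left.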
> Unified highlighter with word separator never gives context to the left
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-9093
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9093
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/highlighter
>            Reporter: Tim Retout
>            Priority: Major
>         Attachments: LUCENE-9093.patch
>
>
> When using the unified highlighter with hl.bs.type=WORD, I am not able to get
> context to the left of the matches returned; only words to the right of each
> match are shown. I see this behaviour on both Solr 6.4 and Solr 7.1.
> Without context to the left of a match, the highlighted snippets are much
> less useful for understanding where the match appears in a document.
> As an example, using the techproducts data with Solr 7.1, given a search for
> "apple", highlighting the "features" field:
> http://localhost:8983/solr/techproducts/select?hl.fl=features&hl=on&q=apple&hl.bs.type=WORD&hl.fragsize=30&hl.method=unified
> I see this snippet:
> "<em>Apple</em> Lossless, H.264 video"
> Note that "Apple" is anchored to the left. Compare with the original
> highlighter:
> http://localhost:8983/solr/techproducts/select?hl.fl=features&hl=on&q=apple&hl.fragsize=30
> And the match has context either side:
> ", Audible, <em>Apple</em> Lossless, H.264 video"
> (To complicate this, in general I am not sure that the unified highlighter is
> respecting the hl.fragsize parameter, although [SOLR-9935] suggests support
> was added. I included the hl.fragsize param in the unified URL too, but it's
> making no difference unless set to 0.)