[
https://issues.apache.org/jira/browse/SOLR-11516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213063#comment-16213063
]
David Smiley commented on SOLR-11516:
-------------------------------------
Ok. At least with respect to the issue title... this issue can probably be
closed as won't-fix, or perhaps we should outright remove the WORD & CHARACTER
options since they are bad options. (FWIW I don't recall including them; they
were probably inherited from similar code for the FVH.)
Can you try simply increasing hl.fragsize a bunch more, and then trimming
client-side if the result is too long?
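
For illustration only, here is a rough sketch of what client-side trimming
could look like. It is not part of Solr; the class and method names are
hypothetical, and it assumes the snippet uses the default <em>...</em> markers
and that a highlighted term contains no spaces. It keeps up to `context`
characters on each side of the first match and cuts only at spaces.

```java
final class SnippetTrimmer {
    /**
     * Hypothetical helper: keeps roughly `context` characters of text on each
     * side of the first <em>...</em> match, cutting only at spaces so that no
     * word (and no tag, under the no-spaces assumption) is split.
     */
    static String trimAroundMatch(String snippet, int context) {
        int open = snippet.indexOf("<em>");
        int close = (open < 0) ? -1 : snippet.indexOf("</em>", open);
        if (open < 0 || close < 0) {
            return snippet; // no highlight marker found; leave the snippet alone
        }
        int matchEnd = close + "</em>".length();
        int start = Math.max(0, open - context);
        int end = Math.min(snippet.length(), matchEnd + context);

        if (start > 0) {
            // Advance to the first word start at or after `start`.
            int space = snippet.indexOf(' ', start - 1);
            start = (space >= 0 && space < open) ? space + 1 : open;
        }
        if (end < snippet.length()) {
            // Retreat to the last space at or before `end`.
            int space = snippet.lastIndexOf(' ', end);
            end = (space >= matchEnd) ? space : matchEnd;
        }
        String prefix = (start > 0) ? "..." : "";
        String suffix = (end < snippet.length()) ? "..." : "";
        return prefix + snippet.substring(start, end) + suffix;
    }
}
```

For example, trimAroundMatch("one two three <em>Apple</em> four five six", 6)
yields "...three <em>Apple</em> four...". A snippet with no match marker is
returned unchanged.
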
FWIW there is already a coded option on the Lucene end of this to have
hl.fragsize be a target/average, such that the snippet will break at whichever
boundary is closest to the target (either before or after it). There is no
Solr option to enable this; it's a TODO. The current setting always picks the
earliest, even if the next break is only a word beyond the target.
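
To make the "closest to the target" idea concrete, here is a minimal sketch
using the JDK's java.text.BreakIterator. It is not Lucene's implementation;
the class and method names are hypothetical, and the tie-breaking rule (prefer
the earlier boundary) is an assumption made for the example.

```java
import java.text.BreakIterator;
import java.util.Locale;

final class ClosestBreak {
    /**
     * Hypothetical helper: returns the boundary nearest to `target`, choosing
     * between the last boundary at or before it and the first boundary after
     * it. Ties go to the earlier boundary (an assumption of this sketch).
     */
    static int closestBoundary(BreakIterator bi, String text, int target) {
        bi.setText(text);
        if (target <= 0) {
            return 0;
        }
        if (target >= text.length()) {
            return text.length();
        }
        int after = bi.following(target);      // first boundary strictly after target
        int before = bi.preceding(target + 1); // last boundary at or before target
        return (target - before <= after - target) ? before : after;
    }

    public static void main(String[] args) {
        String text = "One two. Three four five. Six.";
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ROOT);
        // A target just past the first sentence snaps back to that boundary;
        // a target deep into the second sentence snaps forward to its end.
        System.out.println(closestBoundary(sentences, text, 12));
        System.out.println(closestBoundary(sentences, text, 20));
    }
}
```

With a target treated this way, a fragment can end slightly short of or
slightly beyond the requested size, depending on which boundary is nearer.
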
It is hard to satisfy everyone with snippeting. There are many ways to skin
this cat.
> Unified highlighter with word separator never gives context to the left
> -----------------------------------------------------------------------
>
> Key: SOLR-11516
> URL: https://issues.apache.org/jira/browse/SOLR-11516
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: highlighter
> Affects Versions: 6.4, 7.1
> Reporter: Tim Retout
>
> When using the unified highlighter with hl.bs.type=WORD, I am not able to get
> context to the left of the matches returned; only words to the right of each
> match are shown. I see this behaviour on both Solr 6.4 and Solr 7.1.
> Without context to the left of a match, the highlighted snippets are much
> less useful for understanding where the match appears in a document.
> As an example, using the techproducts data with Solr 7.1, given a search for
> "apple", highlighting the "features" field:
> http://localhost:8983/solr/techproducts/select?hl.fl=features&hl=on&q=apple&hl.bs.type=WORD&hl.fragsize=30&hl.method=unified
> I see this snippet:
> "<em>Apple</em> Lossless, H.264 video"
> Note that "Apple" is anchored to the left. Compare with the original
> highlighter:
> http://localhost:8983/solr/techproducts/select?hl.fl=features&hl=on&q=apple&hl.fragsize=30
> And the match has context either side:
> ", Audible, <em>Apple</em> Lossless, H.264 video"
> (To complicate this, in general I am not sure that the unified highlighter is
> respecting the hl.fragsize parameter, although [SOLR-9935] suggests support
> was added. I included the hl.fragsize param in the unified URL too, but it's
> making no difference unless set to 0.)