[ https://issues.apache.org/jira/browse/SOLR-11516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16212919#comment-16212919 ]

Tim Retout commented on SOLR-11516:
-----------------------------------

[~dsmiley] yes, that makes sense. Thanks for the quick reply.

The domain I'm working in (recruitment) is quite similar to a general-purpose 
search engine: we have documents of maybe 1000 words, and need to show the 
gist of where the matches appear. We are happy for snippets to cut off in the 
middle of a sentence, because well-known search engines do the same.

When using hl.bs.type=SENTENCE, I have run into cases where the surrounding 
sentences were not pulled in to fill the fragsize we had set. Unfortunately I 
can't show a quick example of this on the techproducts collection, but I can 
confirm it (and file it as a separate issue?) if needed. It was something like:

    "Foo bar baz. Very long sentence starts here that goes on for several hundred chars."

Then a search for "foo" would bring back as a snippet:

    "<em>Foo</em> bar baz."

This led to very short document summaries: only one or two short "sentences" 
matching the query were returned, and the whole summary was less than one 
line long.
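A toy model (not Solr's code) of how this can happen if fragments are only allowed to grow by whole sentences: a long neighbouring sentence never fits inside the fragsize, so the snippet stays at the single matching sentence. The function `expand_sentence_snippet` and its naive regex sentence splitter are hypothetical, for illustration only.

```python
import re


def expand_sentence_snippet(text: str, match_start: int, fragsize: int) -> str:
    """Hypothetical illustration, not Solr's implementation.

    Start from the sentence containing the match, then add whole
    neighbouring sentences while the snippet stays within fragsize chars.
    """
    # Naive sentence split: text up to '.', '!' or '?', plus trailing spaces.
    spans = [m.span() for m in re.finditer(r"[^.!?]+[.!?]*\s*", text)]
    # Locate the sentence containing the match offset.
    idx = next(i for i, (s, e) in enumerate(spans) if s <= match_start < e)
    lo = hi = idx
    grew = True
    while grew:
        grew = False
        # Try to add the next sentence to the right, whole or not at all.
        if hi + 1 < len(spans) and spans[hi + 1][1] - spans[lo][0] <= fragsize:
            hi += 1
            grew = True
        # Then try to add the previous sentence to the left.
        if lo > 0 and spans[hi][1] - spans[lo - 1][0] <= fragsize:
            lo -= 1
            grew = True
    return text[spans[lo][0]:spans[hi][1]].strip()


long_sentence = "Very long sentence starts here " * 10 + "chars."
# The long sentence cannot fit in 100 chars, so only the match's sentence is kept.
print(expand_sentence_snippet("Foo bar baz. " + long_sentence, 0, 100))
```

Under this model, a word-level break would let the snippet include the start of the long sentence instead.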

What I was hoping for was a way to get the unified highlighter to produce 
summaries similar to those from the other highlighter options (i.e. cutting 
off at word boundaries, I think I mean), so that we can take advantage of the 
performance and flexibility benefits described in the documentation.
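To make the request concrete, here is a minimal sketch of the snippet shape I'm after: a window of roughly fragsize characters centred on the match, trimmed inward so it starts and ends on word boundaries. `centered_word_snippet` is a hypothetical illustration, not how any Solr highlighter is implemented, and the sample text is taken from the original-highlighter output quoted in the issue below.

```python
def centered_word_snippet(text: str, match_start: int, match_end: int,
                          fragsize: int) -> str:
    """Hypothetical sketch, not Solr's algorithm.

    Take about fragsize chars centred on the match, then shrink each edge
    inward to a word boundary so no word is cut in half.
    """
    slack = max(fragsize - (match_end - match_start), 0)
    start = max(match_start - slack // 2, 0)
    end = min(match_end + (slack - slack // 2), len(text))
    # Left edge is mid-word: move it forward to the next word start.
    while 0 < start < match_start and not text[start - 1].isspace():
        start += 1
    # Right edge is mid-word: move it back to the previous word end.
    while match_end < end < len(text) and not text[end].isspace():
        end -= 1
    snippet = (text[start:match_start] + "<em>" + text[match_start:match_end]
               + "</em>" + text[match_end:end])
    return snippet.strip()


text = "Audible, Apple Lossless, H.264 video"
s = text.index("Apple")
print(centered_word_snippet(text, s, s + len("Apple"), 30))
# -> Audible, <em>Apple</em> Lossless,
```

The point is only that context appears on both sides of the match; the exact trimming rule is a design choice.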

> Unified highlighter with word separator never gives context to the left
> -----------------------------------------------------------------------
>
>                 Key: SOLR-11516
>                 URL: https://issues.apache.org/jira/browse/SOLR-11516
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: highlighter
>    Affects Versions: 6.4, 7.1
>            Reporter: Tim Retout
>
> When using the unified highlighter with hl.bs.type=WORD, I am not able to get 
> context to the left of the matches returned; only words to the right of each 
> match are shown.  I see this behaviour on both Solr 6.4 and Solr 7.1.
> Without context to the left of a match, the highlighted snippets are much 
> less useful for understanding where the match appears in a document.
> As an example, using the techproducts data with Solr 7.1, given a search for 
> "apple", highlighting the "features" field:
> http://localhost:8983/solr/techproducts/select?hl.fl=features&hl=on&q=apple&hl.bs.type=WORD&hl.fragsize=30&hl.method=unified
> I see this snippet:
> "<em>Apple</em> Lossless, H.264 video"
> Note that "Apple" is anchored to the left.  Compare with the original 
> highlighter:
> http://localhost:8983/solr/techproducts/select?hl.fl=features&hl=on&q=apple&hl.fragsize=30
> And the match has context either side:
> ", Audible, <em>Apple</em> Lossless, H.264 video"
> (To complicate this, in general I am not sure that the unified highlighter is 
> respecting the hl.fragsize parameter, although [SOLR-9935] suggests support 
> was added.  I included the hl.fragsize param in the unified URL too, but it's 
> making no difference unless set to 0.)
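For anyone reproducing this, a small helper that builds the two comparison URLs from the description above, assuming a local Solr with the techproducts example as in the issue. The function name `highlight_url` is made up; the query parameters are exactly the ones in the quoted URLs.

```python
from urllib.parse import urlencode

# Assumes the techproducts example running locally, as in the issue.
BASE = "http://localhost:8983/solr/techproducts/select"


def highlight_url(unified: bool, fragsize: int = 30) -> str:
    """Build the SOLR-11516 comparison query (hypothetical helper)."""
    # Parameter order matches the URLs quoted in the issue.
    params = [("hl.fl", "features"), ("hl", "on"), ("q", "apple")]
    if unified:
        params.append(("hl.bs.type", "WORD"))  # word-level break iterator
    params.append(("hl.fragsize", str(fragsize)))
    if unified:
        params.append(("hl.method", "unified"))  # select unified highlighter
    return BASE + "?" + urlencode(params)


print(highlight_url(unified=True))
print(highlight_url(unified=False))
```

Varying `fragsize` in the unified URL is a quick way to check the observation that only hl.fragsize=0 changes the output.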



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)