[jira] [Commented] (SOLR-8212) Standard Highlighter Inconsistent with NGram Tokenizer

David Smiley (JIRA) Thu, 12 Nov 2015 09:18:19 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-8212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15002447#comment-15002447
 ]


David Smiley commented on SOLR-8212:
------------------------------------

Do the Postings or FastVector highlighters work properly for you?  I know they 
don't have this specific deficiency, but I'm wondering whether they highlight 
NGram-based analysis the same way as the Standard highlighter. See 
https://cwiki.apache.org/confluence/display/solr/Highlighting
Note that the Postings highlighter effectively only supports 
{{hl.usePhraseHighlighter=false}} at this time.

> Standard Highlighter Inconsistent with NGram Tokenizer
> ------------------------------------------------------
>
>                 Key: SOLR-8212
>                 URL: https://issues.apache.org/jira/browse/SOLR-8212
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Esther Quansah
>            Priority: Minor
>         Attachments: SOLR-8212.patch
>
>
> Noticing some inconsistent behavior with the Standard Highlighter and its 
> function on terms that use the NGram Tokenizer. Ex: 
> I created a field called "title_contains" which uses the NGram Tokenizer and 
> I indexed the term "bronchoscopy". Querying "co" on the title_contains field 
> should return "bronchos<em>co</em>py", but the Standard highlighter returns 
> "bronchoscopy" without the highlighting information.
> I created a test called testNgram() which tests the above example using (1) 
> the Standard Highlighter on the ngram field type and (2) the Fast Vector 
> Highlighter on the ngram field type. The first fails and the second passes. 
> Problem identified: MAX_NUM_TOKENS_PER_GROUP = 50 (in TokenGroup.java), and 
> for some terms numTokens >= 50. This causes incorrect match start and end 
> offsets, and therefore no highlighting on the found term. 
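
To make the token-count problem in the description concrete, here is a minimal sketch in plain Java (no Lucene dependency) that enumerates the n-grams of "bronchoscopy" and checks where the "co" gram falls relative to the cap of 50. The gram sizes (2 to 12), the emission order (by start offset, then by length), and the class and method names are assumptions made for illustration; they are not taken from the reporter's schema or from the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene or Solr.
class NGramGroupSketch {
    // Mirrors the constant named in the issue description.
    static final int MAX_NUM_TOKENS_PER_GROUP = 50;

    /** All grams with lengths in [minGram, maxGram] as {start, end} offsets,
     *  ordered by start offset and then by length (assumed emission order). */
    static List<int[]> grams(String text, int minGram, int maxGram) {
        List<int[]> out = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= text.length(); len++) {
                out.add(new int[] {start, start + len});
            }
        }
        return out;
    }

    /** Zero-based index of the first gram whose text equals term, or -1 if absent. */
    static int indexOf(String text, List<int[]> grams, String term) {
        for (int i = 0; i < grams.size(); i++) {
            int[] g = grams.get(i);
            if (text.substring(g[0], g[1]).equals(term)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String text = "bronchoscopy";
        List<int[]> all = grams(text, 2, 12); // assumed gram sizes
        int idx = indexOf(text, all, "co");
        System.out.println("total grams: " + all.size());
        System.out.println("index of \"co\": " + idx);
        System.out.println("at or past the cap: " + (idx >= MAX_NUM_TOKENS_PER_GROUP));
    }
}
```

Under these assumptions the single word yields 66 grams, and "co" (offsets 8 to 10) is the 61st of them. Because the first grams span the start of the word and every later gram overlaps them, all 66 would plausibly land in one overlapping group, so the matching gram arrives after the group has already reached 50 tokens.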



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
