[jira] [Commented] (SOLR-8212) Standard Highlighter Inconsistent with NGram Tokenizer

Esther Quansah (JIRA) Thu, 12 Nov 2015 09:04:45 -0800

    [ https://issues.apache.org/jira/browse/SOLR-8212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15002416#comment-15002416 ]


Esther Quansah commented on SOLR-8212:
--------------------------------------

Update: problem identified. In TokenGroup.java, private static final int 
MAX_NUM_TOKENS_PER_GROUP = 50. For terms where the query matches farther into 
the word (bronchos*co*py, blood *ca*ncer, etc.), the token group ends up with 
50+ tokens, so private int matchStartOffset and private int matchEndOffset are 
not calculated correctly in void addToken(), and the entire term is eventually 
returned with no formatting. 
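To make the failure easier to see, here is a simplified, hypothetical model of it in plain Java (no Lucene dependency). It assumes that n-grams are emitted ordered by start offset and then by length, that all overlapping grams of a term fall into one group (a token joins the group while its start offset is below the group's end offset), and that tokens beyond the cap are simply not counted toward the match offsets. The class, the gram sizes (minGram=1, maxGram=2 vs. maxGram=12) and the highlight() helper are illustrative assumptions, not the actual TokenGroup or Highlighter code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the capped token group; not Lucene's TokenGroup.
class NGramGroupSketch {
    // Cap value taken from the comment above (TokenGroup.MAX_NUM_TOKENS_PER_GROUP).
    static final int MAX_NUM_TOKENS_PER_GROUP = 50;

    record Gram(String text, int start, int end) {}

    // Assumed ordering: by start offset, then by increasing length.
    static List<Gram> ngrams(String s, int minGram, int maxGram) {
        List<Gram> out = new ArrayList<>();
        for (int start = 0; start < s.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= s.length(); len++) {
                out.add(new Gram(s.substring(start, start + len), start, start + len));
            }
        }
        return out;
    }

    // Counts tokens in the first group: a token joins while it overlaps the group.
    static int firstGroupSize(List<Gram> grams) {
        if (grams.isEmpty()) return 0;
        int groupEnd = grams.get(0).end();
        int size = 1;
        for (int i = 1; i < grams.size(); i++) {
            Gram g = grams.get(i);
            if (g.start() >= groupEnd) break; // distinct token: a new group would start here
            groupEnd = Math.max(groupEnd, g.end());
            size++;
        }
        return size;
    }

    // Position of the first gram equal to the query, or -1 if absent.
    static int firstMatchIndex(List<Gram> grams, String query) {
        for (int i = 0; i < grams.size(); i++) {
            if (grams.get(i).text().equals(query)) return i;
        }
        return -1;
    }

    // Treats the whole term as one group; only the first `cap` tokens can set match offsets.
    static String highlight(String text, List<Gram> grams, String query, int cap) {
        int matchStart = -1, matchEnd = -1;
        int n = Math.min(grams.size(), cap);
        for (int i = 0; i < n; i++) {
            Gram g = grams.get(i);
            if (!g.text().equals(query)) continue;
            if (matchStart < 0) {
                matchStart = g.start();
                matchEnd = g.end();
            } else {
                matchStart = Math.min(matchStart, g.start());
                matchEnd = Math.max(matchEnd, g.end());
            }
        }
        if (matchStart < 0) return text; // no match recorded: term returned unformatted
        return text.substring(0, matchStart) + "<em>" + text.substring(matchStart, matchEnd)
                + "</em>" + text.substring(matchEnd);
    }

    public static void main(String[] args) {
        String term = "bronchoscopy";
        List<Gram> small = ngrams(term, 1, 2);   // hypothetical minGram=1, maxGram=2
        List<Gram> large = ngrams(term, 1, 12);  // hypothetical minGram=1, maxGram=12
        System.out.println(firstGroupSize(small) + " " + firstMatchIndex(small, "co")); // 23 17
        System.out.println(highlight(term, small, "co", MAX_NUM_TOKENS_PER_GROUP));     // bronchos<em>co</em>py
        System.out.println(firstGroupSize(large) + " " + firstMatchIndex(large, "co")); // 78 69
        System.out.println(highlight(term, large, "co", MAX_NUM_TOKENS_PER_GROUP));     // bronchoscopy
    }
}
```

In this model the "co" gram is the 70th token of a 78-token group when maxGram=12, so it never reaches the match-offset bookkeeping and the plain term comes back, which mirrors the reported symptom. The same happens for "blood cancer", where "ca" is the 59th token under the same assumptions.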

> Standard Highlighter Inconsistent with NGram Tokenizer
> ------------------------------------------------------
>
>                 Key: SOLR-8212
>                 URL: https://issues.apache.org/jira/browse/SOLR-8212
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Esther Quansah
>            Priority: Minor
>         Attachments: SOLR-8212.patch
>
>
> Noticing some inconsistent behavior from the Standard Highlighter when it 
> highlights terms in fields that use the NGram Tokenizer. Example: 
> I created a field called "title_contains" which uses the NGram Tokenizer and 
> I indexed the term "bronchoscopy". Querying "co" on the title_contains field 
> should return "bronchos<em>co</em>py", but the Standard highlighter returns 
> "bronchoscopy" without the highlighting information.
> I created a test called testNgram() which tests the above example using (1) 
> the Standard Highlighter on the ngram field type and (2) the Fast Vector 
> Highlighter on the ngram field type. The first fails and the second passes. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)