[
https://issues.apache.org/jira/browse/SOLR-8212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15002416#comment-15002416
]
Esther Quansah commented on SOLR-8212:
--------------------------------------
Update: the problem has been identified. In TokenGroup.java, private static
final int MAX_NUM_TOKENS_PER_GROUP = 50. Terms in which the query match sits
farther into the word (bronchos*co*py, blood *ca*ncer, etc.) end up with 50+
tokens, so private int matchStartOffset and private int matchEndOffset are
not calculated correctly in void addToken(), and the entire term is
eventually returned with no formatting.
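
To make the effect of the cap concrete, here is a minimal, self-contained
sketch. It is not the Lucene TokenGroup code: the class and method names are
hypothetical, and it assumes the n-grams arrive ordered by start offset and
then by length, with minGram = 1 and maxGram = 12 (the issue does not state
the gram sizes actually configured). Under those assumptions "co" is well
past the 50th token of "bronchoscopy", so a group capped at 50 never records
the match offsets.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a token group with a cap on the tokens it accepts.
// Hypothetical names; not the actual Lucene TokenGroup implementation.
public class TokenGroupCapSketch {

    /** Character n-grams as {start, end} pairs, ordered by start, then length (assumed order). */
    static List<int[]> ngrams(String text, int minGram, int maxGram) {
        List<int[]> out = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= text.length(); len++) {
                out.add(new int[] {start, start + len});
            }
        }
        return out;
    }

    /** Wraps the matched span in <em> tags, looking only at the first maxTokensPerGroup grams. */
    static String highlight(String text, String query, int minGram, int maxGram,
                            int maxTokensPerGroup) {
        int numTokens = 0;
        int matchStart = -1;
        int matchEnd = -1;
        for (int[] gram : ngrams(text, minGram, maxGram)) {
            if (numTokens >= maxTokensPerGroup) {
                break; // tokens past the cap never update the match offsets
            }
            if (text.substring(gram[0], gram[1]).equals(query)) {
                matchStart = (matchStart < 0) ? gram[0] : Math.min(matchStart, gram[0]);
                matchEnd = Math.max(matchEnd, gram[1]);
            }
            numTokens++;
        }
        if (matchStart < 0) {
            return text; // no match recorded: the term comes back unformatted
        }
        return text.substring(0, matchStart) + "<em>"
                + text.substring(matchStart, matchEnd) + "</em>"
                + text.substring(matchEnd);
    }

    public static void main(String[] args) {
        // Cap of 50, as in MAX_NUM_TOKENS_PER_GROUP: the match is never seen.
        System.out.println(highlight("bronchoscopy", "co", 1, 12, 50));
        // A cap large enough to cover every gram of this term.
        System.out.println(highlight("bronchoscopy", "co", 1, 12, 100));
    }
}
```

With these assumed settings the first call prints the bare term and the
second prints bronchos<em>co</em>py, which mirrors the behavior described in
the issue. A match near the start of a word falls within the first 50 grams
and is still highlighted under the cap.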
> Standard Highlighter Inconsistent with NGram Tokenizer
> ------------------------------------------------------
>
> Key: SOLR-8212
> URL: https://issues.apache.org/jira/browse/SOLR-8212
> Project: Solr
> Issue Type: Bug
> Reporter: Esther Quansah
> Priority: Minor
> Attachments: SOLR-8212.patch
>
>
> The Standard Highlighter behaves inconsistently on terms in fields that use
> the NGram Tokenizer. Example:
> I created a field called "title_contains" which uses the NGram Tokenizer and
> indexed the term "bronchoscopy". Querying "co" on the title_contains field
> should return "bronchos<em>co</em>py", but the Standard Highlighter returns
> "bronchoscopy" without the highlighting information.
> I created a test called testNgram() which tests the above example using (1)
> the Standard Highlighter on the ngram field type and (2) the Fast Vector
> Highlighter on the ngram field type. The first fails and the second passes.