[jira] [Updated] (SOLR-8212) Standard Highlighter Inconsistent with NGram Tokenizer

Esther Quansah (JIRA) Thu, 12 Nov 2015 09:05:34 -0800

     [ 
https://issues.apache.org/jira/browse/SOLR-8212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Esther Quansah updated SOLR-8212:
---------------------------------
    Description: 
Noticing some inconsistent behavior with the Standard Highlighter and its 
function on terms that use the NGram Tokenizer. Ex: 
I created a field called "title_contains" which uses the NGram Tokenizer and I 
indexed the term "bronchoscopy". Querying "co" on the title_contains field 
should return "bronchos<em>co</em>py", but the Standard highlighter returns 
"bronchoscopy" without the highlighting information.
I created a test called testNgram() which tests the above example using (1) the 
Standard Highlighter on the ngram field type and (2) the Fast Vector 
Highlighter on the ngram field type. The first fails and the second passes. 


Problem identified: MAX_NUM_TOKENS_PER_GROUP = 50 (in TokenGroup.java), and for 
some terms numTokens >= 50. This causes incorrect match start and end offsets 
and therefore no highlighting on the found term. 

  was:
Noticing some inconsistent behavior with the Standard Highlighter and its 
function on terms that use the NGram Tokenizer. Ex: 
I created a field called "title_contains" which uses the NGram Tokenizer and I 
indexed the term "bronchoscopy". Querying "co" on the title_contains field 
should return "bronchos<em>co</em>py", but the Standard highlighter returns 
"bronchoscopy" without the highlighting information.
I created a test called testNgram() which tests the above example using (1) the 
Standard Highlighter on the ngram field type and (2) the Fast Vector 
Highlighter on the ngram field type. The first fails and the second passes. 


Problem identified: 


> Standard Highlighter Inconsistent with NGram Tokenizer
> ------------------------------------------------------
>
>                 Key: SOLR-8212
>                 URL: https://issues.apache.org/jira/browse/SOLR-8212
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Esther Quansah
>            Priority: Minor
>         Attachments: SOLR-8212.patch
>
>
> Noticing some inconsistent behavior with the Standard Highlighter and its 
> function on terms that use the NGram Tokenizer. Ex: 
> I created a field called "title_contains" which uses the NGram Tokenizer and 
> I indexed the term "bronchoscopy". Querying "co" on the title_contains field 
> should return "bronchos<em>co</em>py", but the Standard highlighter returns 
> "bronchoscopy" without the highlighting information.
> I created a test called testNgram() which tests the above example using (1) 
> the Standard Highlighter on the ngram field type and (2) the Fast Vector 
> Highlighter on the ngram field type. The first fails and the second passes. 
> Problem identified: MAX_NUM_TOKENS_PER_GROUP = 50 (in TokenGroup.java), and 
> for some terms numTokens >= 50. This causes incorrect match start and end 
> offsets and therefore no highlighting on the found term. 
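
To make the token-count arithmetic in the description concrete, here is a
minimal sketch in plain Java (no Lucene dependency). It only counts the grams
an n-gram tokenizer could emit for "bronchoscopy" and compares the position of
the "co" gram with the limit of 50 quoted above. It does not reproduce the
highlighter itself. The gram sizes (1 to 12), the emission order (by start
offset, then by length), the class name NGramCount, and its helper methods are
all assumptions made for illustration; the report does not state how the
title_contains field is configured.

```java
import java.util.ArrayList;
import java.util.List;

public class NGramCount {
    // Value quoted in the issue for MAX_NUM_TOKENS_PER_GROUP (TokenGroup.java).
    static final int MAX_NUM_TOKENS_PER_GROUP = 50;

    /**
     * All substrings of length minGram..maxGram.
     * Assumed order: by start offset, then by increasing length.
     */
    static List<String> ngrams(String term, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < term.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= term.length(); len++) {
                grams.add(term.substring(start, start + len));
            }
        }
        return grams;
    }

    /** Zero-based position of the first gram equal to query, or -1 if absent. */
    static int indexOfGram(String term, int minGram, int maxGram, String query) {
        return ngrams(term, minGram, maxGram).indexOf(query);
    }

    public static void main(String[] args) {
        String term = "bronchoscopy";
        int minGram = 1;   // hypothetical setting, not taken from the issue
        int maxGram = 12;  // hypothetical setting, not taken from the issue

        int total = ngrams(term, minGram, maxGram).size();
        int position = indexOfGram(term, minGram, maxGram, "co");

        System.out.println("total grams: " + total);
        System.out.println("position of \"co\": " + position);
        System.out.println("at or past the limit: " + (position >= MAX_NUM_TOKENS_PER_GROUP));
    }
}
```

Under these assumed settings the single term yields 78 grams and "co" sits at
zero-based position 69, so if all grams of the term land in one token group,
the matching gram falls past the 50-token limit. With smaller gram ranges
(for example 1 to 2, which gives 23 grams) the limit would not be reached,
which may explain why only some terms are affected.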



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
