[ 
https://issues.apache.org/jira/browse/SOLR-12808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628521#comment-16628521
 ] 

Federico Grillini commented on SOLR-12808:
------------------------------------------

I filed this bug because the official documentation states:

{quote}
CharFilters can be chained like Token Filters and placed in front of a 
Tokenizer. CharFilters can add, change, or remove characters while preserving 
the original character offsets to support features like highlighting.
{quote}
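
To illustrate what "preserving the original character offsets" implies for highlighting, here is a minimal sketch in plain Python. It is not Lucene's implementation and not a diagnosis of this issue: the helper names ({{pattern_replace_with_offsets}}, {{corrected_spans}}) and the sample input with repeated spaces are assumptions made for illustration. It applies only the whitespace-collapsing pattern from the field type quoted below and records, for each output character, the index it came from in the original text.

```python
import re


def pattern_replace_with_offsets(text, pattern, replacement):
    """Apply a regex replacement and return (filtered_text, offsets).

    offsets[i] is the index in `text` that filtered character i maps back to.
    Hypothetical helper: it only mimics the idea of offset correction.
    """
    out, offsets = [], []
    pos = 0
    for m in re.finditer(pattern, text):
        # Characters outside a match are copied one-to-one.
        for i in range(pos, m.start()):
            out.append(text[i])
            offsets.append(i)
        # Replacement characters are mapped onto the matched span,
        # clamped to the last character of that span.
        last = max(m.start(), m.end() - 1)
        for k, ch in enumerate(m.expand(replacement)):
            out.append(ch)
            offsets.append(min(m.start() + k, last))
        pos = m.end()
    for i in range(pos, len(text)):
        out.append(text[i])
        offsets.append(i)
    return "".join(out), offsets


def corrected_spans(filtered, offsets):
    """Tokenize the filtered text on whitespace and map each token's
    (start, end) back to offsets in the original text."""
    spans = []
    for m in re.finditer(r"\S+", filtered):
        spans.append((offsets[m.start()], offsets[m.end() - 1] + 1))
    return spans


# Assumed sample value containing runs of spaces.
original = "00031665   0035  9"
filtered, offsets = pattern_replace_with_offsets(original, r"\s+", " ")
spans = corrected_spans(filtered, offsets)

for start, end in spans:
    print(start, end, repr(original[start:end]))
```

In this sketch the token "0035" sits at 9-13 in the filtered text but at 11-15 in the original. A highlighter that used the filtered offsets against the original text would mark the wrong characters, which is the kind of mismatch I would expect the offset correction to prevent.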

> Wrong highlighting using PatternReplaceCharFilterFactory
> --------------------------------------------------------
>
>                 Key: SOLR-12808
>                 URL: https://issues.apache.org/jira/browse/SOLR-12808
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: highlighter
>    Affects Versions: 7.2.1, 7.4, 7.5
>         Environment: Java: Oracle Corporation Java HotSpot(TM) 64-Bit Server 
> VM 1.8.0_162 25.162-b12
> OS: Linux Debian 8.11
>            Reporter: Federico Grillini
>            Priority: Major
>         Attachments: text_analysis.png
>
>
> Hi,
> the default highlighter appears to produce incorrect highlights when used with 
> PatternReplaceCharFilterFactory.
> My query is: {{verb_esame_num_tnv:(00031665 0035 9)}}
> The field type used by the field "verb_esame_num_tnv" is:
> {code:xml}
> <fieldType name="text_num_verbale" class="solr.TextField" 
> positionIncrementGap="100">
>    <analyzer>
>       <charFilter class="solr.PatternReplaceCharFilterFactory" 
> pattern="^0*([0-9]+\s+[0-9]+\s+[0-9]+)$" replacement=" $1"/>
>       <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\s+" 
> replacement=" "/>
>       <tokenizer class="solr.StandardTokenizerFactory"/>
>    </analyzer>
> </fieldType>
> {code}
> I've attached a screenshot of the text analysis.
> It seems that the highlighter uses the wrong offsets in the original text to 
> highlight the matched tokens.
> Hope this helps.
> Regards.



