Federico Grillini created SOLR-12808:
----------------------------------------

             Summary: Wrong highlighting using PatternReplaceCharFilterFactory
                 Key: SOLR-12808
                 URL: https://issues.apache.org/jira/browse/SOLR-12808
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: highlighter
    Affects Versions: 7.5, 7.4, 7.2.1
         Environment: Java: Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 1.8.0_162 25.162-b12
                      OS: Linux Debian 8.11
            Reporter: Federico Grillini
         Attachments: text_analysis.png

Hi,
the default highlighter produces incorrect highlights when the field's 
analyzer uses PatternReplaceCharFilterFactory.

My query is: {{verb_esame_num_tnv:(00031665 0035 9)}}

The field type used by the field "verb_esame_num_tnv" is:

{code:xml}
<fieldType name="text_num_verbale" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
      <charFilter class="solr.PatternReplaceCharFilterFactory"
                  pattern="^0*([0-9]+\s+[0-9]+\s+[0-9]+)$" replacement=" $1"/>
      <charFilter class="solr.PatternReplaceCharFilterFactory"
                  pattern="\s+" replacement=" "/>
      <tokenizer class="solr.StandardTokenizerFactory"/>
   </analyzer>
</fieldType>
{code}
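
To make the effect of the two char filters concrete, here is a rough sketch that applies the same two patterns with Python's `re` module. It only approximates the Java regex behaviour used by Solr. The sample values and the helper name `apply_char_filters` are assumptions made for illustration and do not come from the actual index.

```python
import re

# Patterns copied from the fieldType above; the sample values below are hypothetical.
STRIP_LEADING_ZEROS = re.compile(r"^0*([0-9]+\s+[0-9]+\s+[0-9]+)$")
COLLAPSE_WHITESPACE = re.compile(r"\s+")


def apply_char_filters(value: str) -> str:
    """Approximate the two charFilters, applied in declaration order."""
    value = STRIP_LEADING_ZEROS.sub(r" \1", value)
    return COLLAPSE_WHITESPACE.sub(" ", value)


for sample in ["00031665 0035 9", "00031665   0035  9", "00031665"]:
    print(repr(sample), "->", repr(apply_char_filters(sample)))
```

Note that the first pattern only fires when the whole value consists of three numeric groups, so a single number such as "00031665" keeps its leading zeros.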

I've attached a screenshot of the text analysis.

It seems that the highlighter applies the wrong offsets to the original text 
when it highlights the matched tokens.
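
The following sketch illustrates the kind of mismatch I suspect. It is not Solr or Lucene code, and it is not a confirmed diagnosis: it simply shows what happens when token offsets computed on the filtered text are applied unchanged to the original text. The stored value `original` is a hypothetical example, and the whitespace split only stands in for the real tokenizer.

```python
import re

original = "00031665 0035 9"  # hypothetical stored value, not taken from the index
filtered = re.sub(r"^0*([0-9]+\s+[0-9]+\s+[0-9]+)$", r" \1", original)

# Token offsets as a tokenizer would see them in the filtered text.
tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", filtered)]

# Slicing the original text with filtered-text offsets marks the wrong characters.
naive = [original[start:end] for _, start, end in tokens]

# In this sample the filter removed a net of `shift` characters at the start,
# so adding the shift back recovers the intended spans.
shift = len(original) - len(filtered)
corrected = [original[start + shift:end + shift] for _, start, end in tokens]

print("tokens:   ", [text for text, _, _ in tokens])
print("naive:    ", naive)
print("corrected:", corrected)
```

A constant shift is enough only for this simple sample; a replacement in the middle of the text would need a per-position offset mapping.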

Hope this helps.

Regards.



