Highlighter does not work with HTML content that's passed through 
HTMLStrip*Tokenizer
-------------------------------------------------------------------------------------

                 Key: SOLR-57
                 URL: http://issues.apache.org/jira/browse/SOLR-57
             Project: Solr
          Issue Type: Bug
          Components: search
         Environment: Red Hat Linux 9, Tomcat 5.5.20
            Reporter: Ho Yin Au
            Priority: Minor


I have a fieldtype with the following definition:
        <fieldtype name="htmltext" class="solr.TextField" positionIncrementGap="100">
            <analyzer>
                <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
                <filter class="solr.StandardFilterFactory" />
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.StopFilterFactory" />
                <filter class="solr.EnglishPorterFilterFactory" />
                <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
                <filter class="solr.ISOLatin1AccentFilterFactory" />
            </analyzer>
        </fieldtype>

When a field of this type is included in the list of fields to be 
highlighted, the highlighted term is always misplaced: the highlighter does 
not account for the HTML tags that precede the term, so the highlighted 
snippet ends up looking like this:

Does your comptuer meet the <a 
href="http:/<em>/www.example</em>.com/system_requirements.shtml">minimum system 
requirements</a>?
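
The snippet above is consistent with one reading of the bug: the term offsets 
are computed against the tag-stripped text but then applied to the original 
HTML. The sketch below is only an illustration of that reading, not Solr's 
actual code. The helpers strip_with_offsets and highlight are hypothetical, 
and the query term "requirements" is an assumption inferred from the length 
and position of the misplaced <em> span.

```python
def strip_with_offsets(html):
    """Drop everything between '<' and '>' and remember, for each kept
    character, its index in the original string (hypothetical helper)."""
    kept, offsets = [], []
    in_tag = False
    for i, ch in enumerate(html):
        if ch == "<":
            in_tag = True
        elif ch == ">" and in_tag:
            in_tag = False
        elif not in_tag:
            kept.append(ch)
            offsets.append(i)
    return "".join(kept), offsets


def highlight(text, start, end):
    """Wrap text[start:end] in <em> tags (hypothetical helper)."""
    return text[:start] + "<em>" + text[start:end] + "</em>" + text[end:]


raw = ('Does your comptuer meet the <a href="http://www.example.com/'
       'system_requirements.shtml">minimum system requirements</a>?')

stripped, offsets = strip_with_offsets(raw)

term = "requirements"            # assumed query term
start = stripped.index(term)     # offset in the stripped text
end = start + len(term)

# Suspected bug: stripped-text offsets applied directly to the raw HTML.
misplaced = highlight(raw, start, end)

# One possible correction: map the offsets back to the raw HTML first.
corrected = highlight(raw, offsets[start], offsets[end - 1] + 1)

print(misplaced)
print(corrected)
```

Under these assumptions the first line printed reproduces the misplaced 
highlight reported above, and the second wraps the word "requirements" inside 
the link text instead.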
