[jira] [Commented] (SOLR-2845) Adding extra highlighting term to a synonym

Koji Sekiguchi (Commented) (JIRA) Mon, 21 Nov 2011 02:21:21 -0800

    [ https://issues.apache.org/jira/browse/SOLR-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154098#comment-13154098 ]


Koji Sekiguchi commented on SOLR-2845:
--------------------------------------

I can reproduce what Ajay reported. This looks like a problem in the new 
SynonymFilter, because if I set LUCENE_33 (in order to use SlowSynonymFilter), 
it works.
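
A minimal sketch of that setting, assuming it is made globally through the 
luceneMatchVersion element in the example core's solrconfig.xml (the comment 
does not say where the version was set). Because the synonym filter in this 
report runs at index time, the documents would need to be reindexed after the 
change.
{code:xml}
<!-- solrconfig.xml (assumed location). With a match version below LUCENE_34,
     SynonymFilterFactory falls back to the older SlowSynonymFilter. -->
<luceneMatchVersion>LUCENE_33</luceneMatchVersion>
{code}
Note that this is a global switch, so it also affects any other component 
whose behaviour depends on the match version, not only the synonym filter.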
                
> Adding extra highlighting term to a synonym
> -------------------------------------------
>
>                 Key: SOLR-2845
>                 URL: https://issues.apache.org/jira/browse/SOLR-2845
>             Project: Solr
>          Issue Type: Bug
>          Components: highlighter
>    Affects Versions: 3.4
>         Environment: Solr release: 3.4.0
> JVM:
> java version "1.6.0_16"
> Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
> Java HotSpot(TM) 64-Bit Server VM (build 14.2-b01, mixed mode)
> OS: 2.6.18-274.el5 #1 SMP Fri Jul 8 17:36:59 EDT 2011 x86_64 x86_64 x86_64 
> GNU/Linux
>            Reporter: Ajay Kanduru
>             Fix For: 3.4
>
>
> I see strange highlighting behaviour when highlighting a synonym term in 
> the 3.4.0 release; the same setup works fine in 1.4.1. Using the Solr example 
> core, here are the steps to reproduce the problem. 
> 1) In *schema.xml*, change the text_general fieldType definition to use the 
> synonym filter at index time and remove the filter from query analysis.
> {code:xml}
> <fieldType name="text_general" class="solr.TextField" 
> positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" 
> words="stopwords.txt" enablePositionIncrements="true" />
>     <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" 
> ignoreCase="true" expand="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" 
> words="stopwords.txt" enablePositionIncrements="true" />
>     <!-- <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" 
> ignoreCase="true" expand="true"/> -->
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
> {code}
>    
> 2) Define a new field 'test_field1'.
> {code:xml}
>   <field name="test_field1" type="text_general" indexed="true" stored="true" 
> multiValued="true"/>
> {code}
> 3) Copy this to 'text' field.
> {code:xml}
>   <copyField source="test_field1" dest="text"/>
> {code}
> 4) In *exampledocs/ipod_video.xml*, add a new field to the doc.
> {code:xml}
>   <field name="test_field1">Heart Failure</field>
> {code}
> 5) In *solr/conf/index_synonyms.txt*, add the following line (all on one 
> line).
> {noformat}
> heart failure, failure\, heart, cardiac failure, cardiac insufficiency, 
> failure heart, failure\, cardiac, heart failure (nos), insufficiency cardiac, 
> insufficiency\, cardiac, hf - heart failure
> {noformat}
> 6) Reindex the exampledocs/*xml files and request the following URL.
>   http://localhost:8983/solr/select?q=heart&indent=on&hl=on&hl.fl=*
> This is what I get in the highlighting element.
> {code:xml}
>   <lst name="highlighting">
>     <lst name="MA147LL/A">
>       <arr name="test_field1">
>         <str>&lt;em&gt;Heart&lt;/em&gt;&lt;em&gt;Heart 
> Failure&lt;/em&gt;</str>
>       </arr>
>     </lst>
>   </lst>
> {code}
> The actual value of the field is *Heart Failure*, but the highlighter returns 
> *Heart**Heart Failure*.
> Apparently the synonym entries have something to do with the problem. The 
> above synonym terms are the minimal subset of a larger line that still 
> reproduces the problem. Notice that there is a hyphen in the last term. If I 
> remove the hyphen, it works, even with a larger line of entries. Keeping the 
> hyphen and removing *insufficiency\, cardiac* also works. So both the length 
> of the line and the hyphen seem to be at play here.
> Using large and complicated synonyms is very important to our application. 
> The 3.4 release announced major improvements to the memory footprint and 
> performance of the synonym filter. For this reason we are eager to move to 
> 3.4.0, but this problem is a showstopper for us. I would appreciate any 
> suggestions for a workaround or a quick fix to the problem.
> Regards,
> -Ajay

