[
https://issues.apache.org/jira/browse/SOLR-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154098#comment-13154098
]
Koji Sekiguchi commented on SOLR-2845:
--------------------------------------
I can reproduce what Ajay reported. It looks like a problem in the new
SynonymFilter: if I set the match version to LUCENE_33 (so that
SlowSynonymFilter is used instead), it works.
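
The setting referred to here is presumably luceneMatchVersion in
solrconfig.xml. A minimal sketch of the workaround follows, assuming the stock
example core layout and that the 3.4.0 example config ships with LUCENE_34;
neither detail is stated in this thread.

```xml
<!-- solr/conf/solrconfig.xml (path assumed from the example core layout) -->
<config>
  <!-- Assumption: the 3.4.0 example config has LUCENE_34 here. Lowering it to
       LUCENE_33 makes SynonymFilterFactory use the older SlowSynonymFilter. -->
  <luceneMatchVersion>LUCENE_33</luceneMatchVersion>
  <!-- rest of the configuration unchanged -->
</config>
```

This setting is global, so it may also change version-dependent behaviour in
other analysis components. Since the synonym filter runs at index time in the
setup below, the documents need to be reindexed after the change.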
> Adding extra highlighting term to a synonym
> -------------------------------------------
>
> Key: SOLR-2845
> URL: https://issues.apache.org/jira/browse/SOLR-2845
> Project: Solr
> Issue Type: Bug
> Components: highlighter
> Affects Versions: 3.4
> Environment: Solr release: 3.4.0
> JVM:
> java version "1.6.0_16"
> Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
> Java HotSpot(TM) 64-Bit Server VM (build 14.2-b01, mixed mode)
> OS: 2.6.18-274.el5 #1 SMP Fri Jul 8 17:36:59 EDT 2011 x86_64 x86_64 x86_64
> GNU/Linux
> Reporter: Ajay Kanduru
> Fix For: 3.4
>
>
> I notice strange behaviour when highlighting a synonym term in the 3.4.0
> release. The same setup works fine in 1.4.1. Using the Solr example core,
> here are the steps to reproduce the problem.
> 1) In *schema.xml*, change text_general fieldtype definition to use synonym
> filter at index time and remove the filter from query analysis.
> {code:xml}
> <fieldType name="text_general" class="solr.TextField"
> positionIncrementGap="100">
> <analyzer type="index">
> <tokenizer class="solr.StandardTokenizerFactory"/>
> <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" enablePositionIncrements="true" />
> <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
> ignoreCase="true" expand="true"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> </analyzer>
> <analyzer type="query">
> <tokenizer class="solr.StandardTokenizerFactory"/>
> <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" enablePositionIncrements="true" />
> <!-- <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/> -->
> <filter class="solr.LowerCaseFilterFactory"/>
> </analyzer>
> </fieldType>
> {code}
>
> 2) Define a new field 'test_field1'.
> {code:xml}
> <field name="test_field1" type="text_general" indexed="true" stored="true"
> multiValued="true"/>
> {code}
> 3) Copy this field to the 'text' field.
> {code:xml}
> <copyField source="test_field1" dest="text"/>
> {code}
> 4) In *exampledocs/ipod_video.xml*, add a new field to the doc.
> {code:xml}
> <field name="test_field1">Heart Failure</field>
> {code}
> 5) In *solr/conf/index_synonyms.txt*, add the following line (all on one
> line).
> {noformat}
> heart failure, failure\, heart, cardiac failure, cardiac insufficiency,
> failure heart, failure\, cardiac, heart failure (nos), insufficiency cardiac,
> insufficiency\, cardiac, hf - heart failure
> {noformat}
> 6) Reindex the exampledocs/*xml files and request the following URL.
> http://localhost:8983/solr/select?q=heart&indent=on&hl=on&hl.fl=*
> This is what I get in the highlighting section of the response.
> {code:xml}
> <lst name="highlighting">
> <lst name="MA147LL/A">
> <arr name="test_field1">
> <str><em>Heart</em><em>Heart
> Failure</em></str>
> </arr>
> </lst>
> </lst>
> {code}
> The actual value of the field is *Heart Failure*, but the highlighter returns
> *HeartHeart Failure*.
> Apparently the synonym entries have something to do with the problem. The
> synonym terms above are the minimal subset of a larger line that still
> reproduces the problem. Notice that there is a hyphen in the last term. If I
> remove the hyphen, it works, even with a larger line of entries. Keeping the
> hyphen and removing *insufficiency\, cardiac* also works. So both the length
> of the line and the hyphen seem to be at play here.
> Using large and complicated synonyms is very important to our application.
> The 3.4 release announced major improvements to the memory footprint and
> performance of the synonym filter. For this reason we are eager to move to
> 3.4.0, but this problem is a showstopper for us. I would appreciate any
> suggestions for a workaround or a quick fix to the problem.
> Regards,
> -Ajay