[jira] [Updated] (LUCENE-7704) SysnonymGraphFilter doesn't respect ignoreCase parameter

Michael McCandless (JIRA) Thu, 23 Feb 2017 04:20:07 -0800

     [ 
https://issues.apache.org/jira/browse/LUCENE-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Michael McCandless updated LUCENE-7704:
---------------------------------------
    Attachment: LUCENE-7704.patch

Hi [~syonekura], this is actually by design: it is up to you to down-case the 
rules you add to the {{SynonymMap.Builder}}. The {{ignoreCase}} option then 
case-folds only the incoming tokens during analysis, not the rules themselves.

I'm sorry the javadocs were missing, so you could not have known this is by 
design. In the attached patch I've copied over the javadocs from the old 
{{SynonymFilter}} and fixed your test case to down-case the rules; it now 
passes.
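
The contract described above can be illustrated with a small, self-contained sketch. This is a toy model in plain Java, not Lucene's implementation: the class {{IgnoreCaseSketch}} and its methods are hypothetical names invented for illustration. It assumes only what the comment states, namely that rules are stored exactly as added and that {{ignoreCase}} lower-cases the incoming token before the lookup.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Toy model of the ignoreCase contract; hypothetical, NOT Lucene's code. */
final class IgnoreCaseSketch {
    // Rules are kept exactly as the caller supplied them (no case folding).
    private final Map<String, String> rules = new HashMap<>();
    private final boolean ignoreCase;

    IgnoreCaseSketch(boolean ignoreCase) {
        this.ignoreCase = ignoreCase;
    }

    /** Stores the rule verbatim, mirroring a builder that does not down-case. */
    void addRule(String input, String output) {
        rules.put(input, output);
    }

    /** Returns the synonym for the token, or null when no rule matches. */
    String lookup(String token) {
        // Only the incoming token is folded; the stored rules are untouched.
        String key = ignoreCase ? token.toLowerCase(Locale.ROOT) : token;
        return rules.get(key);
    }

    public static void main(String[] args) {
        // As in the reported test: the rule is upper case, so the folded
        // token "word" never matches the stored key "WORD".
        IgnoreCaseSketch asReported = new IgnoreCaseSketch(true);
        asReported.addRule("WORD", "synonym");
        System.out.println(asReported.lookup("WORD")); // prints "null"

        // As in the fixed test: down-case the rule before adding it.
        IgnoreCaseSketch fixed = new IgnoreCaseSketch(true);
        fixed.addRule("WORD".toLowerCase(Locale.ROOT), "synonym");
        System.out.println(fixed.lookup("WORD")); // prints "synonym"
        System.out.println(fixed.lookup("Word")); // prints "synonym"
    }
}
```

In the real test, the equivalent change would be to down-case the input string before building the rule passed to {{SynonymMap.Builder}}, while leaving the analyzed text in its original case.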

> SysnonymGraphFilter doesn't respect ignoreCase parameter
> --------------------------------------------------------
>
>                 Key: LUCENE-7704
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7704
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/analysis
>    Affects Versions: 6.4.1
>            Reporter: Sebastian Yonekura Baeza
>            Priority: Minor
>         Attachments: LUCENE-7704.patch
>
>
> Hi, it seems that SynonymGraphFilter doesn't respect the ignoreCase parameter. In 
> particular, this test doesn't pass:
> {code:title=UppercaseSynonymMapTest.java|borderStyle=solid}
> package com.mapcity.suggest.lucene;
> import org.apache.lucene.analysis.Analyzer;
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.Tokenizer;
> import org.apache.lucene.analysis.core.WhitespaceTokenizer;
> import org.apache.lucene.analysis.synonym.SynonymGraphFilter;
> import org.apache.lucene.analysis.synonym.SynonymMap;
> import org.apache.lucene.util.CharsRef;
> import org.apache.lucene.util.CharsRefBuilder;
> import org.junit.Test;
> import java.io.IOException;
> import static org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents;
> /**
>  * @author Sebastian Yonekura
>  *         Created on 22-02-17
>  */
> public class UppercaseSynonymMapTest {
>     @Test
>     public void analyzerTest01() throws IOException {
>         // This passes
>         testAssertMapping("word", "synonym");
>         // This one does not
>         testAssertMapping("word".toUpperCase(), "synonym");
>     }
>     private void testAssertMapping(String inputString, String outputString) throws IOException {
>         SynonymMap.Builder builder = new SynonymMap.Builder(false);
>         CharsRef input = SynonymMap.Builder.join(inputString.split(" "), new CharsRefBuilder());
>         CharsRef output = SynonymMap.Builder.join(outputString.split(" "), new CharsRefBuilder());
>         builder.add(input, output, true);
>         Analyzer analyzer = new CustomAnalyzer(builder.build());
>         TokenStream tokenStream = analyzer.tokenStream("field", inputString);
>         assertTokenStreamContents(tokenStream, new String[]{
>                 outputString, inputString
>         });
>     }
>     static class CustomAnalyzer extends Analyzer {
>         private SynonymMap synonymMap;
>         CustomAnalyzer(SynonymMap synonymMap) {
>             this.synonymMap = synonymMap;
>         }
>         @Override
>         protected TokenStreamComponents createComponents(String s) {
>             Tokenizer tokenizer = new WhitespaceTokenizer();
>             // Ignore case True
>             TokenStream tokenStream = new SynonymGraphFilter(tokenizer, synonymMap, true);
>             return new TokenStreamComponents(tokenizer, tokenStream);
>         }
>     }
> }
> {code}



