[jira] Created: (LUCENE-1581) LowerCaseFilter should be able to be configured to use a specific locale.

Digy (JIRA) Sat, 28 Mar 2009 17:15:14 -0700

LowerCaseFilter should be able to be configured to use a specific locale.
-------------------------------------------------------------------------


                 Key: LUCENE-1581
                 URL: https://issues.apache.org/jira/browse/LUCENE-1581
             Project: Lucene - Java
          Issue Type: Improvement
            Reporter: Digy


//Since I am a .NET programmer, the sample code is in C#, but I don't think
that will make it hard to understand.
//

Assume an input text like "İ" and an analyzer like the one below:
{code}
        public class SomeAnalyzer : Analyzer
        {
                public override TokenStream TokenStream(string fieldName, System.IO.TextReader reader)
                {
                        TokenStream t = new SomeTokenizer(reader);
                        // Folding runs first, so "İ" is already "I" when it reaches LowerCaseFilter.
                        t = new Lucene.Net.Analysis.ASCIIFoldingFilter(t);
                        t = new LowerCaseFilter(t);
                        return t;
                }
        }
{code}
        

ASCIIFoldingFilter will return "I", and LowerCaseFilter will then return
        "i" (if the locale is "en-US")
        or
        "ı" (if the locale is "tr-TR"), which means this token would have to be
passed through another instance of ASCIIFoldingFilter.
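
The locale dependence described above can be reproduced on the Java side as well. The following is only a minimal sketch using the standard String.toLowerCase(Locale) method. It does not use any Lucene classes, and the class and method names (TurkishLowerCaseDemo, lower) are made up for illustration.

```java
import java.util.Locale;

// Illustrative only: shows that lowercasing "I" depends on the locale.
class TurkishLowerCaseDemo {
    // Lowercase a token under an explicitly chosen locale
    // instead of relying on the JVM default locale.
    static String lower(String token, Locale locale) {
        return token.toLowerCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        String folded = "I"; // what an ASCII-folding step yields for U+0130

        // English rules map "I" to the dotted "i".
        System.out.println(lower(folded, Locale.US)); // prints "i"

        // Turkish rules map "I" to the dotless i (U+0131), which is not ASCII.
        System.out.println(lower(folded, turkish).equals("\u0131")); // prints "true"
    }
}
```

With the Turkish locale the result is no longer an ASCII character, which is why the token would need a second folding pass.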



So, calling LowerCaseFilter before ASCIIFoldingFilter would be a solution, but
a better approach would be to add a new constructor to LowerCaseFilter that
forces it to use a specific locale.
{code}
    public sealed class LowerCaseFilter : TokenFilter
    {
        // Defaults to the current culture, so existing callers keep their behavior.
        /* +++ */System.Globalization.CultureInfo CultureInfo = System.Globalization.CultureInfo.CurrentCulture;

        // Parameter is named "input" here because "in" is a reserved word in C#.
        public LowerCaseFilter(TokenStream input) : base(input)
        {
        }

        /* +++ */  public LowerCaseFilter(TokenStream input, System.Globalization.CultureInfo CultureInfo) : base(input)
        /* +++ */  {
        /* +++ */      this.CultureInfo = CultureInfo;
        /* +++ */  }

        public override Token Next(Token result)
        {
            result = Input.Next(result);
            if (result != null)
            {
                // Lowercase the term buffer in place using the configured culture.
                char[] buffer = result.TermBuffer();
                int length = result.termLength;
                for (int i = 0; i < length; i++)
                    /* +++ */ buffer[i] = System.Char.ToLower(buffer[i], CultureInfo);

                return result;
            }
            else
                return null;
        }
    }
{code}
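
For comparison, here is a rough Java sketch of the same idea: the locale is fixed at construction time instead of being taken from the environment on every call. It is a standalone helper, not a Lucene TokenFilter, and the name LocaleAwareLowerCaser is hypothetical. One assumption to note: Java's Character.toLowerCase(char) is not locale-sensitive, so the sketch goes through String.toLowerCase(Locale), whose result may differ in length from the input.

```java
import java.util.Locale;

// Hypothetical helper, not part of Lucene: lowercases a term buffer
// under a locale chosen once, at construction time.
class LocaleAwareLowerCaser {
    private final Locale locale;

    // Mirrors the existing behavior: fall back to the default locale.
    LocaleAwareLowerCaser() {
        this(Locale.getDefault());
    }

    // Mirrors the proposed constructor: force a specific locale.
    LocaleAwareLowerCaser(Locale locale) {
        this.locale = locale;
    }

    // Returns a new array because case mappings are not always 1:1,
    // so the lowercased term may be longer or shorter than the input.
    char[] lowerCase(char[] buffer, int length) {
        return new String(buffer, 0, length).toLowerCase(locale).toCharArray();
    }

    public static void main(String[] args) {
        LocaleAwareLowerCaser english = new LocaleAwareLowerCaser(Locale.US);
        char[] term = {'I', 'D'};
        System.out.println(new String(english.lowerCase(term, term.length))); // prints "id"
    }
}
```

An analyzer that always passes a fixed locale (for example Locale.US) would then produce the same tokens regardless of the machine it runs on.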

DIGY

