pavithra kariyawasam created LUCENE-9043:
--------------------------------------------

             Summary: Currently Lucene doesn't have an analyzer for Sinhala. We 
have built an analyzer which consists of a language-dependent tokenizer, a 
stemming algorithm, and a list of stop words.
                 Key: LUCENE-9043
                 URL: https://issues.apache.org/jira/browse/LUCENE-9043
             Project: Lucene - Core
          Issue Type: Improvement
          Components: modules/analysis
    Affects Versions: 8.3
            Reporter: pavithra kariyawasam
             Fix For: 5.5.6
         Attachments: SinhalaAnalyzer.java, SinhalaStemmer.java, 
SinhalaTokenizer.java, stopwords.txt

This component was developed based on three main research efforts.


Sinhala Analyzer, as the name implies, is a software library for analyzing 
documents written in the Sinhala language. It is implemented by performing 
Sinhala morphological analysis. Its three main functions are tokenizing the 
document content precisely, removing stop words, and converting terms to 
their base/root form accurately.
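
The three stages described above (tokenize, remove stop words, stem) can be 
sketched in plain Java as follows. This is only a minimal illustration of the 
pipeline shape and is not the code in the attached files: the class name 
AnalyzerPipelineSketch, the longest-suffix stripping rule, and the English 
placeholder stop words and suffixes are all assumptions made for the example. 
It also does not use the Lucene Analyzer API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a tokenize -> stop-word removal -> stemming pipeline.
// Not the attached SinhalaAnalyzer; names and rules here are illustrative only.
public class AnalyzerPipelineSketch {

    // Stage 1: split on any run of characters that are not letters, combining
    // marks, digits, or format characters (so joiners stay inside a token).
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{M}\\p{N}\\p{Cf}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Stage 3 helper: strip the longest matching suffix, never emptying the token.
    // Assumption: a real stemmer would apply language-specific rules instead.
    public static String stem(String token, List<String> suffixes) {
        String best = "";
        for (String s : suffixes) {
            if (s.length() > best.length()
                    && token.length() > s.length()
                    && token.endsWith(s)) {
                best = s;
            }
        }
        return token.substring(0, token.length() - best.length());
    }

    // Full pipeline: tokenize, drop stop words (stage 2), then stem what remains.
    public static List<String> analyze(String text,
                                       Set<String> stopWords,
                                       List<String> suffixes) {
        List<String> out = new ArrayList<>();
        for (String token : tokenize(text)) {
            if (stopWords.contains(token)) {
                continue;
            }
            out.add(stem(token, suffixes));
        }
        return out;
    }

    public static void main(String[] args) {
        // Placeholder data; a real run would load stopwords.txt and Sinhala suffixes.
        Set<String> stopWords = Set.of("the", "of");
        List<String> suffixes = List.of("s", "ing");
        System.out.println(analyze("the indexing of documents", stopWords, suffixes));
        // prints [index, document]
    }
}
```

In an actual Lucene integration the same three stages would normally be 
expressed as a Tokenizer followed by TokenFilters inside an Analyzer subclass, 
which is presumably what the attached classes do.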




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
