[ https://issues.apache.org/jira/browse/LUCENE-9043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

pavithra kariyawasam updated LUCENE-9043:
-----------------------------------------
    Status: Patch Available  (was: Open)

> Currently Lucene doesn't have an analyzer for Sinhala. We have built an analyzer
> that consists of a language-dependent tokenizer, a stemming algorithm and a list of
> stop words.
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-9043
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9043
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>    Affects Versions: 8.3
>            Reporter: pavithra kariyawasam
>            Priority: Major
>             Fix For: 5.5.6
>
>         Attachments: SinhalaAnalyzer.java, SinhalaStemmer.java, 
> SinhalaTokenizer.java, stopwords.txt
>
>
> This component was developed based on three main research studies.
> As its name implies, Sinhala Analyzer is an enhanced software library for
> analyzing documents written in the Sinhala language. It is implemented on
> the basis of Sinhala morphological analysis. Its three main functions are
> tokenizing document content precisely, removing stop words, and accurately
> reducing terms to their base/root form.
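The three stages named in the description (tokenize, remove stop words, reduce to a root form) can be illustrated with a minimal, library-free Java sketch. It is not the code in the attached SinhalaAnalyzer.java, SinhalaTokenizer.java or SinhalaStemmer.java, and it does not use Lucene's Analyzer API. The class name SinhalaPipelineSketch, the one-word stop list ("සහ") and the single suffix ("ට") are hypothetical placeholders, not the contents of stopwords.txt. The sketch shows one reason a language-dependent tokenizer is needed. Sinhala vowel signs and al-lakuna are Unicode combining marks, and Character.isLetter returns false for them, so a letter-only tokenizer would split a word like පොතට into two tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of a Sinhala analysis chain: tokenize, drop stop words, strip suffixes.
 * Not the LUCENE-9043 attachments; the stop words and suffixes used in main are placeholders.
 */
public class SinhalaPipelineSketch {

    // Sinhala vowel signs and al-lakuna are combining marks (Mn/Mc), and conjuncts may use
    // ZERO WIDTH JOINER (U+200D); Character.isLetter is false for all of these.
    static boolean isWordChar(int cp) {
        int type = Character.getType(cp);
        return Character.isLetter(cp)
                || type == Character.NON_SPACING_MARK
                || type == Character.COMBINING_SPACING_MARK
                || cp == 0x200D;
    }

    // Splits text into maximal runs of word characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (isWordChar(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        });
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Strips the first matching suffix (list should be ordered longest first),
    // but only if at least minStemLength chars would remain.
    static String stem(String token, List<String> suffixes, int minStemLength) {
        for (String suffix : suffixes) {
            if (token.endsWith(suffix) && token.length() - suffix.length() >= minStemLength) {
                return token.substring(0, token.length() - suffix.length());
            }
        }
        return token;
    }

    // Tokenize, drop stop words, then stem what is left.
    static List<String> analyze(String text, Set<String> stopWords, List<String> suffixes) {
        List<String> terms = new ArrayList<>();
        for (String token : tokenize(text)) {
            if (!stopWords.contains(token)) {
                terms.add(stem(token, suffixes, 2));
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // Sample text of three space-separated words, written as Unicode escapes
        // so the file compiles under any source encoding.
        String text = "\u0DB4\u0DDC\u0DAD\u0DA7 \u0DC3\u0DC4 \u0D9C\u0DC3\u0DA7";
        Set<String> stopWords = Set.of("\u0DC3\u0DC4"); // hypothetical stop list
        List<String> suffixes = List.of("\u0DA7");      // hypothetical suffix list
        System.out.println(analyze(text, stopWords, suffixes));
    }
}
```

In an actual Lucene analyzer, these stages would normally be wired as a Tokenizer followed by TokenFilters (for example, a stop filter and a stemming filter) inside Analyzer.createComponents. The sketch keeps them as plain methods so the logic can be read in isolation. A single-pass suffix stripper like this one will over-stem real text, and the minimum stem length of 2 is an arbitrary guard.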



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
