[ http://issues.apache.org/jira/browse/LUCENE-503?page=comments#action_12413695 ]
Arthit Suriyawongkul commented on LUCENE-503:
---------------------------------------------

Related projects/implementations:

SansarnLook
  based on Lucene, with an additional ThaiAnalyzer
  ref: http://sansarn.com/look/technique/
  file: http://sansarn.com/look/download/

Pichai Ongvasith's ThaiAnalyzer
  ref: http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200402.mbox/[EMAIL PROTECTED]
  file: http://pichai.netfirms.com/thai_analyzer.zip

> Contrib: ThaiAnalyzer to enable Thai full-text search in Lucene
> ---------------------------------------------------------------
>
>          Key: LUCENE-503
>          URL: http://issues.apache.org/jira/browse/LUCENE-503
>      Project: Lucene - Java
>         Type: New Feature
>   Components: Analysis
>     Versions: 1.4
>     Reporter: Samphan Raruenrom
>  Attachments: ThaiAnalyzer.java, ThaiWordFilter.java
>
> Thai text doesn't have spaces between words. Usually, a dictionary-based
> algorithm is used to break a string into words. For Lucene to be usable for
> Thai, an Analyzer that knows how to break Thai words is needed.
> I've implemented such an Analyzer, ThaiAnalyzer, using ICU4j's
> DictionaryBasedBreakIterator for word breaking. I'll upload the code later.
> I'm normally a C++ programmer and very new to Java. Please review the code
> for any problems. One possible problem is that it requires ICU4j. I don't know
> whether this is OK.
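
For anyone reviewing the approach described in the issue, here is a minimal sketch of word breaking with a break iterator. It is not the attached ThaiAnalyzer code. It swaps in the JDK's own java.text.BreakIterator instead of ICU4j's DictionaryBasedBreakIterator, so it runs without extra jars. The class name ThaiWordBreakSketch and the method words() are made up for illustration. Whether Thai text is actually split at dictionary word boundaries depends on the locale data shipped with the JDK in use, so treat the Thai output as something to check rather than rely on.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; not part of Lucene or of the attached patch.
class ThaiWordBreakSketch {

    /**
     * Splits text at the word boundaries reported by the JDK BreakIterator
     * for the given locale, keeping only segments that contain at least one
     * letter or digit (this drops whitespace and punctuation segments).
     */
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(segment);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Locale thai = Locale.forLanguageTag("th");
        // The exact segmentation printed here depends on the JDK's Thai break data.
        System.out.println(words("ภาษาไทย", thai));
        System.out.println(words("hello world", Locale.ENGLISH));
    }
}
```

In an Analyzer, each segment returned this way would become one token. The ICU4j iterator mentioned in the issue exposes the same first()/next() style of iteration, which is why the dependency question is mostly about packaging rather than about the algorithm.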