[
https://issues.apache.org/jira/browse/LUCENE-185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Grant Ingersoll resolved LUCENE-185.
------------------------------------
Resolution: Won't Fix
Assignee: (was: Lucene Developers)
There is a Thai analysis contribution in contrib/analysis that appears to have
taken a similar approach.
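For context, the JDK has long shipped a dictionary-based `BreakIterator` for the Thai locale, which is the kind of facility a Thai tokenizer can build on. A minimal sketch of that segmentation approach (the class name and sample sentence are illustrative, not taken from the contrib code):

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ThaiSegmenter {

    // Split a Thai string into word tokens using the JDK's
    // dictionary-based BreakIterator for the Thai locale.
    static List<String> segment(String text) {
        BreakIterator words = BreakIterator.getWordInstance(new Locale("th"));
        words.setText(text);
        List<String> tokens = new ArrayList<>();
        int start = words.first();
        for (int end = words.next();
             end != BreakIterator.DONE;
             start = end, end = words.next()) {
            tokens.add(text.substring(start, end));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "ผมกินข้าว" ("I eat rice") is written with no spaces
        // between words, so a plain whitespace tokenizer sees one token.
        for (String token : segment("ผมกินข้าว")) {
            System.out.println(token);
        }
    }
}
```

A real Lucene tokenizer would wrap this in a `Tokenizer`/`TokenFilter` and handle offsets, but the core word-boundary detection is the part shown here.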
> [PATCH] Thai Analysis Enhancement
> ---------------------------------
>
> Key: LUCENE-185
> URL: https://issues.apache.org/jira/browse/LUCENE-185
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Analysis
> Affects Versions: unspecified
> Environment: Operating System: All
> Platform: All
> Reporter: Pichai Ongvasith
> Priority: Minor
> Attachments: thai_analyzer.zip
>
>
> Unlike many other languages, Thai does not have clear word boundaries within a
> sentence: words are written consecutively without delimiters. The Lucene
> StandardTokenizer therefore cannot tokenize a Thai sentence and returns the
> whole sentence as a single token. A special tokenizer that breaks Thai
> sentences into words is required.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.