Add capability to decompose compound Japanese Katakana tokens to Kuromoji
-------------------------------------------------------------------------
Key: LUCENE-3921
URL: https://issues.apache.org/jira/browse/LUCENE-3921
Project: Lucene - Java
Issue Type: Improvement
Components: modules/analysis
Affects Versions: 4.0
Environment: CentOS 5, IPA Dictionary
Reporter: Kazuaki Hiraga
Kuromoji, the Japanese morphological analyzer, cannot decompose every Japanese
Katakana compound token into sub-tokens. Some Katakana compounds are decomposed,
but not all of them. For instance, "トートバッグ" (tote bag) and "ショルダーバッグ"
(shoulder bag) are not decomposed into "トート バッグ" and "ショルダー バッグ", even though
the IPA dictionary has an entry for "バッグ". I would like the decompose feature
to apply to every Katakana token whose sub-tokens are in the dictionary, or an
option that forces decomposition of every Katakana token.
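
To make the request concrete, here is a minimal sketch of the requested
behavior. It is not Kuromoji code and does not use any Lucene API: the class
name KatakanaDecompounder, the decompose method, and the plain Set<String>
standing in for the IPA dictionary are all hypothetical. It assumes a greedy
longest-match scan in which characters not covered by any dictionary entry are
kept together as one sub-token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of Kuromoji or Lucene.
final class KatakanaDecompounder {

    // Splits a Katakana compound into sub-tokens using greedy longest match
    // against the given dictionary. Runs of characters that match no entry
    // are emitted as a single sub-token.
    static List<String> decompose(String token, Set<String> dictionary) {
        List<String> parts = new ArrayList<>();
        StringBuilder unknown = new StringBuilder();
        int i = 0;
        while (i < token.length()) {
            int end = longestMatchEnd(token, i, dictionary);
            if (end > i) {
                // Flush any pending unknown run before the dictionary word.
                if (unknown.length() > 0) {
                    parts.add(unknown.toString());
                    unknown.setLength(0);
                }
                parts.add(token.substring(i, end));
                i = end;
            } else {
                unknown.append(token.charAt(i));
                i++;
            }
        }
        if (unknown.length() > 0) {
            parts.add(unknown.toString());
        }
        return parts;
    }

    // Returns the end index of the longest dictionary entry starting at
    // 'start', or -1 if none. Entries shorter than two characters are
    // ignored (an assumption made here to avoid over-splitting).
    private static int longestMatchEnd(String token, int start, Set<String> dictionary) {
        for (int end = token.length(); end > start + 1; end--) {
            if (dictionary.contains(token.substring(start, end))) {
                return end;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Stand-in for the dictionary; only "バッグ" is confirmed above.
        Set<String> dictionary = Set.of("バッグ");
        System.out.println(decompose("トートバッグ", dictionary));
        System.out.println(decompose("ショルダーバッグ", dictionary));
    }
}
```

With only "バッグ" in the stand-in dictionary, both examples above split before
"バッグ". A compound that is itself a dictionary entry is returned whole,
because the scan tries the longest candidate first.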