[
https://issues.apache.org/jira/browse/LUCENE-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469934#comment-13469934
]
Lance Norskog commented on LUCENE-3921:
---------------------------------------
I have discovered a similar problem with the Smart Chinese toolkit. Would the
same approach work for both languages? Would it be worth solving this problem
with a generic tool rather than a language-specific one?
> Add decompose compound Japanese Katakana token capability to Kuromoji
> ---------------------------------------------------------------------
>
> Key: LUCENE-3921
> URL: https://issues.apache.org/jira/browse/LUCENE-3921
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/analysis
> Affects Versions: 4.0-ALPHA
> Environment: CentOS 5, IPA Dictionary, Run with "Search mode"
> Reporter: Kazuaki Hiraga
> Labels: features
>
> Kuromoji, the Japanese morphological analyzer, cannot decompose every
> Japanese Katakana compound token into sub-tokens. Some Katakana tokens are
> decomposed, but this does not apply to every Katakana compound token. For
> instance, "トートバッグ" (tote bag) and "ショルダーバッグ" (shoulder bag) are not
> decomposed into "トート バッグ" and "ショルダー バッグ", although the IPA dictionary
> has an entry for "バッグ". I would like the decompose feature to apply to
> every Katakana token whose sub-tokens are in the dictionary, or a capability
> to force decomposition of every Katakana token.
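
To make the request above concrete, here is a minimal sketch of dictionary-driven decomposition. It is an illustration only, not how Kuromoji segments text: the function name `decompose`, the toy dictionary, and the "fewest pieces" rule are all assumptions made for this example, and a real dictionary lookup would replace the Python set.

```python
def decompose(token, dictionary):
    """Split token into the fewest dictionary words (hypothetical helper).

    Returns [token] unchanged when no complete split exists.
    """
    n = len(token)
    # best[i] is the shortest list of dictionary words covering token[:i],
    # or None when token[:i] cannot be covered.
    best = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            piece = token[start:end]
            # Assumption: skip the whole token itself, so a compound that is
            # listed in the dictionary is still split into its parts.
            if piece == token:
                continue
            if best[start] is not None and piece in dictionary:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[n] if best[n] else [token]


# Toy dictionary standing in for the IPA dictionary entries (assumed).
toy_dictionary = {"トート", "ショルダー", "バッグ"}
print(decompose("トートバッグ", toy_dictionary))
print(decompose("ショルダーバッグ", toy_dictionary))
```

A token is only split when every part is found in the dictionary, which matches the first option requested above. The "force decompose" option would need a different rule for parts that are not in the dictionary.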