[ 
https://issues.apache.org/jira/browse/LUCENE-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469934#comment-13469934
 ] 

Lance Norskog commented on LUCENE-3921:
---------------------------------------

I have discovered a similar problem with the Smart Chinese toolkit. Would the 
same approach work for both languages? Would it be worth solving this problem 
with a generic tool rather than a language-specific one?
                
> Add decompose compound Japanese Katakana token capability to Kuromoji
> ---------------------------------------------------------------------
>
>                 Key: LUCENE-3921
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3921
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>    Affects Versions: 4.0-ALPHA
>         Environment: CentOS 5, IPA Dictionary, run with "Search mode"
>            Reporter: Kazuaki Hiraga
>              Labels: features
>
> Kuromoji, the Japanese morphological analyzer, cannot decompose every 
> Japanese Katakana compound token into sub-tokens. Some Katakana tokens are 
> decomposed, but not all of them. For instance, "トートバッグ" (tote bag) and 
> "ショルダーバッグ" (shoulder bag) are not decomposed into "トート バッグ" and 
> "ショルダー バッグ", although the IPA dictionary has an entry for "バッグ". I would 
> like the decompose feature to apply to every Katakana token whose sub-tokens 
> are in the dictionary, or to have an option that forces decomposition of 
> every Katakana token.
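
To make the request concrete, here is a minimal sketch of dictionary-driven
decomposition. It is not Kuromoji's implementation and uses no Lucene API; the
function names and the tiny dictionary are hypothetical, standing in for
entries the issue says exist in the IPA dictionary. A Katakana token is split
only when dictionary entries cover it completely, and the split with the
fewest pieces is preferred.

```python
def is_katakana(text):
    """Return True if every character lies in the Katakana block (U+30A0-U+30FF)."""
    return bool(text) and all("\u30a0" <= ch <= "\u30ff" for ch in text)


def decompose(token, dictionary):
    """Split a Katakana token into the fewest dictionary entries that cover it.

    Returns [token] unchanged when the token is not pure Katakana or when no
    full cover by dictionary entries exists.
    """
    if not is_katakana(token):
        return [token]
    n = len(token)
    # best[i] holds the shortest list of entries covering token[:i], or None.
    best = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            piece = token[start:end]
            if best[start] is not None and piece in dictionary:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[n] if best[n] is not None else [token]


# Hypothetical stand-in for dictionary entries (assumption, not real IPA data).
SAMPLE_DICTIONARY = {"トート", "バッグ", "ショルダー"}

for word in ("トートバッグ", "ショルダーバッグ"):
    print(word, "->", " ".join(decompose(word, SAMPLE_DICTIONARY)))
```

Only the script check is specific to Japanese, so the same shape could in
principle serve other languages with a different character test. Note that if
a compound is itself a dictionary entry, this sketch leaves it whole, which a
"force decompose" option would need to override.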

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

