[ https://issues.apache.org/jira/browse/LUCENE-3915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237674#comment-13237674 ]
Christian Moen commented on LUCENE-3915:
----------------------------------------
Please find attached a draft patch that replaces term attributes with readings.
I saw in Ohtani-san's Twitter feed that Koji had checked this functionality into
lucene-gosen, and I'm providing a similar patch here in the hope of supporting
the Japanese spell-checking work.
This patch can also convert katakana readings to romaji, and it might make sense
to use a romaji representation for the spell-checking. We probably also need
to deal with misspellings that get tokenized into several tokens, recomposing
them from their readings before we do the matching.
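To make the recomposition idea a bit more concrete, here is a rough sketch of what I mean. It is not taken from the attached patch and does not use any Lucene API: the ReadingRecomposer class, its Token type and the sample tokens are all made up for illustration, and the fallback to the surface form when a token has no reading is just an assumption on my part.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of the patch or of Lucene.
final class ReadingRecomposer {

    /** A token as a tokenizer might emit it: surface form plus katakana reading. */
    static final class Token {
        final String surface;
        final String reading; // may be null when no reading is available

        Token(String surface, String reading) {
            this.surface = surface;
            this.reading = reading;
        }
    }

    /**
     * Joins the readings of consecutive tokens into one string that could be
     * handed to the spell-checker. Falls back to the surface form when a token
     * has no reading (an assumption, other policies are possible).
     */
    static String recompose(List<Token> tokens) {
        StringBuilder sb = new StringBuilder();
        for (Token t : tokens) {
            sb.append(t.reading != null ? t.reading : t.surface);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up segmentation of a single word that came out as two tokens.
        List<Token> tokens = Arrays.asList(
                new Token("東", "ヒガシ"),
                new Token("京都", "キョウト"));
        System.out.println(recompose(tokens));
    }
}
```

The recomposed reading could then be normalized (to romaji, for instance) before it is compared against dictionary entries.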
Just some thoughts...
> Add Japanese filter to replace term attribute with readings
> -----------------------------------------------------------
>
> Key: LUCENE-3915
> URL: https://issues.apache.org/jira/browse/LUCENE-3915
> Project: Lucene - Java
> Issue Type: New Feature
> Reporter: Christian Moen
> Priority: Minor
> Attachments: LUCENE-3915.patch
>
>
> Koji and Robert are working on LUCENE-3888, which allows spell-checkers to do
> their similarity matching using a form of the word other than its surface form.
> This approach is very useful for languages such as Japanese, where the surface
> form and the form we'd like to use for similarity matching are very different.
> For Japanese, it's useful to use readings for this -- probably with some
> normalization.