[
https://issues.apache.org/jira/browse/SOLR-815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640618#action_12640618
]
Todd Feak commented on SOLR-815:
--------------------------------
It's a hidden storage format in the index. As long as indexing and searching
normalize the same way, either direction works; the choice is a coin toss.
For this particular case, Full-Width was chosen as the underlying format
because the majority of the Japanese text and searches we are seeing already
use the Full-Width versions of both the Katakana and Latin characters. This is
probably due to the platform we are on. It means less conversion is needed.
Admittedly, it's a minor performance choice, but this is what we have.
I'm not stuck on it being one way or the other, and changing it should be easy.
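To illustrate the "coin toss", here is a minimal, hypothetical Java sketch
(the class and method names are invented for illustration and are not taken
from SOLR-815.patch). It covers Latin characters only, where printable ASCII
U+0021..U+007E and full-width U+FF01..U+FF5E differ by a fixed offset of
0xFEE0. Normalizing both the indexed text and the query to the same width,
whichever one is picked, makes the terms match.

```java
// Hypothetical sketch: two opposite normalization targets for Latin text.
// Not the SOLR-815 patch; it only shows that either target works if used
// consistently at index time and query time.
final class LatinWidthDemo {
    // Distance from '!' (U+0021) to FULLWIDTH EXCLAMATION MARK (U+FF01).
    private static final int OFFSET = 0xFEE0;

    // Map printable ASCII U+0021..U+007E to full-width U+FF01..U+FF5E.
    static String toFullWidth(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            out.append(c >= 0x21 && c <= 0x7E ? (char) (c + OFFSET) : c);
        }
        return out.toString();
    }

    // Map full-width U+FF01..U+FF5E back to printable ASCII.
    static String toHalfWidth(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            out.append(c >= 0xFF01 && c <= 0xFF5E ? (char) (c - OFFSET) : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String indexed = "\uFF33\uFF2F\uFF2C\uFF32"; // full-width "SOLR" in a document
        String query = "SOLR";                        // half-width "SOLR" typed by a user
        // Either target makes the terms equal, as long as both sides use it.
        System.out.println(toFullWidth(indexed).equals(toFullWidth(query))); // true
        System.out.println(toHalfWidth(indexed).equals(toHalfWidth(query))); // true
    }
}
```

With the full-width target, input that is already full-width passes through
toFullWidth unchanged, which is the "less conversion" argument made above.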
> Add new Japanese half-width/full-width normalization Filter and Factory
> -----------------------------------------------------------------------
>
> Key: SOLR-815
> URL: https://issues.apache.org/jira/browse/SOLR-815
> Project: Solr
> Issue Type: New Feature
> Components: search
> Affects Versions: 1.3
> Reporter: Todd Feak
> Priority: Minor
> Attachments: SOLR-815.patch
>
>
> Japanese Katakana and Latin alphabet characters exist in both "half-width"
> and "full-width" versions. This new Filter normalizes to the full-width
> version so that either form can be used for indexing and searching.
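A minimal sketch of the normalization the description refers to, written as a
plain Java string function rather than a Lucene TokenFilter. The class and
method names are hypothetical and this is not the code in SOLR-815.patch. It
assumes printable ASCII U+0021..U+007E maps to full-width by adding 0xFEE0,
and it uses java.text.Normalizer NFKC only on runs of half-width Katakana
(U+FF61..U+FF9F), so that a base kana plus a half-width voiced mark such as
"ｶﾞ" becomes the single full-width "ガ". NFKC is not applied to the whole
string because it would fold full-width Latin to half-width, the opposite of
this filter's target. The ASCII space is deliberately left alone.

```java
import java.text.Normalizer;

// Hypothetical sketch of half-width -> full-width normalization (not the patch).
final class JapaneseWidthNormalizer {
    // Distance from printable ASCII to the full-width Latin block.
    private static final int OFFSET = 0xFEE0;

    private static boolean isHalfWidthKatakana(char c) {
        // Half-width CJK punctuation and Katakana, including the voiced marks.
        return c >= 0xFF61 && c <= 0xFF9F;
    }

    static String normalize(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c >= 0x21 && c <= 0x7E) {
                // Half-width Latin letter, digit or symbol -> full-width form.
                out.append((char) (c + OFFSET));
                i++;
            } else if (isHalfWidthKatakana(c)) {
                // Collect the whole half-width Katakana run so that NFKC can
                // compose a kana with a following voiced/semi-voiced mark.
                int start = i;
                while (i < s.length() && isHalfWidthKatakana(s.charAt(i))) {
                    i++;
                }
                out.append(Normalizer.normalize(s.substring(start, i), Normalizer.Form.NFKC));
            } else {
                // Already full-width, or outside the ranges handled here.
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }
}
```

For example, normalize("ﾊﾟｿｺﾝ") yields "パソコン" and normalize("ABC ｶﾅ") yields
"ＡＢＣ カナ", while full-width input is returned unchanged. One limitation of
this sketch: a half-width voiced mark that follows a full-width kana is not
merged into it, because only half-width runs are passed to NFKC.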
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.