[
https://issues.apache.org/jira/browse/SOLR-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640615#action_12640615
]
Todd Feak commented on SOLR-814:
--------------------------------
Yes, they are used differently.
However, a word written in Hiragana is the *same* word when written in
Katakana. Same meaning. Furthermore, it's not always cut and dried which to use.
For example, a movie title may be written in Hiragana or Katakana, depending on
the director's preference. The user (searcher) may not remember which script
the director chose, and so may search using the other. Without this
normalization they would get a search miss.
I don't doubt your experience at Ultraseek, but this feature was explicitly
asked for by native-speaking Japanese engineers at Sony. I *just* (literally)
double-checked with a couple of on-site native-speaking Japanese engineers, and
both agree that this is useful, at least for our searches.
I would say that it should be up to the schema developer to decide whether this
functionality is useful for their situation. Either way, I offer it up
to the community for their decision.
> Add new Japanese Hiragana Filter and Factory
> --------------------------------------------
>
> Key: SOLR-814
> URL: https://issues.apache.org/jira/browse/SOLR-814
> Project: Solr
> Issue Type: New Feature
> Components: search
> Affects Versions: 1.3
> Reporter: Todd Feak
> Priority: Minor
> Attachments: SOLR-814.patch
>
>
> The Japanese Hiragana and Katakana character sets map easily onto one
> another. This filter normalizes all Hiragana characters to their Katakana
> counterparts, allowing for indexing and searching using either.
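
For illustration, the normalization described above can be sketched as a plain
string function. This is not the code in SOLR-814.patch, and it is not a Lucene
TokenFilter; the class name KanaNormalizer and the method toKatakana are
hypothetical. The sketch assumes only the Unicode layout, in which the Hiragana
letters U+3041..U+3096 sit exactly 0x60 code points below their Katakana
counterparts U+30A1..U+30F6. Other characters, including the Hiragana iteration
marks, are left untouched here.

```java
// Illustrative sketch only; NOT the code from the attached patch.
// KanaNormalizer and toKatakana are hypothetical names.
final class KanaNormalizer {
    // Hiragana letters U+3041..U+3096 line up with Katakana U+30A1..U+30F6.
    private static final char HIRAGANA_START = '\u3041';
    private static final char HIRAGANA_END = '\u3096';
    // Distance between a Hiragana letter and its Katakana counterpart.
    private static final int OFFSET = 0x60;

    /** Returns a copy of the input with Hiragana letters replaced by Katakana. */
    static String toKatakana(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c >= HIRAGANA_START && c <= HIRAGANA_END) {
                c = (char) (c + OFFSET);
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String hiragana = "\u3072\u3089\u304c\u306a"; // the word "hiragana" in Hiragana
        String katakana = "\u30d2\u30e9\u30ac\u30ca"; // the same word in Katakana
        // Both spellings normalize to the same Katakana form.
        System.out.println(toKatakana(hiragana).equals(katakana));
        System.out.println(toKatakana(katakana).equals(katakana));
    }
}
```

Applying the same function to both the indexed text and the query text is what
lets a search in either script match; a real filter would do this per token
inside the analysis chain.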