[ https://issues.apache.org/jira/browse/SOLR-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291836#comment-13291836 ]
Kazuaki Hiraga commented on SOLR-3524:
--------------------------------------
Thank you guys!
Christian, some documents contain keywords that consist of letters and
punctuation, such as c++ and c#, and we want to match those keywords in their
unchanged form. Of course, we will discard punctuation in most cases, but in
some cases, especially for short text, we want to preserve it.
Therefore, I would like an option that lets me control this behaviour.
Ohtani-san, thank you for your early reply and patch!
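To make the request concrete, here is a minimal sketch of how a factory could read such an option from the fieldtype attributes. It is not the attached patch: the attribute name `discardPunctuation` and the class `DiscardPunctuationOption` are assumptions for illustration only. The sketch defaults to true so that existing schemas keep the current behaviour of removing punctuation.

```java
import java.util.HashMap;
import java.util.Map;

public class DiscardPunctuationOption {
    // Hypothetical attribute name; the issue itself does not fix one.
    static final String DISCARD_PUNCTUATION = "discardPunctuation";

    // Reads the flag from the fieldtype attributes passed to the factory.
    // A missing attribute means true, i.e. today's behaviour is unchanged.
    static boolean discardPunctuation(Map<String, String> args) {
        String value = args.get(DISCARD_PUNCTUATION);
        return value == null ? true : Boolean.parseBoolean(value);
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new HashMap<>();
        // No attribute set: punctuation is still discarded.
        System.out.println(discardPunctuation(attrs));

        // Attribute set to "false": punctuation would be preserved.
        attrs.put(DISCARD_PUNCTUATION, "false");
        System.out.println(discardPunctuation(attrs));
    }
}
```

The value returned here would then be passed as the third constructor parameter that the factory currently hard-codes to true.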
> Make discard-punctuation feature in Kuromoji configurable from
> JapaneseTokenizerFactory
> ---------------------------------------------------------------------------------------
>
> Key: SOLR-3524
> URL: https://issues.apache.org/jira/browse/SOLR-3524
> Project: Solr
> Issue Type: Improvement
> Components: Schema and Analysis
> Affects Versions: 3.6
> Reporter: Kazuaki Hiraga
> Priority: Minor
> Attachments: SOLR-3524.patch, kuromoji_discard_punctuation.patch.txt
>
>
> JapaneseTokenizer (Kuromoji) doesn't provide a configuration option to preserve
> punctuation in Japanese text, although it has a parameter to change this
> behavior. JapaneseTokenizerFactory always sets the third parameter, which
> controls this behavior, to true to remove punctuation.
> I would like to have an option to configure this behavior in the fieldtype
> definition in schema.xml.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira