[ https://issues.apache.org/jira/browse/SOLR-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291635#comment-13291635 ]
Christian Moen commented on SOLR-3524:
--------------------------------------
Hiraga-san, there are different views on how tokenizers should best handle
punctuation characters. Punctuation generally doesn't convey much meaning
that is useful for text search, so it is generally removed in Lucene. (A
different point of view is that tokenizers shouldn't remove punctuation and
that filters should do this instead.)
The ability to keep punctuation was left as an expert feature in
JapaneseTokenizer and I think we can expose this as an expert feature in Solr
as well. Could you share some details on your use case, just so that I get a
better idea of the background and importance of this?
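To make the request concrete, a fieldtype definition along the following lines is presumably what is being asked for. This is only a sketch: the attribute name discardPunctuation and the field type name text_ja_punct are assumptions made for illustration, and neither is confirmed anywhere in this issue.

```xml
<!-- Sketch only: "discardPunctuation" is an assumed attribute name,
     and "text_ja_punct" is a hypothetical field type name. -->
<fieldType name="text_ja_punct" class="solr.TextField">
  <analyzer>
    <!-- false would ask the tokenizer to keep punctuation tokens -->
    <tokenizer class="solr.JapaneseTokenizerFactory" discardPunctuation="false"/>
  </analyzer>
</fieldType>
```

If such an attribute were left unset, the factory would presumably keep its current behavior and discard punctuation.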
> Make discard-punctuation feature in Kuromoji configurable from
> JapaneseTokenizerFactory
> ---------------------------------------------------------------------------------------
>
> Key: SOLR-3524
> URL: https://issues.apache.org/jira/browse/SOLR-3524
> Project: Solr
> Issue Type: Improvement
> Components: Schema and Analysis
> Affects Versions: 3.6
> Reporter: Kazuaki Hiraga
> Priority: Minor
> Attachments: kuromoji_discard_punctuation.patch.txt
>
>
> JapaneseTokenizer (Kuromoji) doesn't provide a configuration option to
> preserve punctuation in Japanese text, although it has a parameter to change
> this behavior. JapaneseTokenizerFactory always sets the third parameter,
> which controls this behavior, to true to remove punctuation.
> I would like to have an option to configure this behavior in the fieldtype
> definition in schema.xml.