[ https://issues.apache.org/jira/browse/LUCENE-8784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16845015#comment-16845015 ]
Namgyu Kim commented on LUCENE-8784:
------------------------------------
Hi [~jim.ferenczi] and [~Munkyu],
I uploaded a patch for this issue.
I only changed the Tokenizer and TokenizerFactory; I did not change the
Analyzer.
The Japanese analyzer does not make this configurable either
(discardPunctuation is always true there).
If necessary, we can easily add the option to the Analyzer.
However, I have a question.
The current patch passes the option as a parameter on every call (in the
_isPunctuation_ method).
If the method were not static, it could read the tokenizer's field directly, and
we would not have to pass the parameter every time.
What do you think about making the _isPunctuation_ method non-static?
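To make the question concrete, here is a minimal sketch of the two designs. It is hypothetical: the class name `PunctuationCheckSketch`, the field `dotIsPunctuation`, and the simplified Unicode-category rule are illustrative assumptions, not the code in LUCENE-8784.patch or in KoreanTokenizer. The static version needs the option on every call; the instance version reads it from a field set once in the constructor.

```java
// Hypothetical sketch contrasting a static helper with an instance method.
// Names and punctuation rules are illustrative only; this is NOT the code
// from LUCENE-8784.patch or from KoreanTokenizer.
class PunctuationCheckSketch {

  // Option set once, e.g. from a (hypothetical) tokenizer constructor argument.
  private final boolean dotIsPunctuation;

  PunctuationCheckSketch(boolean dotIsPunctuation) {
    this.dotIsPunctuation = dotIsPunctuation;
  }

  // Design 1: static helper. Every caller must pass the option along.
  static boolean isPunctuationStatic(char ch, boolean dotIsPunctuation) {
    if (ch == '.') {
      return dotIsPunctuation;
    }
    return isPunctuationByType(ch);
  }

  // Design 2: instance method. The option is read from the field,
  // so call sites only pass the character.
  boolean isPunctuation(char ch) {
    if (ch == '.') {
      return dotIsPunctuation;
    }
    return isPunctuationByType(ch);
  }

  // Simplified rule for this sketch: classify by Unicode general category.
  private static boolean isPunctuationByType(char ch) {
    switch (Character.getType(ch)) {
      case Character.SPACE_SEPARATOR:
      case Character.DASH_PUNCTUATION:
      case Character.START_PUNCTUATION:
      case Character.END_PUNCTUATION:
      case Character.CONNECTOR_PUNCTUATION:
      case Character.OTHER_PUNCTUATION:
      case Character.MATH_SYMBOL:
        return true;
      default:
        return false;
    }
  }
}
```

For example, `new PunctuationCheckSketch(false).isPunctuation('.')` returns false, while `PunctuationCheckSketch.isPunctuationStatic('.', true)` returns true. Both designs give the same answers; the instance version simply removes the extra argument from every call site, at the cost of the method no longer being usable without a tokenizer instance.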
> Nori(Korean) tokenizer removes the decimal point.
> ---------------------------------------------------
>
> Key: LUCENE-8784
> URL: https://issues.apache.org/jira/browse/LUCENE-8784
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Munkyu Im
> Priority: Major
> Attachments: LUCENE-8784.patch
>
>
> This is the same issue that I mentioned in
> [https://github.com/elastic/elasticsearch/issues/41401#event-2293189367].
> Unlike the standard analyzer, the nori analyzer removes the decimal point:
> the nori tokenizer removes the "." character by default.
> This makes it difficult to index keywords that include a decimal point.
> It would be nice to have an option to keep or discard the decimal point.
> As the Japanese tokenizer does, Nori needs an option to preserve the decimal
> point.
>
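The effect described in the issue can be illustrated with a toy splitter. This is a hypothetical sketch: `DecimalPointDemo` and its rule (split on whitespace and on Unicode OTHER_PUNCTUATION, with "." handled by a flag) are assumptions for illustration only; Nori's real segmentation is dictionary-based and works differently. It only shows why discarding "." breaks a number such as 3.14 into two tokens.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only: this is NOT how KoreanTokenizer segments text.
class DecimalPointDemo {

  // Splits text into runs of non-punctuation characters. Characters treated as
  // punctuation are dropped and act as token boundaries. Whether '.' counts as
  // punctuation is controlled by dotIsPunctuation.
  static List<String> split(String text, boolean dotIsPunctuation) {
    List<String> tokens = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    for (char ch : text.toCharArray()) {
      boolean punct = Character.isWhitespace(ch)
          || (ch == '.'
              ? dotIsPunctuation
              : Character.getType(ch) == Character.OTHER_PUNCTUATION);
      if (punct) {
        // Close the current token, if any, and drop the punctuation character.
        if (current.length() > 0) {
          tokens.add(current.toString());
          current.setLength(0);
        }
      } else {
        current.append(ch);
      }
    }
    if (current.length() > 0) {
      tokens.add(current.toString());
    }
    return tokens;
  }
}
```

With the dot discarded, `split("version 3.14", true)` yields [version, 3, 14], so a query for "3.14" cannot match as a single term; with `split("version 3.14", false)` the result is [version, 3.14].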
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)