[ https://issues.apache.org/jira/browse/LUCENE-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16852014#comment-16852014 ]
Namgyu Kim commented on LUCENE-8812:
------------------------------------
Thank you for your reply, [~jim.ferenczi] :D
{quote}I wonder if it would be difficult to have a base class for the Japanese
and Korean number filter since they share a large amount of code. However I
think it's ok to merge this first and we can tackle the merge in a follow up,
wdyt ?
{quote}
I think that would be a great refactoring.
Once it is done, SmartChineseAnalyzer could share this TokenFilter as well
(Chinese and Japanese use the same numeric characters), and the total amount
of code would shrink.
I think NumberFilter (the new abstract class) could live in either the
org.apache.lucene.analysis.core package (analysis-common) or the
org.apache.lucene.analysis package (lucene-core). What do you think?
Personally, analysis-common seems like the right place to me, but the choice
is a little ambiguous.
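To make the base-class idea concrete, here is a minimal plain-Java sketch. It deliberately does not touch Lucene's TokenFilter API: the class names NumberNormalizerSketch, KoreanNumerals and KanjiNumerals are hypothetical, and only simple digit-by-digit conversion is shown. The point is that the shared logic lives in the base class while each language contributes only its numeral table.

```java
/** Hypothetical base class: shared conversion logic, language-specific table. */
abstract class NumberNormalizerSketch {
    /** Returns 0-9 for a language-specific numeral, or -1 if c is not one. */
    protected abstract int numeralValue(char c);

    /** Shared logic: converts a run of positional numerals to ASCII digits. */
    final String toArabicDigits(String run) {
        StringBuilder out = new StringBuilder();
        for (char c : run.toCharArray()) {
            int value = (c >= '0' && c <= '9') ? c - '0' : numeralValue(c);
            if (value < 0) {
                throw new IllegalArgumentException("Not a numeral: " + c);
            }
            out.append((char) ('0' + value));
        }
        return out.toString();
    }
}

/** Korean numerals; the index in the string is the numeral's value. */
final class KoreanNumerals extends NumberNormalizerSketch {
    private static final String DIGITS = "영일이삼사오육칠팔구";

    @Override
    protected int numeralValue(char c) {
        return DIGITS.indexOf(c);
    }
}

/** Kanji numerals, which a Japanese or Chinese subclass could share. */
final class KanjiNumerals extends NumberNormalizerSketch {
    private static final String DIGITS = "〇一二三四五六七八九";

    @Override
    protected int numeralValue(char c) {
        return DIGITS.indexOf(c);
    }
}
```

With this layout, new KoreanNumerals().toArabicDigits("일영영영") and new KanjiNumerals().toArabicDigits("一〇〇〇") both go through the same loop. A real NumberFilter would additionally have to handle unit characters and token attributes.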
> add KoreanNumberFilter to Nori(Korean) Analyzer
> -----------------------------------------------
>
> Key: LUCENE-8812
> URL: https://issues.apache.org/jira/browse/LUCENE-8812
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Namgyu Kim
> Priority: Major
> Attachments: LUCENE-8812.patch
>
>
> This is a follow-up issue to LUCENE-8784.
> The KoreanNumberFilter is a TokenFilter that normalizes Korean numbers to
> regular Arabic decimal numbers in half-width characters.
> The logic is similar to that of JapaneseNumberFilter.
> It should cover the following test cases.
> 1) Korean word to number
> 십만이천오백 => 102500
> 2) Digit-by-digit conversion
> 일영영영 => 1000
> 3) Decimal point with a unit
> 3.2천 => 3200
> 4) Comma as thousands separator
> 4,647.0010 => 4647.001
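The four cases above can be reproduced with a small standalone sketch. This is not the code from the attached patch and it is not a Lucene TokenFilter: the class KoreanNumberSketch and its methods are hypothetical, it works on a single string instead of a token stream, and it assumes well-formed input (a lone "." or an out-of-order unit is not validated).

```java
import java.math.BigDecimal;

/** Hypothetical helper, not part of Lucene: normalizes one Korean number string. */
final class KoreanNumberSketch {
    // The index in this string is the numeral's value: 영=0, 일=1, ..., 구=9.
    private static final String NUMERALS = "영일이삼사오육칠팔구";

    /** Power of ten for a unit character, or -1 if c is not a unit. */
    private static int exponent(char c) {
        switch (c) {
            case '십': return 1;
            case '백': return 2;
            case '천': return 3;
            case '만': return 4;
            case '억': return 8;
            case '조': return 12;
            default: return -1;
        }
    }

    static String normalize(String input) {
        BigDecimal total = BigDecimal.ZERO;      // groups closed by 만, 억 or 조
        BigDecimal group = BigDecimal.ZERO;      // current group, built from 십, 백, 천
        StringBuilder run = new StringBuilder(); // pending digits, e.g. "3.2"
        for (char c : input.toCharArray()) {
            int exp = exponent(c);
            if (c == ',') {
                continue; // a thousands separator carries no value
            } else if (c == '.' || (c >= '0' && c <= '9')) {
                run.append(c);
            } else if (NUMERALS.indexOf(c) >= 0) {
                run.append(NUMERALS.indexOf(c)); // appends the digit 0-9
            } else if (exp >= 4) {
                // A large unit closes the current group and scales it.
                BigDecimal value = group.add(take(run, BigDecimal.ZERO));
                if (value.signum() == 0) value = BigDecimal.ONE; // bare 만 is 10000
                total = total.add(value.scaleByPowerOfTen(exp));
                group = BigDecimal.ZERO;
            } else if (exp >= 1) {
                // A bare unit such as 십 counts as 1 x 10.
                group = group.add(take(run, BigDecimal.ONE).scaleByPowerOfTen(exp));
            } else {
                throw new IllegalArgumentException("Not a number character: " + c);
            }
        }
        BigDecimal result = total.add(group).add(take(run, BigDecimal.ZERO));
        return result.stripTrailingZeros().toPlainString();
    }

    /** Consumes the pending digit run, or returns ifEmpty when there is none. */
    private static BigDecimal take(StringBuilder run, BigDecimal ifEmpty) {
        if (run.length() == 0) return ifEmpty;
        BigDecimal value = new BigDecimal(run.toString());
        run.setLength(0);
        return value;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"십만이천오백", "일영영영", "3.2천", "4,647.0010"}) {
            System.out.println(s + " => " + normalize(s));
        }
    }
}
```

BigDecimal is used here so that "3.2천" and the trailing zero in "4,647.0010" are handled exactly; stripTrailingZeros followed by toPlainString avoids scientific notation in the result.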