[ 
https://issues.apache.org/jira/browse/LUCENE-6737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697719#comment-14697719
 ] 

Ramkumar Aiyengar commented on LUCENE-6737:
-------------------------------------------

ICU folding does this, right? This patch is still useful even if it does, in case
you don't want to do the full folding or don't want to depend on ICU. Just
curious, really.

> Add DecimalDigitFilter
> ----------------------
>
>                 Key: LUCENE-6737
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6737
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Robert Muir
>             Fix For: Trunk, 5.4
>
>         Attachments: LUCENE-6737.patch
>
>
> TokenFilter that folds all Unicode decimal digits
> (http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:General_Category=Decimal_Number:])
> to 0-9.
> Historically, a lot of the impacted analyzers couldn't tokenize numbers at
> all, but now they use StandardTokenizer for numbers/alphanum tokens. However,
> you will usually find a mix of ASCII digits and "native" digits in the same
> text, and today that makes searching difficult: a query written with one form
> does not match documents written with the other.
> Note that this only affects *decimal* digits, hence the name
> DecimalDigitFilter. So there is no processing of Chinese numerals or anything
> crazy like that.
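To illustrate the folding the issue describes, here is a minimal sketch in plain Java. It is not the code from LUCENE-6737.patch and does not use the Lucene TokenFilter API; the class name DecimalDigitFolding and its fold method are hypothetical. It assumes the JDK's own Unicode tables (java.lang.Character) decide what counts as General_Category=Decimal_Number, so digits added in Unicode versions newer than the running JDK would not be folded.

```java
/**
 * Hypothetical sketch (not Lucene's DecimalDigitFilter): folds every code point
 * in Unicode General_Category=Decimal_Number (Nd) to the ASCII digit 0-9 with
 * the same value, using only the JDK's Unicode tables.
 */
public final class DecimalDigitFolding {

    private DecimalDigitFolding() {}

    /** Returns a copy of text with every Nd digit replaced by its ASCII equivalent. */
    public static String fold(CharSequence text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            // Read a full code point so supplementary digits (surrogate pairs) work.
            int cp = Character.codePointAt(text, i);
            if (Character.getType(cp) == Character.DECIMAL_DIGIT_NUMBER) {
                // For any Nd code point, Character.digit(cp, 10) yields its value 0-9.
                out.append((char) ('0' + Character.digit(cp, 10)));
            } else {
                // Everything else, including CJK numerals (Lo/Nl), is copied unchanged.
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // ASCII, Arabic-Indic, Devanagari, and CJK ideographic forms of "2015".
        String mixed = "2015 \u0662\u0660\u0661\u0665 \u0968\u0966\u0967\u096B \u4E8C\u3007\u4E00\u4E94";
        System.out.println(fold(mixed));
    }
}
```

Here fold(mixed) returns "2015 2015 2015 二〇一五": the Arabic-Indic and Devanagari digits become ASCII, while the Chinese numerals are left alone because they are not in the Decimal_Number category, matching the scope described above.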



