[
https://issues.apache.org/jira/browse/LUCENE-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904548#comment-16904548
]
Tomoko Uchida commented on LUCENE-8948:
---------------------------------------
I looked into the details of the parameter naming.
The factories' "name" parameter appears to come from the corresponding
parameter of the ICU4J Normalizer2 factory method.
[http://www.icu-project.org/apiref/icu4j/com/ibm/icu/text/Normalizer2.html#getInstance-java.io.InputStream-java.lang.String-com.ibm.icu.text.Normalizer2.Mode-]
{quote}data - the binary, big-endian normalization (.nrm file) data, or null
for ICU data
name - "nfc" or "nfkc" or "nfkc_cf" or name of custom data file
{quote}
Strictly speaking, the ICU4J normalizer's "name" does not seem to be the same
thing as a "Unicode normalization form": it has a wider meaning, since it can
also be the name of a custom data file.
Nonetheless, "data" is always null when the Lucene ICU factories instantiate
the normalizer, so renaming the parameter from "name" to "form" looks okay to
me from the standpoint of understandability.
Just in case, [~thetaphi]: does that make sense to you?
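To make the point about "data" being null concrete, here is a minimal sketch of how a factory could read a "form" argument. It is not the actual Lucene code: the class {{FormParamSketch}}, the helper {{resolveForm}}, the validation step, and the "nfkc_cf" default are all assumptions for illustration. Only the three built-in names come from the ICU4J documentation quoted above.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the Lucene factory implementation.
public class FormParamSketch {
    // Built-in names that the ICU4J docs list for the case where "data" is null.
    static final Set<String> BUILT_IN_FORMS = Set.of("nfc", "nfkc", "nfkc_cf");

    // Reads the proposed "form" argument from the factory args.
    // Assumption: "nfkc_cf" is used when the argument is absent.
    static String resolveForm(Map<String, String> args) {
        String form = args.getOrDefault("form", "nfkc_cf");
        // With data == null there is no custom .nrm file to refer to,
        // so only the built-in names are meaningful here.
        if (!BUILT_IN_FORMS.contains(form)) {
            throw new IllegalArgumentException("Unknown normalization form: " + form);
        }
        // The result would then be passed as the "name" argument of
        // Normalizer2.getInstance(null, form, mode).
        return form;
    }

    public static void main(String[] args) {
        System.out.println(resolveForm(Map.of("form", "nfc"))); // prints "nfc"
        System.out.println(resolveForm(Map.of()));              // prints "nfkc_cf"
    }
}
```

Because the value is restricted to the built-in names in this sketch, the argument effectively selects a normalization form, which is why "form" reads more naturally than "name".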
> Change "name" argument in ICU factories to "form"
> -------------------------------------------------
>
> Key: LUCENE-8948
> URL: https://issues.apache.org/jira/browse/LUCENE-8948
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/analysis
> Reporter: Tomoko Uchida
> Priority: Minor
>
> {{o.a.l.a.icu.ICUNormalizer2CharFilterFactory}} and
> {{o.a.l.a.icu.ICUNormalizer2FilterFactory}} have "name" arguments to specify
> the Unicode Normalization Form. "name" is vague and causes a problem with
> SOLR-13593.
> "form" would be suitable here instead of "name".
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)