[ https://issues.apache.org/jira/browse/LUCENE-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620617#comment-16620617 ]
Alan Woodward commented on LUCENE-8498:
---------------------------------------
Precommit has caught an interesting wrinkle here, in that CharTokenizer also
allows you to combine tokenization and normalization. As currently written, a
CharTokenizer combined with a normalizer will not have its normalization
applied when Analyzer.normalize() is called. Should we also remove the
normalization functions from CharTokenizer? cc [~thetaphi]
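
To make the wrinkle concrete, here is a rough toy model in plain Java. It is only a sketch: ToyAnalyzer, analyze(), fused() and split() are hypothetical names invented for illustration and are not Lucene's API. The model assumes that the normalization path runs only the filter chain and never the tokenizer, which is the behaviour described above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Toy model only: hypothetical classes, NOT Lucene's API.
final class ToyAnalyzer {
    // True when lowercasing is fused into the tokenizer itself.
    private final boolean tokenizerLowercases;
    // Per-token filters that run after the tokenizer.
    private final List<UnaryOperator<String>> filters;

    ToyAnalyzer(boolean tokenizerLowercases, List<UnaryOperator<String>> filters) {
        this.tokenizerLowercases = tokenizerLowercases;
        this.filters = filters;
    }

    // Full analysis path: split on non-letters, then apply every filter.
    List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i <= text.length(); i++) {
            boolean letter = i < text.length() && Character.isLetter(text.charAt(i));
            if (letter) {
                char c = text.charAt(i);
                cur.append(tokenizerLowercases ? Character.toLowerCase(c) : c);
            } else if (cur.length() > 0) {
                out.add(applyFilters(cur.toString()));
                cur.setLength(0);
            }
        }
        return out;
    }

    // Normalization path (assumption of this model): the tokenizer is
    // skipped, so only the filters see the term.
    String normalize(String term) {
        return applyFilters(term);
    }

    private String applyFilters(String s) {
        for (UnaryOperator<String> f : filters) {
            s = f.apply(s);
        }
        return s;
    }

    // Tokenization and lowercasing combined in one step, no filters.
    static ToyAnalyzer fused() {
        return new ToyAnalyzer(true, Collections.<UnaryOperator<String>>emptyList());
    }

    // Plain letter tokenizer followed by a separate lowercase filter.
    static ToyAnalyzer split() {
        UnaryOperator<String> lower = s -> s.toLowerCase(Locale.ROOT);
        return new ToyAnalyzer(false, Collections.singletonList(lower));
    }
}
```

In this model both ToyAnalyzer.fused() and ToyAnalyzer.split() analyze "Foo Bar" into [foo, bar], but normalize("Foo") returns "Foo" for the fused variant and "foo" for the split one, so a normalized query term would no longer match what the fused variant indexed.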
> Deprecate/Remove LowerCaseTokenizer
> -----------------------------------
>
> Key: LUCENE-8498
> URL: https://issues.apache.org/jira/browse/LUCENE-8498
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Alan Woodward
> Priority: Major
> Attachments: LUCENE-8498.patch
>
>
> LowerCaseTokenizer combines tokenization and filtering in a way that prevents
> us improving the normalization API. We should deprecate and remove it, as it
> can be replaced simply with a LetterTokenizer and LowerCaseFilter.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)