[jira] [Commented] (LUCENE-8498) Deprecate/Remove LowerCaseTokenizer

Alan Woodward (JIRA) Wed, 19 Sep 2018 06:54:17 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620617#comment-16620617
 ]


Alan Woodward commented on LUCENE-8498:
---------------------------------------

Precommit has caught an interesting wrinkle here, in that CharTokenizer also 
allows you to combine tokenization and normalization.  As currently written, a 
CharTokenizer combined with a normalizer will not have its normalization 
applied when Analyzer.normalize() is called.  Should we also remove the 
normalization functions from CharTokenizer?  cc [~thetaphi]
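
To make the mismatch concrete, here is a minimal plain-Java sketch. It is a toy model and not Lucene's API: the class `NormalizeSketch`, its methods, and the two filter lists are all hypothetical. The only assumption is that a normalize step runs the filter chain but skips the tokenizer. Under that assumption, lowercasing done inside the tokenizer never reaches normalize, while lowercasing done in a separate filter is applied on both paths.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Toy model only: every name here is hypothetical and is NOT part of Lucene.
final class NormalizeSketch {

    // An empty filter chain (used with the "fused" tokenizer).
    static final List<UnaryOperator<String>> NO_FILTERS = Collections.emptyList();

    // A filter chain holding a single lowercasing filter.
    static final List<UnaryOperator<String>> LOWERCASE_FILTER =
            Collections.singletonList(s -> s.toLowerCase(Locale.ROOT));

    // Splits text into runs of letters. When lowercaseWhileTokenizing is true,
    // the tokenizer also lowercases, i.e. tokenization and normalization are fused.
    static List<String> tokenize(String text, boolean lowercaseWhileTokenizing) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isLetter(cp)) {
                current.appendCodePoint(
                        lowercaseWhileTokenizing ? Character.toLowerCase(cp) : cp);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Full analysis path: tokenizer first, then every filter in order.
    static List<String> analyze(String text, boolean fusedTokenizer,
                                List<UnaryOperator<String>> filters) {
        List<String> out = new ArrayList<>();
        for (String token : tokenize(text, fusedTokenizer)) {
            for (UnaryOperator<String> filter : filters) {
                token = filter.apply(token);
            }
            out.add(token);
        }
        return out;
    }

    // Normalization path: the tokenizer is skipped, only the filters run.
    static String normalize(String term, List<UnaryOperator<String>> filters) {
        for (UnaryOperator<String> filter : filters) {
            term = filter.apply(term);
        }
        return term;
    }

    public static void main(String[] args) {
        // Fused design: indexed tokens are lowercased, the normalized term is not.
        System.out.println(analyze("Hello World", true, NO_FILTERS));
        System.out.println(normalize("Hello", NO_FILTERS));
        // Split design: both paths pass through the same lowercasing filter.
        System.out.println(analyze("Hello World", false, LOWERCASE_FILTER));
        System.out.println(normalize("Hello", LOWERCASE_FILTER));
    }
}
```

In the fused case the indexed tokens and the normalized term disagree on case, so a term normalized at query time would not match what was indexed. Moving the lowercasing into a filter removes the disagreement in this model.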

> Deprecate/Remove LowerCaseTokenizer
> -----------------------------------
>
>                 Key: LUCENE-8498
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8498
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Alan Woodward
>            Priority: Major
>         Attachments: LUCENE-8498.patch
>
>
> LowerCaseTokenizer combines tokenization and filtering in a way that prevents 
> us from improving the normalization API.  We should deprecate and remove it, 
> as it can simply be replaced by a LetterTokenizer and a LowerCaseFilter.




