[ 
https://issues.apache.org/jira/browse/LUCENE-8497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-8497:
----------------------------------
    Attachment: LUCENE-8497.patch

> Rethink multi-term analysis handling
> ------------------------------------
>
>                 Key: LUCENE-8497
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8497
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Alan Woodward
>            Priority: Major
>         Attachments: LUCENE-8497.patch, LUCENE-8497.patch
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> The current framework for handling term normalisation works via instanceof 
> checks for MultiTermAwareComponent and casts.  MultiTermAwareComponent itself 
> deals in AbstractAnalysisComponents, and so callers need to cast to the 
> correct component type before use, which is ripe for misuse.
> We should re-organise all this to be type-safe and usable without casts.  One 
> possibility is to add `normalize` methods to CharFilterFactory and 
> TokenFilterFactory that mirror their existing `create` methods.  The default 
> implementation would return the input unchanged, while filters that should 
> apply at normalization time can delegate to `create`.
> Related to this, we should deprecate and remove LowerCaseTokenizer, which 
> combines tokenization and normalization in a way that will break this API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to