[ https://issues.apache.org/jira/browse/LUCENE-8497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alan Woodward reassigned LUCENE-8497:
-------------------------------------
Assignee: Alan Woodward
> Rethink multi-term analysis handling
> ------------------------------------
>
> Key: LUCENE-8497
> URL: https://issues.apache.org/jira/browse/LUCENE-8497
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Major
> Attachments: LUCENE-8497.patch, LUCENE-8497.patch, LUCENE-8497.patch,
> LUCENE-8497.patch
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> The current framework for handling term normalization relies on instanceof
> checks for MultiTermAwareComponent, followed by casts. MultiTermAwareComponent
> itself deals in AbstractAnalysisComponents, so callers need to cast to the
> correct component type before use, which invites misuse.
>
> We should reorganize all this to be type-safe and usable without casts. One
> possibility is to add `normalize` methods to CharFilterFactory and
> TokenFilterFactory that mirror their existing `create` methods. The default
> implementation would return the input unchanged, while filters that should
> apply at normalization time can delegate to `create`.
>
> Related to this, we should deprecate and remove LowerCaseTokenizer, which
> combines tokenization and normalization in a way that will break this API.
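
A minimal sketch of the proposed `normalize`/`create` pairing, for illustration only. It does not use the real Lucene classes: `SketchTokenFilterFactory`, its subclasses, and `NormalizeSketch` are hypothetical stand-ins, and a token stream is simplified to a `List<String>` of terms. The actual signatures in the attached patches may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for TokenFilterFactory. A "stream" here is just a
// list of terms; real Lucene TokenStreams are attribute-based.
abstract class SketchTokenFilterFactory {
    // Applied during full analysis (index or query time).
    abstract List<String> create(List<String> input);

    // Applied at normalization time. Default: return the input unchanged.
    List<String> normalize(List<String> input) {
        return input;
    }
}

// A filter that should also run at normalization time: normalize
// simply delegates to create.
class SketchLowerCaseFilterFactory extends SketchTokenFilterFactory {
    @Override
    List<String> create(List<String> input) {
        List<String> out = new ArrayList<>();
        for (String term : input) {
            out.add(term.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    @Override
    List<String> normalize(List<String> input) {
        return create(input);
    }
}

// A filter that should NOT run at normalization time: it keeps the
// default normalize, so terms pass through untouched.
class SketchStopFilterFactory extends SketchTokenFilterFactory {
    @Override
    List<String> create(List<String> input) {
        List<String> out = new ArrayList<>();
        for (String term : input) {
            if (!term.equals("the")) {
                out.add(term);
            }
        }
        return out;
    }
}

class NormalizeSketch {
    // Runs the whole chain in analysis mode.
    static List<String> analyzeAll(List<SketchTokenFilterFactory> chain, List<String> terms) {
        List<String> out = terms;
        for (SketchTokenFilterFactory factory : chain) {
            out = factory.create(out);
        }
        return out;
    }

    // Runs the whole chain in normalization mode: no instanceof, no casts.
    static List<String> normalizeAll(List<SketchTokenFilterFactory> chain, List<String> terms) {
        List<String> out = terms;
        for (SketchTokenFilterFactory factory : chain) {
            out = factory.normalize(out);
        }
        return out;
    }
}
```

The point of the design is that the caller iterates over the chain and calls `normalize` on every factory; each factory decides for itself whether it participates, so no type checks are needed at the call site.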