[ https://issues.apache.org/jira/browse/LUCENE-8497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16693360#comment-16693360 ]
ASF subversion and git services commented on LUCENE-8497:
---------------------------------------------------------

Commit 65486442c4a893a17cd70c9a865fa1af7c160aa3 in lucene-solr's branch refs/heads/jira/http2 from [~romseygeek]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6548644 ]

LUCENE-8497: Replace MultiTermAwareComponent with normalize() method

> Rethink multi-term analysis handling
> ------------------------------------
>
>                 Key: LUCENE-8497
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8497
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Major
>             Fix For: master (8.0)
>
>         Attachments: LUCENE-8497.patch, LUCENE-8497.patch, LUCENE-8497.patch,
>                      LUCENE-8497.patch
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The current framework for handling term normalisation works via instanceof
> checks for MultiTermAwareComponent and casts. MultiTermAwareComponent itself
> deals in AbstractAnalysisComponents, and so callers need to cast to the
> correct component type before use, which is ripe for misuse.
>
> We should re-organise all this to be type-safe and usable without casts. One
> possibility is to add `normalize` methods to CharFilterFactory and
> TokenFilterFactory that mirror their existing `create` methods. The default
> implementation would return the input unchanged, while filters that should
> apply at normalization time can delegate to `create`.
>
> Related to this, we should deprecate and remove LowerCaseTokenizer, which
> combines tokenization and normalization in a way that will break this API.
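
The proposal in the description can be illustrated with a minimal sketch. It does not use the Lucene classes: `FilterFactory`, `LowerCaseFactory`, `TruncateFactory`, `analyze` and `normalizeTerm` are hypothetical names, and a plain `String` stands in for the token stream. The only part taken from the issue text is the shape of the API: a `normalize` method that mirrors `create`, returns its input unchanged by default, and delegates to `create` in filters that should apply at normalization time.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class NormalizeSketch {

    // Hypothetical stand-in for a filter factory; a String models the token stream.
    abstract static class FilterFactory {
        // Behaviour applied during full analysis.
        abstract String create(String input);

        // Behaviour applied when normalizing a multi-term query term.
        // Default: return the input unchanged.
        String normalize(String input) {
            return input;
        }
    }

    // A filter that should also apply at normalization time: delegate to create().
    static class LowerCaseFactory extends FilterFactory {
        @Override
        String create(String input) {
            return input.toLowerCase(Locale.ROOT);
        }

        @Override
        String normalize(String input) {
            return create(input);
        }
    }

    // A hypothetical filter that only applies during full analysis,
    // so it keeps the default pass-through normalize().
    static class TruncateFactory extends FilterFactory {
        @Override
        String create(String input) {
            return input.length() > 3 ? input.substring(0, 3) : input;
        }
    }

    // Run the full analysis chain.
    static String analyze(List<FilterFactory> chain, String term) {
        String result = term;
        for (FilterFactory factory : chain) {
            result = factory.create(result);
        }
        return result;
    }

    // Run only the normalization steps; no instanceof checks or casts needed.
    static String normalizeTerm(List<FilterFactory> chain, String term) {
        String result = term;
        for (FilterFactory factory : chain) {
            result = factory.normalize(result);
        }
        return result;
    }

    public static void main(String[] args) {
        List<FilterFactory> chain =
            Arrays.asList(new LowerCaseFactory(), new TruncateFactory());
        System.out.println(analyze(chain, "Lucene"));       // prints "luc"
        System.out.println(normalizeTerm(chain, "Lucene")); // prints "lucene"
    }
}
```

Because every factory carries a `normalize` method, the caller iterates over the chain uniformly; whether a filter takes part in normalization is decided by the factory itself and not by a type check at the call site.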