[ https://issues.apache.org/jira/browse/LUCENE-8352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620376#comment-16620376 ]
ASF subversion and git services commented on LUCENE-8352:
---------------------------------------------------------
Commit c696cafc0d7fc0e133df20e0b188655fa020fe99 in lucene-solr's branch
refs/heads/master from [~romseygeek]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c696caf ]
LUCENE-8352: Make TokenStreamComponents final
> Make TokenStreamComponents final
> --------------------------------
>
> Key: LUCENE-8352
> URL: https://issues.apache.org/jira/browse/LUCENE-8352
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/analysis
> Reporter: Mark Harwood
> Priority: Minor
> Attachments: LUCENE-8352.patch, LUCENE-8352.patch
>
>
> The current design is a little trappy. Any specialised subclasses of
> TokenStreamComponents _(see StandardAnalyzer, ClassicAnalyzer,
> UAX29URLEmailAnalyzer)_ are discarded by any subsequent Analyzers that wrap
> them _(see LimitTokenCountAnalyzer, QueryAutoStopWordAnalyzer,
> ShingleAnalyzerWrapper and other examples in elasticsearch)_.
> The current design means each AnalyzerWrapper.wrapComponents() implementation
> discards any custom TokenStreamComponents and replaces it with one of its own
> choosing (a vanilla TokenStreamComponents class from examples I've seen).
> This is a trap I fell into when writing a custom TokenStreamComponents with a
> custom setReader() and I wondered why it was not being triggered when wrapped
> by other analyzers.
> If AnalyzerWrapper is designed to encourage composition, it's arguably a
> mistake to also permit custom TokenStreamComponents subclasses: the
> composition process does not preserve the choice of custom classes or any
> behaviours they might add. For this reason we should not encourage extensions
> to TokenStreamComponents (or, if TSC extensions are required, we should somehow
> mark an Analyzer as "unwrappable" to prevent lossy compositions).
>
>
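
The trap described in the issue can be reproduced in miniature. The sketch
below does not use Lucene at all: Components, CountingComponents,
SimpleAnalyzer and LossyWrapper are hypothetical stand-ins, and it assumes a
wrapper whose wrapComponents() builds a fresh vanilla components object from
the inner one, as the description says the existing wrappers do.

```java
import java.io.Reader;
import java.io.StringReader;

// Simplified stand-ins for illustration only; these are NOT the Lucene classes.
public class WrapTrapDemo {

    /** Stand-in for a vanilla TokenStreamComponents: holds a source, accepts a reader. */
    static class Components {
        final String source;
        Reader reader;

        Components(String source) {
            this.source = source;
        }

        void setReader(Reader reader) {
            this.reader = reader;
        }
    }

    /** A custom subclass that adds behaviour to setReader(). */
    static class CountingComponents extends Components {
        int setReaderCalls = 0;

        CountingComponents(String source) {
            super(source);
        }

        @Override
        void setReader(Reader reader) {
            setReaderCalls++; // the custom behaviour we expect to be triggered
            super.setReader(reader);
        }
    }

    /** Stand-in for an Analyzer that creates components. */
    interface SimpleAnalyzer {
        Components createComponents();
    }

    /** Stand-in for an AnalyzerWrapper with a lossy wrapComponents(). */
    static class LossyWrapper implements SimpleAnalyzer {
        private final SimpleAnalyzer delegate;

        LossyWrapper(SimpleAnalyzer delegate) {
            this.delegate = delegate;
        }

        Components wrapComponents(Components inner) {
            // Copies the source but replaces the subclass with a vanilla instance.
            return new Components(inner.source);
        }

        @Override
        public Components createComponents() {
            return wrapComponents(delegate.createComponents());
        }
    }

    /** Returns how many times the custom setReader() ran. */
    public static int customCalls(boolean wrapped) {
        CountingComponents custom = new CountingComponents("tokenizer");
        SimpleAnalyzer base = () -> custom;
        SimpleAnalyzer analyzer = wrapped ? new LossyWrapper(base) : base;
        analyzer.createComponents().setReader(new StringReader("some text"));
        return custom.setReaderCalls;
    }

    public static void main(String[] args) {
        System.out.println("unwrapped: " + customCalls(false)); // prints "unwrapped: 1"
        System.out.println("wrapped: " + customCalls(true));    // prints "wrapped: 0"
    }
}
```

In this model the custom setReader() is silently skipped as soon as the
analyzer is wrapped. Making the components class final, as this commit does,
rules out the subclass in the first place, so there is nothing for a wrapper to
lose.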