[ https://issues.apache.org/jira/browse/LUCENE-8240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Sokolov resolved LUCENE-8240.
-------------------------------------
    Resolution: Won't Fix

Analyzer.TokenStreamComponents.setReader actually became private instead (in Lucene 8)! However, TokenStreamComponents now accepts a callback (functional object) that enables the desired usage, so we can close this issue.

> Make TokenStreamComponents.setReader public
> -------------------------------------------
>
>                 Key: LUCENE-8240
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8240
>             Project: Lucene - Core
>          Issue Type: Wish
>          Components: modules/analysis
>            Reporter: Michael Sokolov
>            Priority: Major
>         Attachments: SubFieldAnalyzer.java
>
>
> The simplest change for this would be to make
> TokenStreamComponents.setReader() public. Another alternative would be to
> provide a SubFieldAnalyzer along the lines of what is attached, although for
> reasons given below I think this implementation is a little hacky and would
> ideally be supported in a different way before making *that* part of a public
> Lucene API.
> Exposing this method would allow a third-party extension to access it in
> order to wrap TokenStreamComponents. My use case is a SubFieldAnalyzer
> (attached, for reference) that applies different analysis to different
> instances of a field. This supports a big "catch-all" field whose instances
> each get different (index-time) text processing. The way we implement that is by
> creating a TokenStreamComponents that wraps separate per-subfield components
> and switches among them when setReader() is called.
> Why setReader()? This is the only part of the API where we can inject this
> notion of subfields. setReader() is called with a Reader for each field
> instance, and we supply a special Reader that identifies its subfield.
> This is a bit hacky; ideally subfields would be first-class citizens in the
> Analyzer API, so e.g. there would be methods like
> Analyzer.createComponents(String fieldName, String subFieldName), etc.
> However, this seems like a pretty big change for an experimental feature, so
> it seems like an OK tradeoff to live with the Reader-per-subfield hack for
> now.
> Currently SubFieldAnalyzer has to live in the org.apache.lucene.analysis
> package in order to call TokenStreamComponents.setReader (on a separate
> instance) and satisfy Java's access rules, which is awkward.
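
For readers of the archive, here is a rough sketch of the callback pattern mentioned in the resolution, applied to the subfield use case from the description. It deliberately does not use Lucene classes: Components is a simplified stand-in for a holder that takes a Consumer<Reader>, and SubFieldSketch, SubFieldReader, switching and demo are hypothetical names invented for this illustration. The attached SubFieldAnalyzer.java may well be structured differently.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class SubFieldSketch {

    /** Hypothetical Reader that carries the name of the subfield it belongs to. */
    static final class SubFieldReader extends StringReader {
        final String subField;

        SubFieldReader(String subField, String text) {
            super(text);
            this.subField = subField;
        }
    }

    /** Simplified stand-in (not the Lucene class) for components built from a reader callback. */
    static final class Components {
        private final Consumer<Reader> source;

        Components(Consumer<Reader> source) {
            this.source = source;
        }

        // Private here: outside code customizes behavior only through the callback.
        private void setReader(Reader reader) {
            source.accept(reader);
        }

        /** Stands in for the framework handing over a new Reader for a field instance. */
        void reset(Reader reader) {
            setReader(reader);
        }
    }

    /** Builds components whose callback dispatches each Reader to its subfield's handler. */
    static Components switching(Map<String, Consumer<Reader>> perSubField,
                                Consumer<Reader> fallback) {
        return new Components(reader -> {
            Consumer<Reader> target = fallback;
            if (reader instanceof SubFieldReader) {
                String name = ((SubFieldReader) reader).subField;
                target = perSubField.getOrDefault(name, fallback);
            }
            target.accept(reader);
        });
    }

    /** Routes two subfield readers and one plain reader; returns the order of handlers hit. */
    public static List<String> demo() {
        List<String> log = new ArrayList<>();
        Map<String, Consumer<Reader>> routes = new HashMap<>();
        routes.put("title", r -> log.add("title"));
        routes.put("body", r -> log.add("body"));

        Components components = switching(routes, r -> log.add("default"));
        components.reset(new SubFieldReader("title", "Hello"));
        components.reset(new SubFieldReader("body", "World"));
        components.reset(new StringReader("plain"));
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The point of the sketch is that the switching logic lives entirely in the callback, so the wrapper no longer needs to sit in the same package to reach setReader. In real code, each per-subfield handler would presumably forward the Reader to that subfield's own tokenizer.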