[ https://issues.apache.org/jira/browse/SOLR-4619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Rowe updated SOLR-4619:
-----------------------------
    Attachment: SOLR-4619.patch

{quote}
bq. A new analyzer class employing PreAnalyzedTokenizer could override initReader() or setReader(). I'll try with setReader(), since the docs for initReader() are focused on reader conditioning via char filters.

I was referring to TokenStreamComponents.setReader() here, which is called as part of Analyzer.tokenStream(): A subclass created in the new analyzer's overridden createComponents() could call a new method on PreAnalyzedTokenizer to consume the input reader and in so doing provide the attributes.
{quote}

Patch implementing the idea, splitting reader consumption out from reset() into its own method: decodeInput(). This method first removes all attributes from PreAnalyzedTokenizer's AttributeSource, then adds needed ones as a side effect of parsing the input.

There is a kludge here: because TokenStreamComponents.setReader() doesn't throw an exception, PreAnalyzedAnalyzer overrides createComponents() to create a TokenStreamComponents instance that catches and stores exceptions encountered during reader consumption with the stream's PreAnalyzedTokenizer instance, whose reset() method will then throw the stored exception, if any.

With this patch, PreAnalyzedAnalyzer can be reused; previously PreAnalyzedTokenizer reuse would ignore new input and re-emit tokens deserialized from the initial input.

With this patch, PreAnalyzedField analysis works like this:

# If a query analyzer is specified in the schema then it will be used at query time.
# If an analyzer is specified in the schema with no type (i.e., it is neither of "index" nor "query" type), then this analyzer will be used for query parsing, but will be ignored at index time.
# If only an analyzer of "index" type is specified in the schema, then this analyzer will be used for query parsing, but will be ignored at index time.

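The store-and-rethrow kludge and the reuse behavior described earlier in this comment can be sketched in plain Java. This is only an illustration of the pattern, not the patch itself: the class DeferredErrorSketch and its tokens() accessor are hypothetical stand-ins that do not use the Lucene API, and whitespace splitting stands in for the real pre-analyzed input parsing.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the pattern only; this is not the Lucene API.
public class DeferredErrorSketch {
    private final List<String> tokens = new ArrayList<>();
    private IOException deferred; // stored by setReader(), rethrown by reset()

    // Mirrors the constraint from the comment: no checked exception may
    // escape from here, so a failure during consumption is stored instead.
    public void setReader(Reader reader) {
        deferred = null;
        try {
            decodeInput(reader);
        } catch (IOException e) {
            deferred = e;
        }
    }

    // Drops state left over from the previous input, then parses the new
    // input. Whitespace splitting is an assumption made for the sketch.
    private void decodeInput(Reader reader) throws IOException {
        tokens.clear();
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            sb.append((char) c);
        }
        for (String t : sb.toString().trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
    }

    // The first method on the consumption path that is allowed to throw.
    public void reset() throws IOException {
        if (deferred != null) {
            IOException e = deferred;
            deferred = null;
            throw e;
        }
    }

    // Returns a copy so callers cannot mutate internal state.
    public List<String> tokens() {
        return new ArrayList<>(tokens);
    }
}
```

Calling setReader() a second time replaces the tokens from the first input rather than re-emitting them, which is the reuse property the patch is after. A reader that fails during consumption only surfaces its exception on the next reset() call.
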
This patch adds a new method removeAllAttributes() to AttributeSource, to support reuse of token streams with variable attributes, like PreAnalyzedTokenizer.

I think it's ready to go.

> Improve PreAnalyzedField query analysis
> ---------------------------------------
>
>                 Key: SOLR-4619
>                 URL: https://issues.apache.org/jira/browse/SOLR-4619
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>   Affects Versions: 4.0, 4.1, 4.2, 4.2.1, Trunk
>            Reporter: Andrzej Bialecki
>            Assignee: Andrzej Bialecki
>             Fix For: Trunk
>
>         Attachments: SOLR-4619.patch, SOLR-4619.patch, SOLR-4619.patch
>
>
> PreAnalyzed field extends plain FieldType and mistakenly uses the
> DefaultAnalyzer as query analyzer, and doesn't allow for customization via
> <analyzer> schema elements.
> Instead it should extend TextField and support all query analysis supported
> by that type.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)