[jira] [Updated] (SOLR-4619) Improve PreAnalyzedField query analysis

Steve Rowe (JIRA) Tue, 19 Jan 2016 17:21:56 -0800

     [ https://issues.apache.org/jira/browse/SOLR-4619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Steve Rowe updated SOLR-4619:
-----------------------------
    Attachment: SOLR-4619.patch

{quote}
bq. A new analyzer class employing PreAnalyzedTokenizer could override 
initReader() or setReader(). I'll try with setReader(), since the docs for 
initReader() are focused on reader conditioning via char filters.

I was referring to TokenStreamComponents.setReader() here, which is called as 
part of Analyzer.tokenStream(): A subclass created in the new analyzer's 
overridden createComponents() could call a new method on PreAnalyzedTokenizer 
to consume the input reader and in so doing provide the attributes.
{quote}

Patch implementing the idea, splitting reader consumption out from reset() into 
its own method: decodeInput().  This method first removes all attributes from 
PreAnalyzedTokenizer's AttributeSource, then adds needed ones as a side effect 
of parsing the input.
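
A minimal sketch of the clear-then-repopulate idea, in plain Java with no Lucene dependency. VariableAttributeStream and its "name=value;..." input format are hypothetical stand-ins invented for illustration; this is not the code in the patch, and the real PreAnalyzedTokenizer works with Lucene attribute classes rather than a string map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a token stream whose attribute set depends on its input.
class VariableAttributeStream {
    private final Map<String, String> attributes = new LinkedHashMap<>();

    // Plays the role of an "remove all attributes" step before each new input.
    void removeAllAttributes() {
        attributes.clear();
    }

    // Parses "name=value" pairs separated by ';' (a made-up format for this sketch).
    void decodeInput(String input) {
        removeAllAttributes();                 // drop whatever the previous input declared
        for (String pair : input.split(";")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue;                      // skip empty or malformed pairs
            }
            // Attributes appear only as a side effect of parsing this input.
            attributes.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
    }

    Map<String, String> attributes() {
        return attributes;
    }
}
```

Decoding "term=foo;payload=abc" and then "term=bar" on the same instance leaves only the "term" attribute, which is the behavior that makes reuse safe.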

There is a kludge here: because TokenStreamComponents.setReader() is not 
declared to throw an exception, PreAnalyzedAnalyzer overrides 
createComponents() to create a TokenStreamComponents instance that catches any 
exception encountered during reader consumption and stores it on the stream's 
PreAnalyzedTokenizer instance, whose reset() method then throws the stored 
exception, if any.
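
The deferred-exception pattern described above can be sketched roughly as follows. SketchTokenizer, SketchComponents, setDeferredException() and the "empty input is an error" rule are all assumptions made up for this example; they only mirror the shape of the approach and are not the Lucene classes or the patch itself.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical stand-in for a tokenizer; not the Lucene class.
class SketchTokenizer {
    private String decoded;        // result of the last successful decode
    private IOException deferred;  // failure captured while decoding, if any

    // Consumes the whole reader; like any parsing step, it may fail.
    void decodeInput(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[256];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        if (sb.length() == 0) {
            throw new IOException("empty input"); // made-up failure condition
        }
        decoded = sb.toString();
    }

    void setDeferredException(IOException e) {
        deferred = e;
    }

    // reset() is allowed to throw, so the stored failure surfaces here.
    void reset() throws IOException {
        if (deferred != null) {
            IOException e = deferred;
            deferred = null;
            throw e;
        }
    }

    String decoded() {
        return decoded;
    }
}

// Hypothetical stand-in for components whose setReader() cannot throw.
class SketchComponents {
    final SketchTokenizer tokenizer = new SketchTokenizer();

    void setReader(Reader reader) { // note: no throws clause
        try {
            tokenizer.setDeferredException(null); // forget failures from earlier input
            tokenizer.decodeInput(reader);
        } catch (IOException e) {
            tokenizer.setDeferredException(e);    // hand the failure to reset()
        }
    }
}
```

In this sketch a failed setReader() returns normally and the caller sees the IOException on the next reset(); a later setReader() with valid input clears the stored failure.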

With this patch, PreAnalyzedAnalyzer can be reused; previously 
PreAnalyzedTokenizer reuse would ignore new input and re-emit tokens 
deserialized from the initial input.

With this patch, PreAnalyzedField analysis works like this: 
# If a query analyzer is specified in the schema then it will be used at query 
time.
# If an analyzer is specified in the schema with no type (i.e., its type is 
neither "index" nor "query"), then this analyzer will be used for query 
parsing, but will be ignored at index time.
# If only an analyzer of "index" type is specified in the schema, then this 
analyzer will be used for query parsing, but will be ignored at index time.
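
The three rules above amount to a simple fallback chain for picking the query-time analyzer. The sketch below is only an illustration: QueryAnalyzerChoice and resolve() are hypothetical names, analyzers are represented by plain strings, the relative order of the untyped and "index" cases is an assumption (the rules do not say what happens if both are present), and the empty result for a schema with no analyzer at all is likewise not covered by the description.

```java
import java.util.Optional;

// Hypothetical helper; not part of Solr or of the patch.
final class QueryAnalyzerChoice {
    private QueryAnalyzerChoice() {}

    // Each argument names an analyzer declared in the schema, or is null if absent.
    static Optional<String> resolve(String queryTyped, String untyped, String indexTyped) {
        if (queryTyped != null) {
            return Optional.of(queryTyped);  // rule 1: explicit query analyzer wins
        }
        if (untyped != null) {
            return Optional.of(untyped);     // rule 2: untyped analyzer is used for queries
        }
        if (indexTyped != null) {
            return Optional.of(indexTyped);  // rule 3: index-typed analyzer is used for queries
        }
        return Optional.empty();             // not covered by the rules above
    }
}
```

In all three cases the chosen analyzer applies at query time only; index-time input is still the pre-analyzed token stream.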

This patch adds a new method removeAllAttributes() to AttributeSource, to 
support reuse of token streams with variable attributes, like 
PreAnalyzedTokenizer.

I think it's ready to go.

> Improve PreAnalyzedField query analysis
> ---------------------------------------
>
>                 Key: SOLR-4619
>                 URL: https://issues.apache.org/jira/browse/SOLR-4619
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 4.0, 4.1, 4.2, 4.2.1, Trunk
>            Reporter: Andrzej Bialecki 
>            Assignee: Andrzej Bialecki 
>             Fix For: Trunk
>
>         Attachments: SOLR-4619.patch, SOLR-4619.patch, SOLR-4619.patch
>
>
> PreAnalyzed field extends plain FieldType and mistakenly uses the 
> DefaultAnalyzer as query analyzer, and doesn't allow for customization via 
> <analyzer> schema elements.
> Instead it should extend TextField and support all query analysis supported 
> by that type.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
