[ https://issues.apache.org/jira/browse/SOLR-4619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Rowe updated SOLR-4619:
-----------------------------
    Attachment: SOLR-4619.patch

Patch that brings Andrzej's patch up to date with trunk, and adds tests for query-time functionality.

I had assumed that {{PreAnalyzedField}}-s would use the {{PreAnalyzedTokenizer}} at query time, but that is not (currently) the case: instead {{FieldType.DefaultAnalyzer}} is used. This patch changes the behavior when no analyzer is specified to instead use {{PreAnalyzedTokenizer}}.

However, there is a chicken-and-egg interaction between {{PreAnalyzedTokenizer}} and {{QueryBuilder.createFieldQuery()}}, which aborts before performing any tokenization if the supplied analyzer's attribute factory doesn't contain a {{TermToBytesRefAttribute}}. But {{PreAnalyzedTokenizer}} doesn't have any attributes defined until the input stream is consumed, in {{reset()}}.

[~rcmuir] added a comment as part of LUCENE-5388 to {{PreAnalyzedTokenizer}}'s ctor, where {{AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY}} is set as the attribute factory rather than the default packed implementation: "we don't pack attributes: since we are used for (de)serialization and dont want bloat."

This patch moves the {{stream.reset()}} call in {{QueryBuilder.createFieldQuery()}} in front of the {{TermToBytesRefAttribute}} check, so that {{PreAnalyzedTokenizer}} (and other tokenizers that don't have a pre-added set of attributes) have their attributes defined by the time the check runs. It also moves the {{addAttribute(PositionIncrementAttribute.class)}} call to after the {{TermToBytesRefAttribute}} check.

An alternate approach to fix the chicken-and-egg problem might be to have {{PreAnalyzedTokenizer}} always include a dummy {{TermToBytesRefAttribute}} implementation, and then remove it when {{reset()}} is called, but that seems hackish.

I haven't run the full tests yet with this patch, but the included query-time {{PreAnalyzedField}} tests succeed. I welcome feedback.
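
To make the ordering problem concrete, here is a minimal, self-contained sketch. It does not use Lucene: {{LazyAttributeStream}}, {{checkThenReset()}} and {{resetThenCheck()}} are hypothetical stand-ins, attributes are modeled as plain strings, and the pre-patch ordering is only inferred from the description above. It shows why a stream that defines its attributes in {{reset()}} is rejected when the attribute check runs first, and accepted when {{reset()}} runs first.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model of the check/reset ordering issue.
// ASSUMPTION: none of these types are Lucene's real classes or signatures.
class AttributeOrderSketch {

    /** Toy stream that, like the tokenizer described above, defines attributes only in reset(). */
    static class LazyAttributeStream {
        private final String input;
        private final Set<String> attributes = new HashSet<>();
        private List<String> tokens = new ArrayList<>();

        LazyAttributeStream(String input) {
            this.input = input;
        }

        boolean hasAttribute(String name) {
            return attributes.contains(name);
        }

        void addAttribute(String name) {
            attributes.add(name);
        }

        /** Consumes the input; only now does the term attribute become visible. */
        void reset() {
            attributes.add("TermToBytesRefAttribute");
            tokens = new ArrayList<>();
            for (String t : input.trim().split("\\s+")) {
                if (!t.isEmpty()) {
                    tokens.add(t);
                }
            }
        }

        List<String> tokens() {
            return tokens;
        }
    }

    /** Inferred pre-patch ordering: attribute check before reset(). Returns null to model the abort. */
    static List<String> checkThenReset(LazyAttributeStream stream) {
        stream.addAttribute("PositionIncrementAttribute");
        if (!stream.hasAttribute("TermToBytesRefAttribute")) {
            return null; // aborts before any tokenization happens
        }
        stream.reset();
        return stream.tokens();
    }

    /** Ordering described for the patch: reset() first, then the check, then the position attribute. */
    static List<String> resetThenCheck(LazyAttributeStream stream) {
        stream.reset();
        if (!stream.hasAttribute("TermToBytesRefAttribute")) {
            return null;
        }
        stream.addAttribute("PositionIncrementAttribute");
        return stream.tokens();
    }

    public static void main(String[] args) {
        System.out.println(checkThenReset(new LazyAttributeStream("quick brown fox")));
        System.out.println(resetThenCheck(new LazyAttributeStream("quick brown fox")));
    }
}
```

Running {{main}} prints {{null}} for the first ordering and {{[quick, brown, fox]}} for the second. The dummy-attribute alternative would instead make the first ordering pass the check, at the cost of registering an attribute that is only a placeholder.
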
> Improve PreAnalyzedField query analysis
> ---------------------------------------
>
>                 Key: SOLR-4619
>                 URL: https://issues.apache.org/jira/browse/SOLR-4619
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 4.0, 4.1, 4.2, 4.2.1, Trunk
>            Reporter: Andrzej Bialecki
>            Assignee: Andrzej Bialecki
>             Fix For: Trunk
>
>         Attachments: SOLR-4619.patch, SOLR-4619.patch
>
>
> PreAnalyzed field extends plain FieldType and mistakenly uses the
> DefaultAnalyzer as query analyzer, and doesn't allow for customization via
> <analyzer> schema elements.
> Instead it should extend TextField and support all query analysis supported
> by that type.