[jira] [Comment Edited] (LUCENE-5470) Refactoring multiterm analysis

Tim Allison (JIRA) Tue, 25 Feb 2014 11:29:27 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911900#comment-13911900
 ]


Tim Allison edited comment on LUCENE-5470 at 2/25/14 7:27 PM:
--------------------------------------------------------------

{quote}Can we just analyze multiterm queries without trying to parse around 
wildcards or what not? This is basically what solr is doing today. I think 
trying to interpret the syntax is a bit too funky and error-prone, and its 
better if someone wants "magic" to have that in their QP itself.{quote}

I'm of two minds on this.  From the Solr perspective, absolutely, getMultiterm 
is sufficient.  From the Lucene perspective, users may want to use 
off-the-shelf analyzers like StandardAnalyzer and be puzzled when they don't 
work for multiterms.  AnalyzingQueryParser fits this need for wildcard queries 
(though not for regex).  Some thoughts on this:

1) The least-harm option: consolidate getMultitermTerm as a public static 
method in QueryParserBase and let AnalyzingQueryParser do its wildcard stuff as 
is.

2) Do the above, but also add an AnalyzingQueryParserBase layer that does the 
wildcard trickery (and maybe add something for regex).  Classic QueryParser and 
others could then subclass AnalyzingQPBase.  The benefit is that we could 
get rid of AnalyzingQP and add multiterm analysis to other parsers that 
currently subclass QPBase.  This would only benefit people working at the 
Lucene level.

3) A more drastic step would be to move the Solr MultitermAware processing in 
FieldTypePluginLoader down into the Lucene layer, but this wouldn't solve the 
problem of Lucene users misusing off-the-shelf Analyzers.
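
To make the "wildcard trickery" in the options above concrete, here is a minimal 
sketch of the general idea: split the term around the wildcard characters, 
normalize each literal chunk, and stitch the pieces back together.  It uses 
only the JDK and is not the AnalyzingQueryParser code.  The class name, the 
method analyzeAroundWildcards and the LOWERCASE normalizer are hypothetical 
stand-ins for a real analysis chain, and escaped wildcards are deliberately 
not handled.

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

class WildcardAnalysisSketch {

    /** Hypothetical stand-in for an analysis chain; not a Lucene API. */
    static final UnaryOperator<String> LOWERCASE = s -> s.toLowerCase(Locale.ROOT);

    /**
     * Normalizes the literal chunks of a wildcard term and leaves the
     * wildcard characters '*' and '?' untouched.
     * Assumption: no escape handling, and the normalizer maps each chunk
     * to exactly one string.
     */
    static String analyzeAroundWildcards(String term, UnaryOperator<String> normalizer) {
        StringBuilder out = new StringBuilder();
        StringBuilder chunk = new StringBuilder();
        for (char c : term.toCharArray()) {
            if (c == '*' || c == '?') {
                flush(chunk, normalizer, out); // normalize the text seen so far
                out.append(c);                 // copy the wildcard through as-is
            } else {
                chunk.append(c);
            }
        }
        flush(chunk, normalizer, out);         // trailing literal chunk, if any
        return out.toString();
    }

    private static void flush(StringBuilder chunk, UnaryOperator<String> normalizer,
                              StringBuilder out) {
        if (chunk.length() > 0) {
            out.append(normalizer.apply(chunk.toString()));
            chunk.setLength(0);
        }
    }

    public static void main(String[] args) {
        // prints "foo*bar?"
        System.out.println(analyzeAroundWildcards("Foo*BAR?", LOWERCASE));
    }
}
```

A real implementation would also have to decide what to do when a chunk 
analyzes to zero tokens or to several, which is the part that makes parsing 
around the syntax error-prone.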


was (Author: [email protected]):
{quote}

> Refactoring multiterm analysis
> ------------------------------
>
>                 Key: LUCENE-5470
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5470
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/queryparser
>    Affects Versions: 5.0
>            Reporter: Tim Allison
>            Priority: Minor
>         Attachments: LUCENE-5470.patch
>
>
> There are currently three methods to analyze multiterms in Lucene and Solr:
> 1) QueryParserBase
> 2) AnalyzingQueryParser
> 3) TextField (Solr)
> The code in QueryParserBase and in TextField does not consume the tokenstream 
> if more than one token is generated by the analyzer.  (Admittedly, thanks to 
> the magic of MultitermAwareComponents in Solr, this type of exception 
> probably never happens and the unconsumed stream problem is probably 
> non-existent in Solr.)
> I propose consolidating the multiterm analysis code into one place: 
> QueryBuilder in Lucene core.
> This is part of a refactoring that will also help reduce duplication of code 
> with LUCENE-5205.
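
The unconsumed-stream problem described in the issue can be illustrated with a 
JDK-only model.  This is a sketch under stated assumptions, not the patch: 
FakeTokenStream is a hypothetical, simplified stand-in for a token stream, and 
analyzeSingleTerm is an illustrative name.  The point is only that the stream 
is released on every path, including the one that rejects a multi-token 
result.  Whether the actual patch drains the remaining tokens or simply closes 
the stream is not stated here.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class SingleTokenAnalysisSketch {

    /** Hypothetical, simplified stand-in for a token stream; not the Lucene class. */
    static final class FakeTokenStream implements AutoCloseable {
        private final Iterator<String> tokens;
        private String current;
        boolean closed = false;

        FakeTokenStream(List<String> tokens) {
            this.tokens = tokens.iterator();
        }

        /** Advances to the next token; returns false when the stream is exhausted. */
        boolean incrementToken() {
            if (tokens.hasNext()) {
                current = tokens.next();
                return true;
            }
            return false;
        }

        String term() {
            return current;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /**
     * Returns the single token produced for a multiterm, or throws if the
     * stream yields zero or several tokens.  try-with-resources guarantees
     * that close() runs even when an exception is thrown.
     */
    static String analyzeSingleTerm(FakeTokenStream stream) {
        try (FakeTokenStream ts = stream) {
            if (!ts.incrementToken()) {
                throw new IllegalArgumentException("analyzer returned no terms");
            }
            String first = ts.term();
            if (ts.incrementToken()) {
                throw new IllegalArgumentException("analyzer returned too many terms");
            }
            return first;
        }
    }

    public static void main(String[] args) {
        FakeTokenStream two = new FakeTokenStream(Arrays.asList("foo", "bar"));
        try {
            analyzeSingleTerm(two);
        } catch (IllegalArgumentException e) {
            // The stream was still closed despite the rejection.
            System.out.println(e.getMessage() + ", closed=" + two.closed);
        }
    }
}
```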



