[jira] [Updated] (LUCENE-2605) queryparser parses on whitespace

Steve Rowe (JIRA) Thu, 12 May 2016 21:19:34 -0700

     [ 
https://issues.apache.org/jira/browse/LUCENE-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Steve Rowe updated LUCENE-2605:
-------------------------------
    Attachment: LUCENE-2605.patch

Patch fixing up two problems: 

# Multiple whitespace-separated terms' TermQuery-s within a BooleanClause are 
now flattened directly into the output Query list, rather than inserting the 
BooleanClause.
# MultiFieldQueryParser's getFieldQuery() is modified to recombine multiple 
terms from each field's query, to produce a series of disjunctions of each 
term against each field.
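
As a rough illustration of the second item, here is a minimal sketch that uses 
plain strings rather than Lucene Query objects. The class and method names are 
hypothetical and are not taken from the patch; it only shows the intended shape 
of the output, one disjunction per term with that term tried against every 
field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: strings stand in for Lucene Query objects.
public class MultiFieldRecombineSketch {

    /** Builds one disjunction per term, testing that term against every field. */
    static List<String> perTermDisjunctions(List<String> fields, List<String> terms) {
        List<String> clauses = new ArrayList<>();
        for (String term : terms) {
            List<String> alternatives = new ArrayList<>();
            for (String field : fields) {
                alternatives.add(field + ":" + term);
            }
            // Each clause groups the same term across all fields.
            clauses.add("(" + String.join(" ", alternatives) + ")");
        }
        return clauses;
    }

    public static void main(String[] args) {
        List<String> clauses = perTermDisjunctions(
                Arrays.asList("title", "body"),
                Arrays.asList("hello", "world"));
        // prints: (title:hello body:hello) (title:world body:world)
        System.out.println(String.join(" ", clauses));
    }
}
```

Grouping by term rather than by field keeps each user-typed term as its own 
clause, so the surrounding boolean operators apply per term rather than per 
field.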

All queryparser module tests now pass, with the exception of the flexible query 
parser's TestStandardQP run with QueryParserTestBase.testQPA().  Since this 
patch doesn't modify anything about the flexible query parser, this is not 
surprising.

> queryparser parses on whitespace
> --------------------------------
>
>                 Key: LUCENE-2605
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2605
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/queryparser
>            Reporter: Robert Muir
>            Assignee: Steve Rowe
>             Fix For: 4.9, 6.0
>
>         Attachments: LUCENE-2605.patch, LUCENE-2605.patch
>
>
> The queryparser parses input on whitespace, and sends each whitespace 
> separated term to its own independent token stream.
> This breaks the following at query-time, because they can't see across 
> whitespace boundaries:
> * n-gram analysis
> * shingles 
> * synonyms (especially multi-word for whitespace-separated languages)
> * languages where a 'word' can contain whitespace (e.g. Vietnamese)
> It's also rather unexpected, as users think their 
> charfilters/tokenizers/tokenfilters will do the same thing at index and 
> query time, but
> in many cases they can't. Instead, preferably the queryparser would parse 
> around only real 'operators'.
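
The effect described in the issue can be reproduced with a toy example. The 
sketch below does not use Lucene; analyze() is a made-up stand-in for an 
analysis chain that emits single words plus two-word shingles, and 
analyzePerChunk() mimics a parser that splits on whitespace before analysis. 
All names here are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: a toy analyzer, not Lucene's analysis API.
public class WhitespaceSplitSketch {

    /** Toy analyzer: emits each word plus each two-word shingle. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return tokens;
        }
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i < words.length; i++) {
            tokens.add(words[i]);
            if (i + 1 < words.length) {
                tokens.add(words[i] + " " + words[i + 1]);
            }
        }
        return tokens;
    }

    /** Mimics the reported behaviour: split on whitespace, then analyze each chunk alone. */
    static List<String> analyzePerChunk(String query) {
        List<String> tokens = new ArrayList<>();
        for (String chunk : query.trim().split("\\s+")) {
            // Each chunk gets its own independent "token stream".
            tokens.addAll(analyze(chunk));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Index-time view: [new, new york, york, york city, city]
        System.out.println(analyze("new york city"));
        // Query-time view after whitespace splitting: [new, york, city]
        System.out.println(analyzePerChunk("new york city"));
    }
}
```

Because the analyzer only ever sees one word at a time in the second case, it 
can never produce the shingles that were indexed. The same reasoning applies to 
multi-word synonyms and n-grams that span a space.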



