[
https://issues.apache.org/jira/browse/LUCENE-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Steve Rowe updated LUCENE-2605:
-------------------------------
Attachment: LUCENE-2605.patch
Patch fixing up two problems:
# When multiple whitespace-separated terms produce TermQuery-s within a
BooleanClause, those TermQuery-s are now flattened directly into the output
Query list, rather than inserting the BooleanClause itself.
# MultiFieldQueryParser's getFieldQuery() is modified to recombine the
multiple terms from each field's query, producing a series of disjunctions,
one per term, of that term against each field.
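
A rough sketch of the recombination described in the second item, for readers who want to see the intended query shape. This is not code from the patch: plain strings stand in for Lucene Query objects, and the class name MultiFieldRecombineSketch and the method recombine() are hypothetical names chosen for illustration. It also assumes every field's analyzer yields terms in comparable positions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: strings stand in for Lucene Query objects, and the
// class and method names are hypothetical, not taken from the patch.
class MultiFieldRecombineSketch {

    // perFieldTerms.get(i) holds the terms produced for fields.get(i).
    // Returns one disjunction per term position, e.g. "(title:a body:a)".
    static List<String> recombine(List<String> fields, List<List<String>> perFieldTerms) {
        // The longest per-field term list decides how many disjunctions we emit.
        int positions = 0;
        for (List<String> terms : perFieldTerms) {
            positions = Math.max(positions, terms.size());
        }
        List<String> clauses = new ArrayList<>();
        for (int pos = 0; pos < positions; pos++) {
            List<String> disjuncts = new ArrayList<>();
            for (int f = 0; f < fields.size(); f++) {
                List<String> terms = perFieldTerms.get(f);
                // A field with fewer terms simply contributes nothing here.
                if (pos < terms.size()) {
                    disjuncts.add(fields.get(f) + ":" + terms.get(pos));
                }
            }
            clauses.add("(" + String.join(" ", disjuncts) + ")");
        }
        return clauses;
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("title", "body");
        List<List<String>> terms = Arrays.asList(
                Arrays.asList("quick", "fox"),
                Arrays.asList("quick", "fox"));
        // Prints: (title:quick body:quick) (title:fox body:fox)
        System.out.println(String.join(" ", recombine(fields, terms)));
    }
}
```

The point of grouping by term rather than by field is that the top-level operator (AND or OR) then applies across terms, with each term allowed to match in any of the fields.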
All queryparser module tests now pass, with the exception of the flexible query
parser's TestStandardQP run with QueryParserTestBase.testQPA(). Since this
patch doesn't modify anything about the flexible query parser, this is not
surprising.
> queryparser parses on whitespace
> --------------------------------
>
> Key: LUCENE-2605
> URL: https://issues.apache.org/jira/browse/LUCENE-2605
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/queryparser
> Reporter: Robert Muir
> Assignee: Steve Rowe
> Fix For: 4.9, 6.0
>
> Attachments: LUCENE-2605.patch, LUCENE-2605.patch
>
>
> The queryparser parses input on whitespace, and sends each whitespace
> separated term to its own independent token stream.
> This breaks the following at query-time, because they can't see across
> whitespace boundaries:
> * n-gram analysis
> * shingles
> * synonyms (especially multi-word for whitespace-separated languages)
> * languages where a 'word' can contain whitespace (e.g. Vietnamese)
> It's also rather unexpected, as users think their
> charfilters/tokenizers/tokenfilters will do the same thing at index and
> query time, but in many cases they can't. Instead, preferably the
> queryparser would parse around only real 'operators'.
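
To make the problem in the description concrete, here is a toy sketch of why pre-splitting on whitespace defeats shingles. It does not use Lucene: shingles() is a hypothetical stand-in for a word-bigram analysis chain, and analyzePerChunk() mimics, under that assumption, a parser that hands each whitespace-separated chunk to its own independent token stream.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a toy model of the behavior, not Lucene code.
class WhitespaceSplitSketch {

    // Toy stand-in for a shingle (word bigram) analysis chain.
    static List<String> shingles(String text) {
        String[] words = text.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < words.length; i++) {
            out.add(words[i] + " " + words[i + 1]);
        }
        return out;
    }

    // Mimics a parser that splits on whitespace first and then analyzes
    // each chunk in isolation, so no chunk ever sees its neighbors.
    static List<String> analyzePerChunk(String query) {
        List<String> out = new ArrayList<>();
        for (String chunk : query.trim().split("\\s+")) {
            out.addAll(shingles(chunk));
        }
        return out;
    }

    public static void main(String[] args) {
        String query = "new york city";
        // Whole-string analysis, as at index time: [new york, york city]
        System.out.println(shingles(query));
        // Per-chunk analysis, as at query time: []
        System.out.println(analyzePerChunk(query));
    }
}
```

The same mismatch applies to multi-word synonyms and n-grams that span a space: the index-time stream contains tokens that the query-time stream can never produce.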