[ https://issues.apache.org/jira/browse/SOLR-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15930064#comment-15930064 ]
Shawn Heisey commented on SOLR-9185:
------------------------------------
bq. For edismax, this is a departure: it's supposed to never throw exceptions.
I'm not sure this is completely accurate. The original dismax parser almost
never throws exceptions, mostly because it doesn't handle standard syntax for
specifying fields, operators, etc. Because edismax does allow most of that
syntax, I think exceptions are expected.
> Solr's edismax and "Lucene"/standard query parsers should optionally not
> split on whitespace before sending terms to analysis
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: SOLR-9185
> URL: https://issues.apache.org/jira/browse/SOLR-9185
> Project: Solr
> Issue Type: New Feature
> Reporter: Steve Rowe
> Assignee: Steve Rowe
> Fix For: 6.5, master (7.0)
>
> Attachments: SOLR-9185.patch, SOLR-9185.patch, SOLR-9185.patch,
> SOLR-9185.patch
>
>
> Copied from LUCENE-2605:
> The queryparser parses input on whitespace, and sends each
> whitespace-separated term to its own independent token stream.
> This breaks the following at query time, because they can't see across
> whitespace boundaries:
> * n-gram analysis
> * shingles
> * synonyms (especially multi-word for whitespace-separated languages)
> * languages where a 'word' can contain whitespace (e.g. Vietnamese)
> It's also rather unexpected, as users think their
> charfilters/tokenizers/tokenfilters will do the same thing at index and
> query time, but in many cases they can't. Instead, preferably the
> queryparser would parse around only real 'operators'.
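
The effect described in the issue can be shown with a toy model. The sketch below is not Lucene or Solr code: the synonym table, the `analyze` chain, and both `parse_*` functions are hypothetical stand-ins, written only to show why an analysis chain that receives one whitespace-separated chunk at a time cannot apply multi-word synonyms or build shingles.

```python
# Toy illustration only; none of these names exist in Lucene or Solr.

# Hypothetical multi-word synonym rule: "wi fi" -> "wifi"
SYNONYMS = {("wi", "fi"): "wifi"}


def analyze(text):
    """Toy analysis chain: lowercase, tokenize on whitespace, then
    collapse any two-token sequence found in SYNONYMS."""
    tokens = text.lower().split()
    out = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in SYNONYMS:
            out.append(SYNONYMS[pair])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def shingles(tokens, size=2):
    """Toy shingle filter: join each run of `size` adjacent tokens."""
    return [" ".join(tokens[i:i + size])
            for i in range(len(tokens) - size + 1)]


def parse_split_on_whitespace(query):
    """Pre-split the query, then analyze each chunk in isolation,
    as the issue describes the current parser behavior."""
    result = []
    for chunk in query.split():
        result.extend(analyze(chunk))
    return result


def parse_whole(query):
    """Hand the whole query text to the analysis chain at once."""
    return analyze(query)


if __name__ == "__main__":
    query = "Wi Fi router"
    print(parse_split_on_whitespace(query))  # synonym rule never fires
    print(parse_whole(query))                # synonym rule sees both tokens
    print(shingles(analyze("wi")))           # a one-token stream yields no shingles
```

In the pre-split path every call to `analyze` sees a single token, so the two-token synonym lookup can never match and a shingle filter has nothing to join. Passing the whole text through once gives the chain the same view it has at index time.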