[jira] [Commented] (SOLR-9185) Solr's edismax and "Lucene"/standard query parsers should optionally not split on whitespace before sending terms to analysis

Shawn Heisey (JIRA) Fri, 17 Mar 2017 07:46:10 -0700

    [ https://issues.apache.org/jira/browse/SOLR-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15930064#comment-15930064 ]


Shawn Heisey commented on SOLR-9185:
------------------------------------

bq. For edismax, this is a departure: it's supposed to never throw exceptions.

I'm not sure this is completely accurate.  The original dismax parser almost 
never throws exceptions, mostly because it doesn't handle standard syntax for 
specifying fields, operators, etc.  Because edismax does allow most of that 
syntax, I think exceptions are expected.


> Solr's edismax and "Lucene"/standard query parsers should optionally not 
> split on whitespace before sending terms to analysis
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-9185
>                 URL: https://issues.apache.org/jira/browse/SOLR-9185
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Steve Rowe
>            Assignee: Steve Rowe
>             Fix For: 6.5, master (7.0)
>
>         Attachments: SOLR-9185.patch, SOLR-9185.patch, SOLR-9185.patch, 
> SOLR-9185.patch
>
>
> Copied from LUCENE-2605:
> The query parser splits input on whitespace and sends each 
> whitespace-separated term to its own independent token stream.
> This breaks the following at query time, because they can't see across 
> whitespace boundaries:
> n-gram analysis
> shingles
> synonyms (especially multi-word synonyms for whitespace-separated languages)
> languages where a 'word' can contain whitespace (e.g. Vietnamese)
> It's also rather unexpected, as users think their 
> charfilters/tokenizers/tokenfilters will do the same thing at index and 
> query time, but
> in many cases they can't. Instead, preferably the query parser would parse 
> around only real 'operators'.
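
The effect described in the issue text above can be illustrated with a toy 
sketch. This is not Solr or Lucene code: `analyze`, `shingles`, 
`parse_split_on_whitespace` and `parse_whole` are hypothetical names, and the 
"analysis chain" is assumed to be just lowercasing plus bigram shingles. It 
only shows why an analyzer that needs to see neighboring words produces 
nothing useful when each whitespace-separated chunk is analyzed on its own.

```python
def shingles(tokens, size=2):
    """Return word n-grams (shingles) of the given size from a token list."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def analyze(text):
    """Toy analysis chain (an assumption, not Lucene's): lowercase,
    tokenize on whitespace, emit unigrams followed by bigram shingles."""
    tokens = text.lower().split()
    return tokens + shingles(tokens)


def parse_split_on_whitespace(query):
    """Mimic a parser that splits on whitespace first: every chunk gets
    its own independent analysis pass, so no chunk sees its neighbors."""
    terms = []
    for chunk in query.split():
        terms.extend(analyze(chunk))
    return terms


def parse_whole(query):
    """Mimic a parser that hands the whole run of text to analysis."""
    return analyze(query)


if __name__ == "__main__":
    query = "New York City"
    print(parse_split_on_whitespace(query))  # unigrams only, no shingles
    print(parse_whole(query))                # unigrams plus bigram shingles
```

With the pre-split path, a single-word chunk can never yield a bigram, so the 
shingles are silently lost; the same reasoning applies to multi-word synonyms.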



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
