[jira] [Reopened] (SOLR-10310) By default, stop splitting on whitespace prior to analysis in edismax and "Lucene"/standard query parsers

Steve Rowe (JIRA) Tue, 25 Apr 2017 14:05:23 -0700

     [ https://issues.apache.org/jira/browse/SOLR-10310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Steve Rowe reopened SOLR-10310:
-------------------------------

My Jenkins found a reproducing seed for a CopyFieldTest failure, and {{git 
bisect}} says that the commit on this issue is to blame. Note that it 
reproduces only if I remove {{-Dtests.method=testCatchAllCopyField}} from 
the repro line, so that the whole test class runs:

{noformat}
Checking out Revision dd171ff8fe31df578b7e6fab1eb5bfc1ade3f5fc (refs/remotes/origin/master)
[...]
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=CopyFieldTest -Dtests.method=testCatchAllCopyField -Dtests.seed=27931CB10CE6100C -Dtests.slow=true -Dtests.locale=nl-BE -Dtests.timezone=Asia/Manila -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
   [junit4] ERROR   0.05s J9  | CopyFieldTest.testCatchAllCopyField <<<
   [junit4]    > Throwable #1: java.lang.RuntimeException: Exception during query
   [junit4]    >        at __randomizedtesting.SeedInfo.seed([27931CB10CE6100C:71EDDE8B88DB61D0]:0)
   [junit4]    >        at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:896)
   [junit4]    >        at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:863)
   [junit4]    >        at org.apache.solr.schema.CopyFieldTest.testCatchAllCopyField(CopyFieldTest.java:258)
   [junit4]    >        at java.lang.Thread.run(Thread.java:745)
   [junit4]    > Caused by: java.lang.RuntimeException: REQUEST FAILED: xpath=//*[@numFound='1']
   [junit4]    >        xml response was: <?xml version="1.0" encoding="UTF-8"?>
   [junit4]    > <response>
   [junit4]    > <lst name="responseHeader"><int name="status">0</int><int 
name="QTime">0</int></lst><result name="response" numFound="2" 
start="0"><doc><int name="id">5</int><arr 
name="catchall_t"><str>5</str><str>10-1839ACX-93</str><str>AAM46</str><str>1565669397053308928</str></arr><arr
 name="sku1"><str>10-1839ACX-93</str></arr><arr 
name="1_s"><str>10-1839ACX-93</str></arr><arr 
name="1_dest_sub_s"><str>10-1839ACX-93</str></arr><arr 
name="dest_sub_no_ast_s"><str>10-1839ACX-93</str></arr><arr 
name="testing123_s"><str>AAM46</str></arr><long 
name="_version_">1565669397053308928</long><arr 
name="multiDefault"><str>muLti-Default</str></arr><int 
name="intDefault">42</int><date 
name="timestamp">2017-04-25T16:44:51.953Z</date></doc><doc><int 
name="id">10</int><arr name="catchall_t"><str>10</str><str>test copy 
field</str><str>this is a simple test of the copy field 
functionality</str><str>1565669397012414464</str></arr><arr 
name="title"><str>test copy field</str></arr><arr name="text_en"><str>this is a 
simple test of the copy field functionality</str></arr><arr 
name="highlight"><str>this is a simple test of </str><str>this is a simple test 
of </str></arr><long name="_version_">1565669397012414464</long><arr 
name="multiDefault"><str>muLti-Default</str></arr><int 
name="intDefault">42</int><date 
name="timestamp">2017-04-25T16:44:51.902Z</date></doc></result>
   [junit4]    > </response>
   [junit4]    >        request was:q=catchall_t:10-1839ACX-93&wt=xml
   [junit4]    >        at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:889)
   [junit4]    >        ... 41 more
[...]
   [junit4]   2> NOTE: test params are: codec=FastCompressingStoredFields(storedFieldsFormat=CompressingStoredFieldsFormat(compressionMode=FAST, chunkSize=13049, maxDocsPerChunk=9, blockSize=2), termVectorsFormat=CompressingTermVectorsFormat(compressionMode=FAST, chunkSize=13049, blockSize=2)), sim=RandomSimilarity(queryNorm=false): {}, locale=nl-BE, timezone=Asia/Manila
{noformat}

For some reason, when {{sow=false}}, a query that used to match only the doc 
indexed in the failing method now also matches a doc indexed in another method. 
That other doc is never removed, so if the other method happens to run before 
the failing method, the query finds two docs instead of one ({{numFound="2"}} 
above, doc ids 5 and 10) and the test fails.

I've got a patch that makes the other method index its doc under the same doc 
id that the rest of the methods use, so that there's only ever one doc in the 
index at any given time.
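
To illustrate why sharing a doc id keeps the index down to a single doc, here is a minimal sketch. It is a toy model only: {{ToyIndex}} and {{docsLeftAfterTwoMethods}} are hypothetical names, a plain map stands in for an index with a unique key, and none of this is the Solr API or the actual patch. The ids 5 and 10 are the two doc ids seen in the response above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy stand-in for an index with a unique key field; NOT the Solr API.
final class ToyIndex {
    private final Map<String, String> docsById = new LinkedHashMap<>();

    // Adding a doc whose id is already present replaces the earlier doc.
    void add(String id, String body) {
        docsById.put(id, body);
    }

    int size() {
        return docsById.size();
    }

    // Hypothetical scenario: two test methods each index one doc and never delete it.
    static int docsLeftAfterTwoMethods(String idUsedByFirst, String idUsedBySecond) {
        ToyIndex index = new ToyIndex();
        index.add(idUsedByFirst, "doc indexed by the first test method");
        index.add(idUsedBySecond, "doc indexed by the second test method");
        return index.size();
    }

    public static void main(String[] args) {
        // Distinct ids: the leftover doc from the first method is still in the index.
        System.out.println(docsLeftAfterTwoMethods("10", "5"));
        // Shared id: the second add overwrites the first, so only one doc remains.
        System.out.println(docsLeftAfterTwoMethods("5", "5"));
    }
}
```

With distinct ids the leftover doc stays around and can be matched by a later query; with a shared id each add replaces the previous doc, so method order no longer matters.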

> By default, stop splitting on whitespace prior to analysis in edismax and 
> "Lucene"/standard query parsers
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-10310
>                 URL: https://issues.apache.org/jira/browse/SOLR-10310
>             Project: Solr
>          Issue Type: Task
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Steve Rowe
>            Assignee: Steve Rowe
>             Fix For: master (7.0)
>
>         Attachments: SOLR-10310.patch
>
>
> SOLR-9185 introduced an option on the edismax and standard query parsers to 
> not perform pre-analysis whitespace splitting: the {{sow=false}} request 
> param.
> On master/7.0, we should make {{sow=false}} the default.
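
As a rough illustration of what pre-analysis whitespace splitting means, the sketch below models which pieces of query text would each be handed to analysis separately. It is a deliberately simplified assumption-laden model: {{SowSketch}} and {{chunksForAnalysis}} are hypothetical names, and operators, quoting and field prefixes are ignored. It is not the query parser code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Simplified model of the sow request param; not the actual parser logic.
final class SowSketch {
    // Returns the pieces of query text that would each be analyzed on their own.
    static List<String> chunksForAnalysis(String queryText, boolean sow) {
        String trimmed = queryText.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        if (sow) {
            // sow=true: split on whitespace first, then analyze each piece separately.
            return Arrays.asList(trimmed.split("\\s+"));
        }
        // sow=false: hand the whole whitespace-separated text to analysis at once.
        return Collections.singletonList(trimmed);
    }

    public static void main(String[] args) {
        String q = "test copy field";
        System.out.println(chunksForAnalysis(q, true));
        System.out.println(chunksForAnalysis(q, false));
    }
}
```

In this model, {{sow=false}} lets the analyzer see all of the words together, whereas with {{sow=true}} it only ever sees one word at a time.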



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
