[ https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-445:
--------------------------
    Affects Version/s:     (was: 1.3)
        Fix Version/s: 6.1
                       master
          Description: 
This issue adds a new {{TolerantUpdateProcessorFactory}}, making it possible to 
configure Solr updates so that they are "tolerant" of individual errors in an 
update request...

{code}
  <processor class="solr.TolerantUpdateProcessorFactory">
    <int name="maxErrors">10</int>
  </processor>
{code}

When a chain with this processor is used, but maxErrors isn't exceeded, here's 
what the response looks like...

{code}
$ curl 'http://localhost:8983/solr/techproducts/update?update.chain=tolerant-chain&wt=json&indent=true&maxErrors=-1' \
    -H "Content-Type: application/json" \
    --data-binary '{"add" : { "doc":{"id":"1","foo_i":"bogus"}}, "delete": {"query":"malformed:["}}'
{
  "responseHeader":{
    "errors":[{
        "type":"ADD",
        "id":"1",
        "message":"ERROR: [doc=1] Error adding field 'foo_i'='bogus' msg=For input string: \"bogus\""},
      {
        "type":"DELQ",
        "id":"malformed:[",
        "message":"org.apache.solr.search.SyntaxError: Cannot parse 'malformed:[': Encountered \"<EOF>\" at line 1, column 11.\nWas expecting one of:\n    <RANGE_QUOTED> ...\n    <RANGE_GOOP> ...\n    "}],
    "maxErrors":-1,
    "status":0,
    "QTime":1}}
{code}

Note in the above example that:

* maxErrors can be overridden on a per-request basis
* an effective {{maxErrors==-1}} (either from config, or request param) means 
"unlimited" (under the covers it's using {{Integer.MAX_VALUE}})
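
The two points above can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not the processor's actual code; the helper name {{effective_max_errors}} and its arguments are hypothetical.

```python
from typing import Optional

JAVA_INT_MAX = 2**31 - 1  # numeric value of Java's Integer.MAX_VALUE


def effective_max_errors(configured: int, request_param: Optional[str] = None) -> int:
    """Hypothetical helper: work out the error limit that applies to one request."""
    # A per-request maxErrors value, when present, overrides the configured one.
    value = int(request_param) if request_param is not None else configured
    # -1 means "unlimited", modelled here as the largest Java int.
    return JAVA_INT_MAX if value == -1 else value
```

With a configured limit of 10, a request that passes {{maxErrors=-1}} would be treated as effectively unlimited, while a request with no param keeps the limit of 10.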

If and when maxErrors is reached for a request, the _first_ exception that the 
processor caught is propagated back to the user, and metadata is set on that 
exception with the same details about every tolerated error.
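
As a rough mental model of that behaviour, here is a toy Python sketch. It is not the Solr implementation: {{UpdateError}}, {{run_tolerant}} and the {{apply}} callback are all made-up names, and the choice to fail once the error count _exceeds_ the limit is an assumption inferred from the {{maxErrors=1}} example in this description, where two errors produce a 400.

```python
class UpdateError(Exception):
    """Toy stand-in for the failure of a single add/delete command."""

    def __init__(self, message, metadata=None):
        super().__init__(message)
        self.metadata = metadata or []


def run_tolerant(commands, apply, max_errors):
    """Apply each (type, id) command, tolerating up to max_errors failures.

    Returns the list of tolerated errors. Once the number of failures
    exceeds max_errors, the first failure is re-raised with the details
    of every failure seen so far attached as metadata.
    """
    tolerated = []    # dicts shaped like the "errors" entries in the response
    first_exc = None  # remembered so it can be propagated later
    for cmd_type, cmd_id in commands:
        try:
            apply(cmd_type, cmd_id)
        except UpdateError as exc:
            if first_exc is None:
                first_exc = exc
            tolerated.append({"type": cmd_type, "id": cmd_id, "message": str(exc)})
            if len(tolerated) > max_errors:
                first_exc.metadata = list(tolerated)
                raise first_exc
    return tolerated
```

Run against one bad add and one bad delete-by-query, this model returns both errors when the limit is 10, and raises the add's exception (carrying both errors as metadata) when the limit is 1.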

This next example repeats the curl request shown earlier, except that instead of 
{{maxErrors=-1}} the request param is now {{maxErrors=1}}...

{code}
$ curl 'http://localhost:8983/solr/techproducts/update?update.chain=tolerant-chain&wt=json&indent=true&maxErrors=1' \
    -H "Content-Type: application/json" \
    --data-binary '{"add" : { "doc":{"id":"1","foo_i":"bogus"}}, "delete": {"query":"malformed:["}}'
{
  "responseHeader":{
    "errors":[{
        "type":"ADD",
        "id":"1",
        "message":"ERROR: [doc=1] Error adding field 'foo_i'='bogus' msg=For input string: \"bogus\""},
      {
        "type":"DELQ",
        "id":"malformed:[",
        "message":"org.apache.solr.search.SyntaxError: Cannot parse 'malformed:[': Encountered \"<EOF>\" at line 1, column 11.\nWas expecting one of:\n    <RANGE_QUOTED> ...\n    <RANGE_GOOP> ...\n    "}],
    "maxErrors":1,
    "status":400,
    "QTime":1},
  "error":{
    "metadata":[
      "org.apache.solr.common.ToleratedUpdateError--ADD:1","ERROR: [doc=1] Error adding field 'foo_i'='bogus' msg=For input string: \"bogus\"",
      "org.apache.solr.common.ToleratedUpdateError--DELQ:malformed:[","org.apache.solr.search.SyntaxError: Cannot parse 'malformed:[': Encountered \"<EOF>\" at line 1, column 11.\nWas expecting one of:\n    <RANGE_QUOTED> ...\n    <RANGE_GOOP> ...\n    ",
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","java.lang.NumberFormatException"],
    "msg":"ERROR: [doc=1] Error adding field 'foo_i'='bogus' msg=For input string: \"bogus\"",
    "code":400}}
{code}

...the added exception metadata ensures that even in client code like the 
various SolrJ SolrClient implementations, which throw a (client side) exception 
on non-200 responses, the end user can access info on all the tolerated errors 
that were ignored before the maxErrors threshold was reached.
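
To illustrate how a client might pull the tolerated errors back out of that metadata, here is a small Python sketch. It assumes the metadata has already been obtained as the flat key/value list shown in the JSON above; the function name {{tolerated_errors}} is hypothetical, and the key layout (prefix, then type, then id) is read off the example rather than taken from any documented contract.

```python
PREFIX = "org.apache.solr.common.ToleratedUpdateError--"


def tolerated_errors(metadata):
    """Extract tolerated errors from a flat [key, value, key, value, ...] list."""
    errors = []
    # Walk the list two items at a time: even index = key, odd index = value.
    for key, value in zip(metadata[0::2], metadata[1::2]):
        if not key.startswith(PREFIX):
            continue  # skip entries such as "error-class" / "root-error-class"
        # Split on the first ':' only, since ids like "malformed:[" contain colons.
        err_type, _, err_id = key[len(PREFIX):].partition(":")
        errors.append({"type": err_type, "id": err_id, "message": value})
    return errors
```

Applied to the metadata in the response above, this would yield one {{ADD}} entry for id {{1}} and one {{DELQ}} entry for {{malformed:[}}, mirroring the {{errors}} list in the {{responseHeader}}.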


----

{panel:title=Original Jira Request}
Has anyone run into the problem of handling bad documents / failures mid-batch? 
For example:

<add>
  <doc>
    <field name="id">1</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="myDateField">I_AM_A_BAD_DATE</field>
  </doc>
  <doc>
    <field name="id">3</field>
  </doc>
</add>

Right now solr adds the first doc and then aborts.  It would seem like it 
should either fail the entire batch or log a message/return a code and then 
continue on to add doc 3.  Option 1 would seem to be much harder to accomplish 
and possibly require more memory while Option 2 would require more information 
to come back from the API.  I'm about to dig into this but I thought I'd ask to 
see if anyone had any suggestions, thoughts or comments.    
{panel}


  was:
Has anyone run into the problem of handling bad documents / failures mid batch. 
 Ie:

<add>
  <doc>
    <field name="id">1</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="myDateField">I_AM_A_BAD_DATE</field>
  </doc>
  <doc>
    <field name="id">3</field>
  </doc>
</add>

Right now solr adds the first doc and then aborts.  It would seem like it 
should either fail the entire batch or log a message/return a code and then 
continue on to add doc 3.  Option 1 would seem to be much harder to accomplish 
and possibly require more memory while Option 2 would require more information 
to come back from the API.  I'm about to dig into this but I thought I'd ask to 
see if anyone had any suggestions, thoughts or comments.    




updated summary to reflect basic information about feature being added

> Update Handlers abort with bad documents
> ----------------------------------------
>
>                 Key: SOLR-445
>                 URL: https://issues.apache.org/jira/browse/SOLR-445
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Will Johnson
>            Assignee: Hoss Man
>             Fix For: master, 6.1
>
>         Attachments: SOLR-445-3_x.patch, SOLR-445-alternative.patch, 
> SOLR-445-alternative.patch, SOLR-445-alternative.patch, 
> SOLR-445-alternative.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445.patch, 
> SOLR-445.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445.patch, 
> SOLR-445.patch, SOLR-445.patch, SOLR-445_3x.patch, solr-445.xml
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
