[ https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974667#comment-13974667 ]

Hoss Man commented on SOLR-445:
-------------------------------

bq. I think this would make it more confusing. Having this processor means that 
the client wants to manage failing docs on their side. If all the docs fail so 
be it.

Yeah, I'm not convinced you're wrong -- I just wasn't sure how I felt about it 
and wanted to make sure we considered it.  Even if users configure this, they 
might be surprised if something like a schema.xml mismatch with some update 
process they are using causes a 500 error on every individual update -- but 
still results in a 200 coming back because of this component.

But I think you are right -- as long as the docs are clear that the status 
will _always_ be a 200, even if all docs fail, we're fine.

bq. I was also thinking that this processor won’t work together with 
DistributedUpdateProcessor, it has its own error processing, plus the 
distribution would create multiple internal requests...

As long as this processor is configured before 
DistributedUpdateProcessorFactory, it should work fine (see the config sketch 
after this list):
* when the requests get forwarded to other shards, they'll bypass this 
processor (and any other processors that come before 
DistributedUpdateProcessorFactory) so it won't break the cumulative error 
handling in DistributedUpdateProcessorFactory
* DistributedUpdateProcessorFactory still ultimately throws only one Exception 
per UpdateCommand when it forwards to multiple replicas, so your new processor 
will still get at most 1 error to track per doc when accumulating results to 
return to the client
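
For reference, here's a minimal sketch of the kind of chain I mean -- the 
factory name (TolerantUpdateProcessorFactory) and the maxErrors knob are just 
placeholders for whatever the current patch actually uses:

{code:xml}
<!-- solrconfig.xml: the tolerant processor sits before the distributed
     processor, so sub-requests forwarded between shards skip it and each
     doc's failure is only recorded once, on the node that got the request -->
<updateRequestProcessorChain name="tolerant-chain">
  <processor class="solr.TolerantUpdateProcessorFactory">
    <!-- hypothetical knob: give up after this many per-doc failures -->
    <int name="maxErrors">10</int>
  </processor>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
{code}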

But it's trivial to write a distributed version of your test case to prove 
that you get the results you expect -- probably a good idea to write one to 
help future-proof this processor against unforeseen changes in the distributed 
update processing.
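
Roughly, the round trip such a test needs to exercise looks like this from the 
client side (a SolrJ sketch, not test framework code -- the chain name, 
collection, and ZK address are assumptions, and where the per-doc errors show 
up in the response depends on what the processor actually returns):

{code:java}
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.client.solrj.response.UpdateResponse;
import org.apache.solr.common.SolrInputDocument;

public class TolerantDistribCheck {
  public static void main(String[] args) throws Exception {
    // assumed ZK address and collection for a small SolrCloud cluster
    CloudSolrServer server = new CloudSolrServer("localhost:9983");
    server.setDefaultCollection("collection1");

    UpdateRequest req = new UpdateRequest();
    req.setParam("update.chain", "tolerant-chain"); // hypothetical chain name

    SolrInputDocument d1 = new SolrInputDocument();
    d1.addField("id", "1");
    SolrInputDocument d2 = new SolrInputDocument();
    d2.addField("id", "2");
    d2.addField("myDateField", "I_AM_A_BAD_DATE"); // should fail date parsing
    SolrInputDocument d3 = new SolrInputDocument();
    d3.addField("id", "3");
    req.add(d1);
    req.add(d2);
    req.add(d3);

    // with the tolerant processor first in the chain this should return
    // normally (status 0 / HTTP 200) even though doc 2 fails, regardless of
    // which shards the three ids hash to
    UpdateResponse rsp = req.process(server);
    System.out.println("status=" + rsp.getStatus());
    System.out.println("header=" + rsp.getResponseHeader());

    server.commit();
    // a real test would then query to assert docs 1 and 3 made it
    server.shutdown();
  }
}
{code}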

> Update Handlers abort with bad documents
> ----------------------------------------
>
>                 Key: SOLR-445
>                 URL: https://issues.apache.org/jira/browse/SOLR-445
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>    Affects Versions: 1.3
>            Reporter: Will Johnson
>             Fix For: 4.9, 5.0
>
>         Attachments: SOLR-445-3_x.patch, SOLR-445-alternative.patch, 
> SOLR-445-alternative.patch, SOLR-445-alternative.patch, SOLR-445.patch, 
> SOLR-445.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445_3x.patch, 
> solr-445.xml
>
>
> Has anyone run into the problem of handling bad documents / failures 
> mid-batch?  I.e.:
> <add>
>   <doc>
>     <field name="id">1</field>
>   </doc>
>   <doc>
>     <field name="id">2</field>
>     <field name="myDateField">I_AM_A_BAD_DATE</field>
>   </doc>
>   <doc>
>     <field name="id">3</field>
>   </doc>
> </add>
> Right now Solr adds the first doc and then aborts.  It would seem like it 
> should either fail the entire batch, or log a message/return a code and then 
> continue on to add doc 3.  Option 1 would seem to be much harder to 
> accomplish and possibly require more memory, while Option 2 would require 
> more information to come back from the API.  I'm about to dig into this, but 
> I thought I'd ask to see if anyone had any suggestions, thoughts or comments.
>  


