[
https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tomás Fernández Löbbe updated SOLR-445:
---------------------------------------
Attachment: SOLR-445-alternative.patch
My simple test to use with SolrCloud fails (not 100% of the time, but very
frequently). This is my understanding of the problem:
It only works when the update arrives at the shard leader (since the add then
fails locally), but if the update needs to be forwarded to the leader, it will
not work.
If the request is forwarded to the leader, the forwarding is done asynchronously
and the DistributedUpdateProcessor tracks the errors internally. Only after all
the docs have been processed is the “finish” method called, and at that point the
DistributedUpdateProcessor adds one of the exceptions to the response. This
is a problem because “processAdd” never actually fails the way the
TolerantUpdateProcessor expects. The TolerantUpdateProcessor also can’t know the
total number of errors, since that count is kept internally by the
DistributedUpdateProcessor.
As a side note, this DistributedUpdateProcessor behavior makes it “tolerant”,
but only in some cases. A request like this:
<add>invalid-doc</add>
<add>valid-doc</add>
<add>valid-doc</add>
would leave Solr in a different state depending on which node receives the
request (the shard leader or a replica/follower). Is this expected?
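To make the failure mode concrete, here is a minimal, self-contained sketch of the deferred-error pattern described above. These are not Solr's real classes; DeferringProcessor and TolerantDemo are hypothetical stand-ins that only illustrate why a try/catch around processAdd never fires when the downstream processor buffers errors until finish():

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the forwarding case: errors are buffered internally,
// not thrown from processAdd (roughly what the comment describes for
// DistributedUpdateProcessor; this is NOT the actual Solr class).
class DeferringProcessor {
    final List<RuntimeException> deferred = new ArrayList<>();

    void processAdd(String doc) {
        if (doc.startsWith("invalid")) {
            // The forwarded update fails remotely; the error is recorded
            // internally instead of being thrown to the caller.
            deferred.add(new RuntimeException("bad doc: " + doc));
        }
    }

    void finish() {
        // Only now does one of the buffered errors surface.
        if (!deferred.isEmpty()) throw deferred.get(0);
    }
}

public class TolerantDemo {
    // Counts how many errors a "tolerant" wrapper catches per document.
    // With deferred errors, processAdd never throws, so this is always 0.
    static int caughtPerDoc(String[] docs) {
        DeferringProcessor next = new DeferringProcessor();
        int caught = 0;
        for (String doc : docs) {
            try {
                next.processAdd(doc); // never throws in the forwarded case
            } catch (RuntimeException e) {
                caught++; // a tolerant processor would count/skip here
            }
        }
        return caught;
    }

    public static void main(String[] args) {
        String[] batch = {"invalid-doc", "valid-doc", "valid-doc"};
        System.out.println("errors caught in processAdd: " + caughtPerDoc(batch));
    }
}
```

The per-document catch block is dead code here, which is exactly the problem: the wrapper can neither skip the bad document nor count the failures.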
> Update Handlers abort with bad documents
> ----------------------------------------
>
> Key: SOLR-445
> URL: https://issues.apache.org/jira/browse/SOLR-445
> Project: Solr
> Issue Type: Improvement
> Components: update
> Affects Versions: 1.3
> Reporter: Will Johnson
> Fix For: 4.9, 5.0
>
> Attachments: SOLR-445-3_x.patch, SOLR-445-alternative.patch,
> SOLR-445-alternative.patch, SOLR-445-alternative.patch,
> SOLR-445-alternative.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445.patch,
> SOLR-445.patch, SOLR-445_3x.patch, solr-445.xml
>
>
> Has anyone run into the problem of handling bad documents / failures
> mid-batch? I.e.:
> <add>
> <doc>
> <field name="id">1</field>
> </doc>
> <doc>
> <field name="id">2</field>
> <field name="myDateField">I_AM_A_BAD_DATE</field>
> </doc>
> <doc>
> <field name="id">3</field>
> </doc>
> </add>
> Right now Solr adds the first doc and then aborts. It seems like it
> should either fail the entire batch, or log a message/return a code and then
> continue on to add doc 3. Option 1 seems much harder to accomplish and would
> possibly require more memory, while Option 2 would require more information
> to come back from the API. I'm about to dig into this, but I thought I'd ask
> whether anyone had any suggestions, thoughts or comments.
>
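Option 2 from the quoted description (record the failure and continue with doc 3) can be sketched roughly as follows. This is a hypothetical batch loop with a made-up index() check, not Solr's actual update API:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchDemo {
    // Hypothetical per-document validation; stands in for the date-parsing
    // failure in the example batch.
    static void index(String doc) {
        if (doc.contains("I_AM_A_BAD_DATE")) {
            throw new IllegalArgumentException("unparseable date in: " + doc);
        }
    }

    // Option 2: keep going past bad documents, collecting errors so the
    // response can report which docs failed instead of aborting mid-batch.
    static List<String> addBatch(List<String> docs) {
        List<String> errors = new ArrayList<>();
        for (String doc : docs) {
            try {
                index(doc);
            } catch (IllegalArgumentException e) {
                errors.add(e.getMessage()); // log and continue
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        List<String> batch = List.of("id=1", "id=2 I_AM_A_BAD_DATE", "id=3");
        List<String> errors = addBatch(batch);
        System.out.println("indexed " + (batch.size() - errors.size())
                + " docs, " + errors.size() + " failed");
    }
}
```

For the three-doc example above this indexes docs 1 and 3 and reports one failure, which is the "return more information from the API" trade-off the description mentions.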
--
This message was sent by Atlassian JIRA
(v6.2#6252)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]