[ https://issues.apache.org/jira/browse/SOLR-13718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918519#comment-16918519 ]
Ishan Chattopadhyaya commented on SOLR-13718:
---------------------------------------------
The above fix caused a test failure in TestLocalFSCloudBackupRestore. There is
something wrong with ShardRequestTracker.processResponses() (in OCMH): it does
not respect abortOnError for async requests. In this fix, I tried aborting the
async requests on error as well. However, RestoreCmd had been working around
that wrong behaviour with additional checks of its own, and hence the test
started failing after my fix.
Fixing this the right way will require handling these async responses
uniformly across all Collection API commands, and will be a longer effort. For
now, I'm going to revert my fix and handle the SPLITSHARD failure the same way
RestoreCmd does.
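To make the abortOnError point concrete, here is a minimal Python sketch of the intended behaviour, assuming a simplified model of sub-request outcomes. It is not Solr's code: `ShardResponse`, `process_responses`, `CommandAbortedError` and the state strings are hypothetical stand-ins for what ShardRequestTracker.processResponses() handles in Java. The idea it illustrates is that a failed async sub-request should abort the command exactly like a failed synchronous one, rather than each command (RestoreCmd, SPLITSHARD) adding its own checks afterwards.

```python
from dataclasses import dataclass
from typing import List, Optional


class CommandAbortedError(Exception):
    """Hypothetical: raised when a sub-request fails and abort_on_error is set."""


@dataclass
class ShardResponse:
    # Hypothetical stand-in for the outcome of one sub-request.
    node: str
    is_async: bool
    # Synchronous requests: the error message, if any.
    error: Optional[str] = None
    # Async requests: final state obtained by polling (assumed "completed" or "failed").
    async_state: Optional[str] = None


def failed(resp: ShardResponse) -> bool:
    # Treat sync and async outcomes uniformly: for async requests,
    # anything other than "completed" counts as a failure.
    if resp.is_async:
        return resp.async_state != "completed"
    return resp.error is not None


def process_responses(responses: List[ShardResponse], abort_on_error: bool) -> List[str]:
    """Collect failures; abort if requested, whether the request was sync or async."""
    failures = []
    for resp in responses:
        if failed(resp):
            msg = resp.error or f"async request state={resp.async_state}"
            failures.append(f"{resp.node}: {msg}")
    if failures and abort_on_error:
        raise CommandAbortedError("; ".join(failures))
    return failures
```

With `abort_on_error=True`, a single async response in state "failed" raises `CommandAbortedError`, so the command stops instead of reporting success; with `abort_on_error=False` the failures are only returned to the caller.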
> SPLITSHARD using async can cause data loss
> ------------------------------------------
>
> Key: SOLR-13718
> URL: https://issues.apache.org/jira/browse/SOLR-13718
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Affects Versions: 7.7.2, 8.1, 8.2
> Reporter: Ishan Chattopadhyaya
> Assignee: Ishan Chattopadhyaya
> Priority: Major
> Fix For: 7.7.3, 8.3
>
> Attachments: SOLR-13718.patch, solr-13718-reproduce.sh, solr.zip
>
>
> When using SPLITSHARD with async, if the SPLIT core command or any other
> sub-command of SPLITSHARD fails, SPLITSHARD still succeeds and leaves two
> empty sub-shards.
> The SPLIT core command can fail in various ways; here is one way to reproduce
> the problem, using a Solr 6x index in Solr 7x.
> -Steps to reproduce (in Solr 7x):-
> {code}
> 1. Import the attached configset, and create a collection.
> 2. Replace the new collection's data directory with the attached data
> directory (an index created in Solr 6x), then RELOAD the collection.
> 3. Run a *:* query; it returns 5 documents.
> 4. Issue an async SPLITSHARD, then run the *:* query again; it returns 0
> documents.
> {code}
> The attached solr-13718-reproduce.sh script does the same without needing the
> zip file.
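The manual steps above can also be expressed as the requests a client would send. The sketch below only builds the URLs (it does not contact a server) and then checks for the symptom the issue describes: the async request reports completion, yet the document count drops. The base URL, collection name, shard name, request id and the helper names are assumptions for illustration; the parameters themselves (`action=SPLITSHARD` with `async`, `action=REQUESTSTATUS` with `requestid`, and `q=*:*` with `rows=0`) are standard Collections API and query parameters. The attached solr-13718-reproduce.sh is the reproduction the issue actually provides.

```python
from urllib.parse import urlencode

# Assumed local Solr instance; adjust to your setup.
SOLR = "http://localhost:8983/solr"


def split_shard_url(collection: str, shard: str, request_id: str) -> str:
    # Async SPLITSHARD: Solr returns immediately and runs the split in the background.
    params = {"action": "SPLITSHARD", "collection": collection,
              "shard": shard, "async": request_id}
    return f"{SOLR}/admin/collections?{urlencode(params)}"


def request_status_url(request_id: str) -> str:
    # Poll this until the reported state is terminal ("completed" or "failed").
    params = {"action": "REQUESTSTATUS", "requestid": request_id}
    return f"{SOLR}/admin/collections?{urlencode(params)}"


def count_url(collection: str) -> str:
    # rows=0: only numFound is needed to count documents.
    return f"{SOLR}/{collection}/select?{urlencode({'q': '*:*', 'rows': 0})}"


def silent_data_loss(state: str, docs_before: int, docs_after: int) -> bool:
    # Hypothetical check for the reported bug: the split claims success
    # but documents have disappeared.
    return state == "completed" and docs_after < docs_before
```

With the numbers from the steps above (5 documents before the split, 0 after, and a request that reports "completed"), `silent_data_loss("completed", 5, 0)` returns `True`.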
--
This message was sent by Atlassian Jira
(v8.3.2#803003)