[ 
https://issues.apache.org/jira/browse/SOLR-9492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16708939#comment-16708939
 ] 

Jason Gerlowski commented on SOLR-9492:
---------------------------------------

I'd like to rectify this if it's still an issue, but haven't been able to 
reproduce the issue yet.  I can get SPLITSHARD to fail in a few different ways, 
but none produce the "status=completed" that Shalin mentions in his example 
above.  It's possible that Steve's fix on SOLR-5970 fixed this issue for us.

Assuming the problem still exists and I'm just not creative enough to reproduce 
it successfully, I've got a pretty good guess where the problem lies.  The 
overseer's {{OverseerCollectionMessageHandler}} has a {{processResponses}} 
method which is invoked several times to check for errors while executing 
subtasks within a SPLITSHARD request.  The SPLITSHARD code tells 
{{processResponses}} to abort on error (by throwing a SolrException), but the 
logic that does this only checks for an "exception" field in the response 
namedList.  This is sufficient for a lot of error cases, but Solr's APIs don't 
consistently return "exceptions" fields on all error cases.  If one of these 
responses is returned, we'll log the error under the "failure" map, but never 
abort the splitshard request.

> Request status API returns a completed status even if the collection API call 
> failed
> ------------------------------------------------------------------------------------
>
>                 Key: SOLR-9492
>                 URL: https://issues.apache.org/jira/browse/SOLR-9492
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>    Affects Versions: 5.5.2, 6.2
>            Reporter: Shalin Shekhar Mangar
>            Priority: Major
>              Labels: difficulty-medium, impact-high
>             Fix For: 6.7, 7.0
>
>
> A failed split shard response is:
> {code}
> {success={127.0.0.1:43245_hfnp%2Fbq={responseHeader={status=0,QTime=2}},127.0.0.1:43245_hfnp%2Fbq={responseHeader={status=0,QTime=0}},127.0.0.1:43245_hfnp%2Fbq={responseHeader={status=0,QTime=0}},127.0.0.1:43245_hfnp%2Fbq={responseHeader={status=0,QTime=0}},127.0.0.1:43245_hfnp%2Fbq={responseHeader={status=0,QTime=0}},127.0.0.1:43245_hfnp%2Fbq={responseHeader={status=0,QTime=0}},127.0.0.1:43245_hfnp%2Fbq={responseHeader={status=0,QTime=0}},127.0.0.1:50948_hfnp%2Fbq={responseHeader={status=0,QTime=0}}},c32001ed-3bca-4ae0-baae-25a3c99e35e65883644576126044={responseHeader={status=0,QTime=0},STATUS=completed,Response=TaskId:
>  c32001ed-3bca-4ae0-baae-25a3c99e35e65883644576126044 webapp=null 
> path=/admin/cores 
> params={async=c32001ed-3bca-4ae0-baae-25a3c99e35e65883644576126044&qt=/admin/cores&collection.configName=conf1&name=collection1_shard1_0_replica1&action=CREATE&collection=collection1&shard=shard1_0&wt=javabin&version=2}
>  status=0 
> QTime=2},c32001ed-3bca-4ae0-baae-25a3c99e35e65883647597130004={responseHeader={status=0,QTime=0},STATUS=completed,Response=TaskId:
>  c32001ed-3bca-4ae0-baae-25a3c99e35e65883647597130004 webapp=null 
> path=/admin/cores 
> params={async=c32001ed-3bca-4ae0-baae-25a3c99e35e65883647597130004&qt=/admin/cores&collection.configName=conf1&name=collection1_shard1_1_replica1&action=CREATE&collection=collection1&shard=shard1_1&wt=javabin&version=2}
>  status=0 
> QTime=0},c32001ed-3bca-4ae0-baae-25a3c99e35e65883649607943904={responseHeader={status=0,QTime=0},STATUS=completed,Response=TaskId:
>  c32001ed-3bca-4ae0-baae-25a3c99e35e65883649607943904 webapp=null 
> path=/admin/cores 
> params={nodeName=127.0.0.1:43245_hfnp%252Fbq&core=collection1_shard1_1_replica1&async=c32001ed-3bca-4ae0-baae-25a3c99e35e65883649607943904&qt=/admin/cores&coreNodeName=core_node6&action=PREPRECOVERY&checkLive=true&state=active&onlyIfLeader=true&wt=javabin&version=2}
>  status=0 
> QTime=0},c32001ed-3bca-4ae0-baae-25a3c99e35e65883649612565003={responseHeader={status=0,QTime=0},STATUS=completed,Response=TaskId:
>  c32001ed-3bca-4ae0-baae-25a3c99e35e65883649612565003 webapp=null 
> path=/admin/cores 
> params={core=collection1&async=c32001ed-3bca-4ae0-baae-25a3c99e35e65883649612565003&qt=/admin/cores&action=SPLIT&targetCore=collection1_shard1_0_replica1&targetCore=collection1_shard1_1_replica1&wt=javabin&version=2}
>  status=0 
> QTime=0},c32001ed-3bca-4ae0-baae-25a3c99e35e65883650618358632={responseHeader={status=0,QTime=0},STATUS=completed,Response=TaskId:
>  c32001ed-3bca-4ae0-baae-25a3c99e35e65883650618358632 webapp=null 
> path=/admin/cores 
> params={async=c32001ed-3bca-4ae0-baae-25a3c99e35e65883650618358632&qt=/admin/cores&name=collection1_shard1_1_replica1&action=REQUESTAPPLYUPDATES&wt=javabin&version=2}
>  status=0 
> QTime=0},c32001ed-3bca-4ae0-baae-25a3c99e35e65883650636428900={responseHeader={status=0,QTime=0},STATUS=completed,Response=TaskId:
>  c32001ed-3bca-4ae0-baae-25a3c99e35e65883650636428900 webapp=null 
> path=/admin/cores 
> params={async=c32001ed-3bca-4ae0-baae-25a3c99e35e65883650636428900&qt=/admin/cores&collection.configName=conf1&name=collection1_shard1_0_replica0&action=CREATE&collection=collection1&shard=shard1_0&wt=javabin&version=2}
>  status=0 
> QTime=0},failure={127.0.0.1:43245_hfnp%2Fbq=org.apache.solr.client.solrj.SolrServerException:IOException
>  occured when talking to server at: http://127.0.0.1:43245/hfnp/bq},Operation 
> splitshard caused exception:=org.apache.solr.common.SolrException: ADDREPLICA 
> failed to create replica,exception={msg=ADDREPLICA failed to create 
> replica,rspCode=500}}
> {code}
> Note the "failure" bit. The split shard couldn't add a replica. But when you 
> use the request status API, it returns a "completed" status.
> Apparently, completed doesn't mean it was successful! In any case, it is very 
> misleading and makes it very hard to properly use the Collection APIs. We 
> need more investigation to figure out what other Collection APIs might be 
> affected.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to