[
https://issues.apache.org/jira/browse/SOLR-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mark Miller updated SOLR-3561:
------------------------------
Fix Version/s: (was: 4.0)
5.0
4.1
> Error during deletion of shard/core
> -----------------------------------
>
> Key: SOLR-3561
> URL: https://issues.apache.org/jira/browse/SOLR-3561
> Project: Solr
> Issue Type: Bug
> Components: multicore, replication (java), SolrCloud
> Affects Versions: 4.0-ALPHA
> Environment: Solr trunk (4.0-SNAPSHOT) from 29/2-2012
> Reporter: Per Steffensen
> Assignee: Mark Miller
> Fix For: 4.1, 5.0
>
>
> Running several Solr servers in a SolrCloud cluster (zkHost set on the Solr
> servers).
> Several collections, each with several slices and one replica per slice (so
> each slice has two shards).
> Basically we want to let our system delete an entire collection. We do this by
> deleting each and every shard under the collection. Each shard is deleted one
> by one, by sending a CoreAdmin UNLOAD request to the relevant Solr server:
> {code}
> // Unload (delete) a single core/shard on the Solr server at solrUrl
> CoreAdminRequest request = new CoreAdminRequest();
> request.setAction(CoreAdminAction.UNLOAD);
> request.setCoreName(shardName);
> CoreAdminResponse resp = request.process(new CommonsHttpSolrServer(solrUrl));
> {code}
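> To delete the whole collection we simply repeat this request for every shard
> in it - roughly like this (collectionShardNames and solrUrlForShard stand in
> for our own bookkeeping of which shards exist and which Solr server hosts
> each of them):
> {code}
> // Sketch of the per-collection deletion loop. collectionShardNames and
> // solrUrlForShard are placeholders for our own bookkeeping, not Solr API.
> for (String shardName : collectionShardNames("collection_2012_04")) {
>     CoreAdminRequest request = new CoreAdminRequest();
>     request.setAction(CoreAdminAction.UNLOAD);
>     request.setCoreName(shardName);
>     // one UNLOAD request per shard, sent to the Solr server hosting that shard
>     request.process(new CommonsHttpSolrServer(solrUrlForShard(shardName)));
> }
> {code}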
> The delete/unload succeeds, but in roughly 10% of the cases we get errors on
> the involved Solr servers, right around the time the shards/cores are deleted,
> and we end up in a situation where ZK still claims (forever) that the deleted
> shard is present and active.
> From here the issue is more easily explained with a concrete example:
> - 7 Solr servers involved
> - Several collections, among others one called "collection_2012_04",
> consisting of 28 slices, 56 shards (remember 1 replica for each slice) named
> "collection_2012_04_sliceX_shardY" for all pairs in {X:1..28}x{Y:1,2}
> - Each Solr server running 8 shards, e.g. Solr server #1 is running shard
> "collection_2012_04_slice1_shard1" and Solr server #7 is running shard
> "collection_2012_04_slice1_shard2", both belonging to the same slice "slice1".
> When we decide to delete the collection "collection_2012_04" we go through
> all 56 shards and delete/unload them one-by-one - including
> "collection_2012_04_slice1_shard1" and "collection_2012_04_slice1_shard2". At
> some point during or shortly after all this deletion we see the following
> exceptions in solr.log on Solr server #7
> {code}
> Aug 1, 2012 12:02:50 AM org.apache.solr.common.SolrException log
> SEVERE: Error while trying to recover:org.apache.solr.common.SolrException: core not found:collection_2012_04_slice1_shard1
> request: http://solr_server_1:8983/solr/admin/cores?action=PREPRECOVERY&core=collection_2012_04_slice1_shard1&nodeName=solr_server_7%3A8983_solr&coreNodeName=solr_server_7%3A8983_solr_collection_2012_04_slice1_shard2&state=recovering&checkLive=true&pauseFor=6000&wt=javabin&version=2
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>     at org.apache.solr.common.SolrExceptionPropagationHelper.decodeFromMsg(SolrExceptionPropagationHelper.java:29)
>     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:445)
>     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:264)
>     at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:188)
>     at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:285)
>     at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:206)
> Aug 1, 2012 12:02:50 AM org.apache.solr.common.SolrException log
> SEVERE: Recovery failed - trying again...
> Aug 1, 2012 12:02:51 AM org.apache.solr.cloud.LeaderElector$1 process
> WARNING:
> java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>     at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>     at java.util.ArrayList.get(ArrayList.java:322)
>     at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:96)
>     at org.apache.solr.cloud.LeaderElector.access$000(LeaderElector.java:57)
>     at org.apache.solr.cloud.LeaderElector$1.process(LeaderElector.java:121)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:531)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:507)
> Aug 1, 2012 12:02:51 AM org.apache.solr.cloud.LeaderElector$1 process
> {code}
> I'm not sure exactly how to interpret this, but it seems to me that a
> recovery job tries to recover collection_2012_04_slice1_shard2 on Solr server
> #7 from collection_2012_04_slice1_shard1 on Solr server #1, but fails because
> Solr server #1 answers back that it doesn't run
> collection_2012_04_slice1_shard1 (anymore).
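> The failing call can be replayed by hand against Solr server #1 to confirm
> the "core not found" answer; a rough SolrJ sketch, with the parameter values
> simply copied from the URL in the log above:
> {code}
> // Replay the logged prep-recovery request against Solr server #1.
> // All parameter values are taken from the failing URL above.
> ModifiableSolrParams params = new ModifiableSolrParams();
> params.set("action", "PREPRECOVERY");
> params.set("core", "collection_2012_04_slice1_shard1");
> params.set("nodeName", "solr_server_7:8983_solr");
> params.set("coreNodeName", "solr_server_7:8983_solr_collection_2012_04_slice1_shard2");
> params.set("state", "recovering");
> params.set("checkLive", "true");
> params.set("pauseFor", "6000");
> QueryRequest req = new QueryRequest(params);
> req.setPath("/admin/cores");
> // Fails with SolrException "core not found:collection_2012_04_slice1_shard1"
> new CommonsHttpSolrServer("http://solr_server_1:8983/solr").request(req);
> {code}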
> This problem occurs for several (in this concrete test, 4) of the 28 slice
> pairs. For those 4 shards the end result is that
> /node_states/solr_server_X:8983_solr in ZK still claims that the shard is
> running and active. E.g. /node_states/solr_server_7:8983_solr still contains:
> {code}
> {
>   "shard":"slice1",
>   "state":"active",
>   "core":"collection_2012_04_slice1_shard2",
>   "collection":"collection_2012_04",
>   "node_name":"solr_server_7:8983_solr",
>   "base_url":"http://solr_server_7:8983/solr"
> }
> {code}
> and that CloudState therefore still reports that those shards are running and
> active - but they are not. Among other things, I have noticed that
> "collection_2012_04_slice1_shard2" HAS been removed from solr.xml on Solr
> server #7 (we are running with persistent="true").
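> The stale entry can be verified by reading the node directly from ZooKeeper;
> a minimal sketch using the plain ZooKeeper client ("zkhost:2181" is just a
> placeholder for the actual zkHost the Solr servers use):
> {code}
> // Read the stale node-state entry straight from ZooKeeper.
> // "zkhost:2181" is a placeholder for the real zkHost address.
> ZooKeeper zk = new ZooKeeper("zkhost:2181", 10000, null);
> byte[] data = zk.getData("/node_states/solr_server_7:8983_solr", false, null);
> // Still lists the deleted shard as "active" even though the core is gone
> System.out.println(new String(data, "UTF-8"));
> zk.close();
> {code}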
> Any chance that this bug is fixed in a later revision (than the one from
> 29/2-2012) of 4.0-SNAPSHOT?
> If not, we need to get it fixed, I believe.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]