[ https://issues.apache.org/jira/browse/SOLR-7673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shalin Shekhar Mangar updated SOLR-7673: ---------------------------------------- Attachment: Lucene-Solr-Tests-trunk-Java8-69.log Attaching full log from the jenkins failure. > Race condition in shard splitting waiting for sub-shard replicas to go into > recovery > ------------------------------------------------------------------------------------ > > Key: SOLR-7673 > URL: https://issues.apache.org/jira/browse/SOLR-7673 > Project: Solr > Issue Type: Bug > Components: SolrCloud > Affects Versions: 4.10.4, 5.2 > Reporter: Shalin Shekhar Mangar > Assignee: Shalin Shekhar Mangar > Fix For: 5.3, Trunk > > Attachments: Lucene-Solr-Tests-trunk-Java8-69.log > > > I found this while digging into the failure on > https://builds.apache.org/job/Lucene-Solr-Tests-trunk-Java8/69/ > {code} > [junit4] 2> 390032 INFO > (OverseerStateUpdate-93987739315601411-127.0.0.1:57926_-n_0000000000) > [n:127.0.0.1:57926_ ] o.a.s.c.o.ReplicaMutator Update state numShards=1 > message={ > [junit4] 2> "core":"testasynccollectioncreation_shard1_0_replica2", > [junit4] 2> "core_node_name":"core_node5", > [junit4] 2> "roles":null, > [junit4] 2> "base_url":"http://127.0.0.1:38702", > [junit4] 2> "node_name":"127.0.0.1:38702_", > [junit4] 2> "numShards":"1", > [junit4] 2> "state":"active", > [junit4] 2> "shard":"shard1_0", > [junit4] 2> "collection":"testasynccollectioncreation", > [junit4] 2> "operation":"state"} > [junit4] 2> 390033 INFO > (OverseerStateUpdate-93987739315601411-127.0.0.1:57926_-n_0000000000) > [n:127.0.0.1:57926_ ] o.a.s.c.o.ZkStateWriter going to update_collection > /collections/testasynccollectioncreation/state.json version: 18 > [junit4] 2> 390033 INFO > (RecoveryThread-testasynccollectioncreation_shard1_0_replica2) > [n:127.0.0.1:38702_ c:testasynccollectioncreation s:shard1_0 r:core_node5 > x:testasynccollectioncreation_shard1_0_replica2] o.a.s.u.UpdateLog Took 2 ms > to seed version buckets with highest version 1503803851028824064 > [junit4] 2> 390033 INFO > (zkCallback-167-thread-1-processing-n:127.0.0.1:57926_) [n:127.0.0.1:57926_ > ] o.a.s.c.c.ZkStateReader A cluster state change: WatchedEvent > state:SyncConnected type:NodeDataChanged > path:/collections/testasynccollectioncreation/state.json for collection > testasynccollectioncreation has occurred - updating... (live nodes size: 2) > [junit4] 2> 390033 INFO > (RecoveryThread-testasynccollectioncreation_shard1_0_replica2) > [n:127.0.0.1:38702_ c:testasynccollectioncreation s:shard1_0 r:core_node5 > x:testasynccollectioncreation_shard1_0_replica2] o.a.s.c.RecoveryStrategy > Finished recovery process. > [junit4] 2> 390033 INFO > (zkCallback-173-thread-1-processing-n:127.0.0.1:38702_) [n:127.0.0.1:38702_ > ] o.a.s.c.c.ZkStateReader A cluster state change: WatchedEvent > state:SyncConnected type:NodeDataChanged > path:/collections/testasynccollectioncreation/state.json for collection > testasynccollectioncreation has occurred - updating... (live nodes size: 2) > [junit4] 2> 390034 INFO > (zkCallback-167-thread-1-processing-n:127.0.0.1:57926_) [n:127.0.0.1:57926_ > ] o.a.s.c.c.ZkStateReader Updating data for testasynccollectioncreation to > ver 19 > [junit4] 2> 390034 INFO > (zkCallback-173-thread-1-processing-n:127.0.0.1:38702_) [n:127.0.0.1:38702_ > ] o.a.s.c.c.ZkStateReader Updating data for testasynccollectioncreation to > ver 19 > [junit4] 2> 390298 INFO (qtp1859109869-1664) [n:127.0.0.1:38702_ ] > o.a.s.h.a.CollectionsHandler Invoked Collection Action :requeststatus with > paramsrequestid=1004&action=REQUESTSTATUS&wt=javabin&version=2 > [junit4] 2> 390299 INFO (qtp1859109869-1664) [n:127.0.0.1:38702_ ] > o.a.s.s.SolrDispatchFilter [admin] webapp=null path=/admin/collections > params={requestid=1004&action=REQUESTSTATUS&wt=javabin&version=2} status=0 > QTime=1 > [junit4] 2> 390348 INFO (qtp1859109869-1665) [n:127.0.0.1:38702_ ] > o.a.s.h.a.CoreAdminHandler Checking request status for : 10042599422118342586 > [junit4] 2> 390348 INFO (qtp1859109869-1665) [n:127.0.0.1:38702_ ] > o.a.s.s.SolrDispatchFilter [admin] webapp=null path=/admin/cores > params={qt=/admin/cores&requestid=10042599422118342586&action=REQUESTSTATUS&wt=javabin&version=2} > status=0 QTime=1 > [junit4] 2> 390348 INFO > (OverseerThreadFactory-891-thread-4-processing-n:127.0.0.1:57926_) > [n:127.0.0.1:57926_ ] o.a.s.c.OverseerCollectionProcessor Asking sub shard > leader to wait for: testasynccollectioncreation_shard1_0_replica2 to be alive > on: 127.0.0.1:38702_ > [junit4] 2> 390349 INFO > (OverseerThreadFactory-891-thread-4-processing-n:127.0.0.1:57926_) > [n:127.0.0.1:57926_ ] o.a.s.c.OverseerCollectionProcessor Creating replica > shard testasynccollectioncreation_shard1_1_replica2 as part of slice shard1_1 > of collection testasynccollectioncreation on 127.0.0.1:38702_ > [junit4] 2> 390349 INFO > (OverseerThreadFactory-891-thread-4-processing-n:127.0.0.1:57926_) > [n:127.0.0.1:57926_ ] o.a.s.c.c.ZkStateReader Load collection config > from:/collections/testasynccollectioncreation > [junit4] 2> 390350 INFO > (OverseerThreadFactory-891-thread-4-processing-n:127.0.0.1:57926_) > [n:127.0.0.1:57926_ ] o.a.s.c.c.ZkStateReader > path=/collections/testasynccollectioncreation configName=conf1 specified > config exists in ZooKeeper > [junit4] 2> 390352 INFO (qtp381614561-1631) [n:127.0.0.1:57926_ ] > o.a.s.s.SolrDispatchFilter [admin] webapp=null path=/admin/cores > params={nodeName=127.0.0.1:38702_&core=testasynccollectioncreation_shard1_0_replica1&async=10042599424132346542&qt=/admin/cores&coreNodeName=core_node5&action=PREPRECOVERY&checkLive=true&state=recovering&onlyIfLeader=true&wt=javabin&version=2} > status=0 QTime=2 > {code} > The following sequence of events lead to deadlock: > # testasynccollectioncreation_shard1_0_replica2 (core_node5) becomes active > # OCP asks sub-shard leader testasynccollectioncreation_shard1_0_replica1 to > wait until replica2 is in recovery > At this point, the test just keeps pinging for status until timeout and fails. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org