During this time, the shard leadership was in some weird state. I am not sure what "PreLeader" means. Can you clarify?
-------------- instance 1 --------------
member-1-shard-default-config: Follower
member-1-shard-default-operational: Follower

-------------- instance 2 --------------
member-2-shard-default-config: Follower
member-2-shard-default-operational: PreLeader

-------------- instance 3 --------------
member-3-shard-default-config: Leader
member-3-shard-default-operational: Follower

On Wed, Apr 5, 2017 at 12:17 PM, Srini Seetharaman <srini.seethara...@gmail.com> wrote:

> Here is the code blurb from boron-sr2 from that SnapshotManager.java file:
>
> 209         //use the term of the temp-min, since we check for isPresent, entry will not be null
> 210         ReplicatedLogEntry entry = context.getReplicatedLog().get(tempMin);
> 211         context.getReplicatedLog().snapshotPreCommit(tempMin, entry.getTerm());
> 212         context.getReplicatedLog().snapshotCommit();
> 213         return tempMin;
> 214     }
>
> On Wed, Apr 5, 2017 at 12:15 PM, Srini Seetharaman <srini.seethara...@gmail.com> wrote:
>
>> Hi,
>> During one of my runs of bringing the interfaces of cluster members up and down, I hit the following NPE after all 3 instances were isolated twice. Let me know if you need any more info besides the log below.
>>
>> 2017-04-05 19:08:51,860 | WARN | lt-dispatcher-28 | ConcurrentDOMDataBroker | 193 - org.opendaylight.controller.sal-distributed-datastore - 1.4.2.Boron-SR2 | Tx: DOM-32 Error during phase CAN_COMMIT, starting Abort
>> org.opendaylight.controller.cluster.datastore.exceptions.NoShardLeaderException: Shard member-2-shard-default-operational currently has no leader. Try again later.
>>     at org.opendaylight.controller.cluster.datastore.shardmanager.ShardManager.createNoShardLeaderException(ShardManager.java:723)[193:org.opendaylight.controller.sal-distributed-datastore:1.4.2.Boron-SR2]
>>     at org.opendaylight.controller.cluster.datastore.shardmanager.ShardManager.onShardNotInitializedTimeout(ShardManager.java:537)[193:org.opendaylight.controller.sal-distributed-datastore:1.4.2.Boron-SR2]
>>     at org.opendaylight.controller.cluster.datastore.shardmanager.ShardManager.handleCommand(ShardManager.java:216)[193:org.opendaylight.controller.sal-distributed-datastore:1.4.2.Boron-SR2]
>>     at org.opendaylight.controller.cluster.common.actor.AbstractUntypedPersistentActor.onReceiveCommand(AbstractUntypedPersistentActor.java:29)[187:org.opendaylight.controller.sal-clustering-commons:1.4.2.Boron-SR2]
>>     at akka.persistence.UntypedPersistentActor.onReceive(PersistentActor.scala:170)[181:com.typesafe.akka.persistence:2.4.7]
>>     at org.opendaylight.controller.cluster.common.actor.MeteringBehavior.apply(MeteringBehavior.java:97)[187:org.opendaylight.controller.sal-clustering-commons:1.4.2.Boron-SR2]
>>     at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:544)[175:com.typesafe.akka.actor:2.4.7]
>>     at akka.actor.Actor$class.aroundReceive(Actor.scala:484)[175:com.typesafe.akka.actor:2.4.7]
>>     at akka.persistence.UntypedPersistentActor.akka$persistence$Eventsourced$$super$aroundReceive(PersistentActor.scala:168)[181:com.typesafe.akka.persistence:2.4.7]
>>     at akka.persistence.Eventsourced$$anon$1.stateReceive(Eventsourced.scala:633)[181:com.typesafe.akka.persistence:2.4.7]
>>     at akka.persistence.Eventsourced$class.aroundReceive(Eventsourced.scala:179)[181:com.typesafe.akka.persistence:2.4.7]
>>     at akka.persistence.UntypedPersistentActor.aroundReceive(PersistentActor.scala:168)[181:com.typesafe.akka.persistence:2.4.7]
>>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)[175:com.typesafe.akka.actor:2.4.7]
>>     at akka.actor.ActorCell.invoke(ActorCell.scala:495)[175:com.typesafe.akka.actor:2.4.7]
>>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)[175:com.typesafe.akka.actor:2.4.7]
>>     at akka.dispatch.Mailbox.run(Mailbox.scala:224)[175:com.typesafe.akka.actor:2.4.7]
>>     at akka.dispatch.Mailbox.exec(Mailbox.scala:234)[175:com.typesafe.akka.actor:2.4.7]
>>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)[171:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
>>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)[171:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
>>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)[171:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
>>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)[171:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
>> 2017-04-05 19:08:51,863 | WARN | tor-ComputeTimer | GenericTransactionUtils | 301 - com.infinera.sdn.utils.transaction - 0.1.0.SNAPSHOT | Transaction for add of object State [_cpuInfo=CpuInfo [_processorCount=6, _usage=0.48, augmentation=[]], _memInfo=MemInfo [_memFree=138797056, _memTotal=12302811136, augmentation=[]], _status=class org.opendaylight.yang.gen.v1.urn.infinera.system.compute.rev160510.Running, augmentation=[]] failed with error {}
>> 2017-04-05 19:09:14,056 | INFO | lt-dispatcher-35 | kka://opendaylight-cluster-data) | 176 - com.typesafe.akka.slf4j - 2.4.7 | Cluster Node [akka.tcp://opendaylight-cluster-data@172.17.0.12:2550] - Leader is moving node [akka.tcp://opendaylight-cluster-data@172.17.0.11:2550] to [Up]
>> 2017-04-05 19:09:14,057 | INFO | lt-dispatcher-35 | ShardManager | 193 - org.opendaylight.controller.sal-distributed-datastore - 1.4.2.Boron-SR2 | shard-manager-operational: Received MemberUp: memberName: MemberName{name=member-1}, address: akka.tcp://opendaylight-cluster-data@172.17.0.11:2550
>> 2017-04-05 19:09:14,057 | INFO | lt-dispatcher-35 | ShardInformation | 193 - org.opendaylight.controller.sal-distributed-datastore - 1.4.2.Boron-SR2 | updatePeerAddress for peer member-1-shard-default-operational with address akka.tcp://opendaylight-cluster-data@172.17.0.11:2550/user/shardmanager-operational/member-1-shard-default-operational
>> 2017-04-05 19:09:14,057 | INFO | lt-dispatcher-35 | ShardInformation | 193 - org.opendaylight.controller.sal-distributed-datastore - 1.4.2.Boron-SR2 | updatePeerAddress for peer member-1-shard-entity-ownership-operational with address akka.tcp://opendaylight-cluster-data@172.17.0.11:2550/user/shardmanager-operational/member-1-shard-entity-ownership-operational
>> 2017-04-05 19:09:14,058 | INFO | lt-dispatcher-18 | ShardManager | 193 - org.opendaylight.controller.sal-distributed-datastore - 1.4.2.Boron-SR2 | shard-manager-config: Received MemberUp: memberName: MemberName{name=member-1}, address: akka.tcp://opendaylight-cluster-data@172.17.0.11:2550
>> 2017-04-05 19:09:14,058 | INFO | lt-dispatcher-18 | ShardInformation | 193 - org.opendaylight.controller.sal-distributed-datastore - 1.4.2.Boron-SR2 | updatePeerAddress for peer member-1-shard-default-config with address akka.tcp://opendaylight-cluster-data@172.17.0.11:2550/user/shardmanager-config/member-1-shard-default-config
>> 2017-04-05 19:09:14,068 | INFO | lt-dispatcher-18 | ShardManager | 193 - org.opendaylight.controller.sal-distributed-datastore - 1.4.2.Boron-SR2 | shard-manager-config: All Shards are ready - data store config is ready, available count is 0
>> 2017-04-05 19:09:14,068 | INFO | lt-dispatcher-18 | Shard | 188 - org.opendaylight.controller.sal-akka-raft - 1.4.2.Boron-SR2 | Peer address for peer member-1-shard-default-config set to akka.tcp://opendaylight-cluster-data@172.17.0.11:2550/user/shardmanager-config/member-1-shard-default-config
>> 2017-04-05 19:09:14,063 | INFO | lt-dispatcher-28 | EntityOwnershipShard | 188 - org.opendaylight.controller.sal-akka-raft - 1.4.2.Boron-SR2 | Peer address for peer member-1-shard-entity-ownership-operational set to akka.tcp://opendaylight-cluster-data@172.17.0.11:2550/user/shardmanager-operational/member-1-shard-entity-ownership-operational
>> 2017-04-05 19:09:14,070 | INFO | lt-dispatcher-33 | Shard | 188 - org.opendaylight.controller.sal-akka-raft - 1.4.2.Boron-SR2 | Peer address for peer member-1-shard-default-operational set to akka.tcp://opendaylight-cluster-data@172.17.0.11:2550/user/shardmanager-operational/member-1-shard-default-operational
>> 2017-04-05 19:11:31,513 | WARN | lt-dispatcher-17 | OneForOneStrategy | 176 - com.typesafe.akka.slf4j - 2.4.7 | null
>> 2017-04-05 19:11:31,514 | WARN | lt-dispatcher-18 | ShardManager | 193 - org.opendaylight.controller.sal-distributed-datastore - 1.4.2.Boron-SR2 | Supervisor Strategy caught unexpected exception - resuming
>> java.lang.NullPointerException
>>     at org.opendaylight.controller.cluster.raft.SnapshotManager$AbstractSnapshotState.doTrimLog(SnapshotManager.java:211)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>     at org.opendaylight.controller.cluster.raft.SnapshotManager$Idle.trimLog(SnapshotManager.java:293)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>     at org.opendaylight.controller.cluster.raft.SnapshotManager.trimLog(SnapshotManager.java:91)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>     at org.opendaylight.controller.cluster.raft.behaviors.AbstractRaftActorBehavior.performSnapshotWithoutCapture(AbstractRaftActorBehavior.java:470)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>     at org.opendaylight.controller.cluster.raft.behaviors.AbstractLeader.purgeInMemoryLog(AbstractLeader.java:400)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>     at org.opendaylight.controller.cluster.raft.behaviors.AbstractLeader.handleAppendEntriesReply(AbstractLeader.java:368)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>     at org.opendaylight.controller.cluster.raft.behaviors.AbstractRaftActorBehavior.handleMessage(AbstractRaftActorBehavior.java:404)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>     at org.opendaylight.controller.cluster.raft.behaviors.AbstractLeader.handleMessage(AbstractLeader.java:457)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>     at org.opendaylight.controller.cluster.raft.behaviors.PreLeader.handleMessage(PreLeader.java:49)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>     at org.opendaylight.controller.cluster.raft.RaftActor.possiblyHandleBehaviorMessage(RaftActor.java:302)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>     at org.opendaylight.controller.cluster.raft.RaftActor.handleCommand(RaftActor.java:290)[188:org.opendaylight.controller.sal-akka-raft:1.4.2.Boron-SR2]
>>     at org.opendaylight.controller.cluster.common.actor.AbstractUntypedPersistentActor.onReceiveCommand(AbstractUntypedPersistentActor.java:29)[187:org.opendaylight.controller.sal-clustering-commons:1.4.2.Boron-SR2]
>>     at akka.persistence.UntypedPersistentActor.onReceive(PersistentActor.scala:170)[181:com.typesafe.akka.persistence:2.4.7]
>>     at org.opendaylight.controller.cluster.common.actor.MeteringBehavior.apply(MeteringBehavior.java:97)[187:org.opendaylight.controller.sal-clustering-commons:1.4.2.Boron-SR2]
>>     at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:544)[175:com.typesafe.akka.actor:2.4.7]
>>     at akka.actor.Actor$class.aroundReceive(Actor.scala:484)[175:com.typesafe.akka.actor:2.4.7]
>>     at akka.persistence.UntypedPersistentActor.akka$persistence$Eventsourced$$super$aroundReceive(PersistentActor.scala:168)[181:com.typesafe.akka.persistence:2.4.7]
>>     at akka.persistence.Eventsourced$$anon$1.stateReceive(Eventsourced.scala:633)[181:com.typesafe.akka.persistence:2.4.7]
>>     at akka.persistence.Eventsourced$class.aroundReceive(Eventsourced.scala:179)[181:com.typesafe.akka.persistence:2.4.7]
>>     at akka.persistence.UntypedPersistentActor.aroundReceive(PersistentActor.scala:168)[181:com.typesafe.akka.persistence:2.4.7]
>>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)[175:com.typesafe.akka.actor:2.4.7]
>>     at akka.actor.ActorCell.invoke(ActorCell.scala:495)[175:com.typesafe.akka.actor:2.4.7]
>>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)[175:com.typesafe.akka.actor:2.4.7]
>>     at akka.dispatch.Mailbox.run(Mailbox.scala:224)[175:com.typesafe.akka.actor:2.4.7]
>>     at akka.dispatch.Mailbox.exec(Mailbox.scala:234)[175:com.typesafe.akka.actor:2.4.7]
>>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)[171:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
>>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)[171:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
>>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)[171:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
>>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)[171:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
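The NullPointerException in the trace is raised at SnapshotManager.java:211, which in the quoted blurb is the line that calls entry.getTerm(). That would be consistent with get(tempMin) returning null in this run, despite the comment on line 209. Below is a minimal sketch of a defensive lookup that skips the trim when the entry is missing. It assumes a toy in-memory log; TrimLogSketch, InMemoryLog, Entry and trimUpTo are hypothetical stand-ins for illustration, not the sal-akka-raft classes, and this is not the project's actual fix.

```java
import java.util.TreeMap;

// Hypothetical sketch: a toy replicated log, NOT the OpenDaylight implementation.
final class TrimLogSketch {

    /** Hypothetical log entry that only records the term it was written in. */
    static final class Entry {
        final long term;

        Entry(long term) {
            this.term = term;
        }
    }

    /** Hypothetical in-memory log keyed by log index. */
    static final class InMemoryLog {
        private final TreeMap<Long, Entry> entries = new TreeMap<>();
        long snapshotIndex = -1;
        long snapshotTerm = -1;

        void append(long index, long term) {
            entries.put(index, new Entry(term));
        }

        /** Returns the entry at the given index, or null if it is not in memory. */
        Entry get(long index) {
            return entries.get(index);
        }

        /** Drops every entry up to and including index and records the snapshot point. */
        void trimUpTo(long index, long term) {
            entries.headMap(index, true).clear();
            snapshotIndex = index;
            snapshotTerm = term;
        }
    }

    /**
     * Trims the log up to desiredIndex. Returns the trimmed index, or -1 when
     * the entry is absent, instead of dereferencing a null entry.
     */
    static long trimLog(InMemoryLog log, long desiredIndex) {
        Entry entry = log.get(desiredIndex);
        if (entry == null) {
            // Entry already trimmed or never present: skip rather than throw.
            return -1;
        }
        log.trimUpTo(desiredIndex, entry.term);
        return desiredIndex;
    }

    public static void main(String[] args) {
        InMemoryLog log = new InMemoryLog();
        log.append(1, 1);
        log.append(2, 1);
        log.append(3, 2);

        System.out.println(trimLog(log, 2)); // prints 2: entries 1 and 2 are trimmed
        System.out.println(trimLog(log, 2)); // prints -1: entry 2 is no longer in memory
        System.out.println(trimLog(log, 7)); // prints -1: index 7 was never appended
    }
}
```

Returning -1 here is only one possible convention for "nothing trimmed"; whether a missing entry in the real code should be skipped, logged, or treated as a bug in the leadership transition is a question for the maintainers.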
_______________________________________________
controller-dev mailing list
controller-dev@lists.opendaylight.org
https://lists.opendaylight.org/mailman/listinfo/controller-dev