Correct, Ajay. We have done a few experiments with increasing the timeout and have not seen a recurrence.
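The toy model below illustrates why a longer call-timeout helps. It is a simplified Python model of circuit-breaker semantics (closed, open, half-open), not Akka's implementation. It assumes that a journal write taking longer than `call_timeout` is failed by the breaker. Akka Persistence stops a persistent actor when an event write fails (`onPersistFailure`), so one timed-out write is enough to stop the shard. The class, the `survives` helper and the `max_failures`/`reset_timeout` values are hypothetical and chosen only for illustration. The 10-second `call_timeout` mirrors the default mentioned in this thread.

```python
from dataclasses import dataclass


class CircuitOpenError(Exception):
    """Raised when the breaker is open and rejects a call without running it."""


@dataclass
class CircuitBreaker:
    # Hypothetical, simplified model; values are illustrative, not Akka's code.
    max_failures: int = 10
    call_timeout: float = 10.0   # seconds a single journal write may take
    reset_timeout: float = 30.0  # seconds the breaker stays open before a trial call
    state: str = "closed"
    failures: int = 0
    opened_at: float = 0.0

    def call(self, now: float, write_duration: float) -> str:
        """Simulate one journal write starting at `now` and taking `write_duration` seconds."""
        if self.state == "open":
            if now - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit breaker is open")  # fail fast
            self.state = "half-open"  # let a single trial call through
        if write_duration > self.call_timeout:
            self._record_failure(now)
            raise TimeoutError("Circuit Breaker Timed out")
        # A successful write closes the breaker and clears the failure count.
        self.state = "closed"
        self.failures = 0
        return "persisted"

    def _record_failure(self, now: float) -> None:
        self.failures += 1
        if self.state == "half-open" or self.failures >= self.max_failures:
            self.state = "open"
            self.opened_at = now


def survives(call_timeout: float, stall: float) -> bool:
    """True if a write stalled for `stall` seconds still succeeds.

    A False result models the persist failure that would stop the shard.
    """
    breaker = CircuitBreaker(call_timeout=call_timeout)
    try:
        breaker.call(now=0.0, write_duration=stall)
        return True
    except TimeoutError:
        return False


print(survives(10.0, 15.0))  # False: a 15 s stall exceeds the 10 s timeout
print(survives(30.0, 15.0))  # True: the same stall fits under a 30 s timeout
```

In this model, raising the timeout only widens the window of tolerated stalls. A stall longer than the new timeout still fails the write, which is consistent with the thread's view that a better recovery path is still needed.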
We considered the following options:

1) Recreate the affected shard
We tried this using a simulated shard kill. When the Shard Manager observes the shard kill, it recreates the shard, and this did work. However, we are staying away from this solution mainly because of DTCN side effects. We do not know how many applications would tolerate suddenly receiving DTCNs in an unexpected manner (similar to a node restart). A silent shard restart implies that applications must handle DTCNs idempotently across shards, so that recreating a single shard does not disturb whatever state they build up internally from DTCNs.

2) Restart the entire controller
A more intrusive change would be to stop bundle 0 upon onPersistFailure and let the environment's restart logic (systemd, Pacemaker, etc.) restart the node, because the system is unusable anyway once one shard stops completely. We are trying to get this working correctly but have been unsuccessful so far.

Regards
Muthu

From: Ajay L [mailto:ajayl....@gmail.com]
Sent: Wednesday, November 15, 2017 1:46 AM
To: controller-dev@lists.opendaylight.org
Cc: Muthukumaran K; Srini Seetharaman; Robert Varga; ajaysl...@gmail.com; Sai MarapaReddy
Subject: Re: [controller-dev] Circuit Breaker timed out

Hi All,

We are also seeing the "circuit breaker" error under heavy load. When this happens, the affected shard is stopped and never restarted, and I think the only way to recover is to restart the node. I have opened https://jira.opendaylight.org/browse/CONTROLLER-1789 to request better recovery behavior.

Increasing the Akka journal persistence circuit-breaker call-timeout value (default 10s) does help make it more tolerant of outages.

Regards
Ajay

On Wed, Aug 16, 2017 at 2:23 AM, Robert Varga <n...@hq.sk> wrote:
On 16/08/17 08:37, Muthukumaran K wrote:
> We have not tried on master branch (Nitrogen / Akka 2.5).
> Not sure if such an issue would go away with Akka 2.5, because the circuit
> breaker issue is mainly related to the LevelDB plugin.

Nitrogen is on akka-2.4.18. akka-2.5.x (and others) are staged for Oxygen.

Bye,
Robert
_______________________________________________
controller-dev mailing list
controller-dev@lists.opendaylight.org
https://lists.opendaylight.org/mailman/listinfo/controller-dev