Correct, Ajay. We experimented with increasing the timeout and have not seen 
the problem recur since.

We considered the following options:

1)      Recreate the affected shard

a.       We tried this using a “simulated shard-kill”: upon observing the 
shard-kill, Shard Manager recreates the shard, and it did work! But we are 
staying away from this solution, mainly because of DTCN side effects. We do not 
know how many applications would tolerate suddenly receiving DTCNs in an 
unexpected manner (similar to a node restart). A silent shard restart implies 
that applications have to be thoroughly idempotent in handling DTCNs across 
shards, so that recreating a single shard does not affect whatever state they 
build up internally via DTCNs.

2)      Restart the entire controller

a.       A more intrusive change would be to perform a bundle-0 stop upon 
onPersistFailure and let the restart logic take care of restarting the node, 
depending upon the environment (systemd, pacemaker, etc.), because the system 
would be useless anyway if one shard stops completely. We are trying to get 
this working correctly but have been unsuccessful so far.

Regards
Muthu


From: Ajay L [mailto:ajayl....@gmail.com]
Sent: Wednesday, November 15, 2017 1:46 AM
To: controller-dev@lists.opendaylight.org
Cc: Muthukumaran K; Srini Seetharaman; Robert Varga; ajaysl...@gmail.com; Sai 
MarapaReddy
Subject: Re: [controller-dev] Circuit Breaker timed out

Hi All,

We are also seeing the "circuit breaker" error under heavy load. When this 
happens, the affected shard is stopped and never restarted, and I think the 
only way to recover is to restart the node. I have opened 
https://jira.opendaylight.org/browse/CONTROLLER-1789 to request better recovery 
behavior. Increasing the Akka persistence journal circuit-breaker call-timeout 
value (default is 10s) does help make it more tolerant of outages.
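
For reference, a minimal sketch of such an override. It assumes the LevelDB 
journal plugin is in use and that the settings live under the odl-cluster-data 
section of the controller's akka.conf; both are assumptions about the 
deployment, and the 30s value is purely illustrative, not a recommendation.

```
odl-cluster-data {
  akka {
    persistence {
      journal {
        leveldb {
          circuit-breaker {
            # Akka's default is 10s; 30s is an illustrative value only.
            call-timeout = 30s
          }
        }
      }
    }
  }
}
```

A longer timeout only makes the journal more tolerant of slow writes; it does 
not change what happens once the breaker does trip.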

Regards
Ajay

On Wed, Aug 16, 2017 at 2:23 AM, Robert Varga <n...@hq.sk> wrote:
On 16/08/17 08:37, Muthukumaran K wrote:
> We have not tried on master branch (Nitrogen  / Akka 2.5). Not sure if
> such an issue would go away with Akka 2.5 because the circuit breaker is
> primarily with LevelDB plugin.
>

Nitrogen is on akka-2.4.18. akka-2.5.x (and others) are staged for Oxygen.

Bye,
Robert


_______________________________________________
controller-dev mailing list
controller-dev@lists.opendaylight.org
https://lists.opendaylight.org/mailman/listinfo/controller-dev
