Nope. I originally thought weakly-up would solve it but, as I found out, the akka developers implemented it very conservatively, at least initially.
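For reference, the external-script option suggested earlier in this thread (send a "down" request via jolokia to the cluster leader) could be sketched roughly as below, in Python with only the standard library. Treat it as a sketch under assumptions, not a tested recipe: the MBean name `akka:type=Cluster` and its `down` operation are what akka's cluster JMX bean is expected to expose, the Jolokia URL and credentials are placeholders, and the function names `build_down_request` and `send_down_request` are made up for illustration. The member address is the one from the log excerpt quoted below.

```python
import base64
import json
import urllib.request

# Assumption: akka registers its cluster JMX bean under this object name.
AKKA_CLUSTER_MBEAN = "akka:type=Cluster"


def build_down_request(member_address):
    """Build a Jolokia 'exec' payload asking the cluster to mark a member Down."""
    return {
        "type": "exec",
        "mbean": AKKA_CLUSTER_MBEAN,
        "operation": "down",
        "arguments": [member_address],
    }


def send_down_request(jolokia_url, member_address, user, password, timeout=5.0):
    """POST the payload to a node's Jolokia endpoint and return the parsed JSON reply."""
    body = json.dumps(build_down_request(member_address)).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        jolokia_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the payload; nothing is sent unless send_down_request() is called
    # with a real Jolokia URL and credentials for your deployment.
    payload = build_down_request(
        "akka.tcp://opendaylight-cluster-data@10.0.97.128:2550"
    )
    print(json.dumps(payload, indent=2))
```

The part that watches the process and decides when a node is really gone is left out on purpose. A script that downs nodes automatically carries much the same risk as auto-downing, which akka doesn't recommend in production, so that decision logic needs care.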
On Sun, Jan 22, 2017 at 11:00 AM, Sela, Guy <guy.s...@hpe.com> wrote:

> It won’t solve Shlomi’s problem, as described here in a post originated by
> Tom P.J
>
> https://groups.google.com/forum/#!topic/akka-user/506ErDM_KA4
>
>
> *From:* mdsal-dev-boun...@lists.opendaylight.org [mailto:
> mdsal-dev-boun...@lists.opendaylight.org] *On Behalf Of *Sela, Guy
> *Sent:* Sunday, January 22, 2017 5:28 PM
> *To:* Tom Pantelis <tompante...@gmail.com>
> *Cc:* controller-dev@lists.opendaylight.org;
> mdsal-dev@lists.opendaylight.org; Alfasi, Shlomi <shlomi.alf...@hpe.com>
> *Subject:* Re: [mdsal-dev] [controller-dev] cluster - recovery from dual
> failure
>
> I’m sorry, I just read the weakly-up documentation.
>
> Sounds like it will solve Shlomi’s problem.
>
> What did you mean by gets it “partly” to the way we want it? What’s
> missing?
>
>
> *From:* Tom Pantelis [mailto:tompante...@gmail.com <tompante...@gmail.com>]
> *Sent:* Sunday, January 22, 2017 5:08 PM
> *To:* Sela, Guy <guy.s...@hpe.com>
> *Cc:* Alfasi, Shlomi <shlomi.alf...@hpe.com>;
> controller-dev@lists.opendaylight.org; mdsal-...@lists.opendaylight.org
> *Subject:* Re: [mdsal-dev] [controller-dev] cluster - recovery from dual
> failure
>
> That's the way it works and the akka designers have reasons for it. They
> added "weakly-up" which gets it partly to the way we would want it to work
> and they've said they may add more options to better control the behavior.
>
> You can enable auto-down in your setup. Or use an external script to monitor
> the process and, if it goes down, send a "down" request (via jolokia)
> to the cluster leader.
>
> On Sun, Jan 22, 2017 at 9:37 AM, Sela, Guy <guy.s...@hpe.com> wrote:
>
> Hi,
>
> Just read the documentation, very interesting.
>
> So that means that ODL Cluster can’t automatically recover from more than
> a single concurrent failure.
>
> Even if we had a cluster of 10 nodes, if one becomes unreachable, none of
> the others can restart until the first one is reachable again.
>
> Sounds like a serious restriction for production.
>
> Are there any best practices for dealing with these situations? (Without
> manual intervention)
>
>
> *From:* mdsal-dev-boun...@lists.opendaylight.org [mailto:
> mdsal-dev-boun...@lists.opendaylight.org] *On Behalf Of *Tom Pantelis
> *Sent:* Sunday, January 22, 2017 4:30 PM
> *To:* Alfasi, Shlomi <shlomi.alf...@hpe.com>
> *Cc:* controller-dev@lists.opendaylight.org;
> mdsal-dev@lists.opendaylight.org
> *Subject:* Re: [mdsal-dev] [controller-dev] cluster - recovery from dual
> failure
>
> This is a side effect of how akka clustering works. All unreachable nodes
> must first become reachable again, or the status of the unreachable nodes
> must be changed to 'Down', either manually or auto-downed. You can enable
> auto-downing but akka doesn't recommend it in production (
> http://doc.akka.io/docs/akka/current/java/cluster-usage.html).
>
> On Sun, Jan 22, 2017 at 8:53 AM, Alfasi, Shlomi <shlomi.alf...@hpe.com>
> wrote:
>
> Hi All,
>
> I configured a clustered setup with 3 nodes (attached the akka.conf of one
> of the nodes).
>
> At a specific time one of the members in the cluster was down and then I
> restarted another node.
>
> In the restarted node I see that it fails to read information from the
> datastore and repeatedly throws exceptions [1]
>
> In the node that was always up, every 10 seconds there is a log entry that
> implies that the restarted node doesn’t manage to join [2]
>
> What is the expected behavior in this case? Is this state recoverable?
>
> Shlomi
>
> [1]
>
> WARN | ult-dispatcher-2 | DataStoreAppConfigMetadata | 153 - org.opendaylight.controller.blueprint - 0.5.2.SNAPSHOT | org.opendaylight.netvirt.elanmanager-impl (elanConfig): Read of app config org.opendaylight.yang.gen.v1.urn.opendaylight.netvirt.elan.config.rev150710.ElanConfig failed - retrying
>
> ReadFailedException{message=Error executeRead ReadData for path /(urn:opendaylight:netvirt:elan:config?revision=2015-07-10)elan-config, errorList=[RpcError [message=Error executeRead ReadData for path /(urn:opendaylight:netvirt:elan:config?revision=2015-07-10)elan-config, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=org.opendaylight.controller.md.sal.common.api.data.DataStoreUnavailableException: Shard member-3-shard-default-config currently has no leader. Try again later.]]}
>
> [2]
>
> 2017-01-22 15:19:56,290 | INFO | lt-dispatcher-22 | kka://opendaylight-cluster-data) | 159 - com.typesafe.akka.slf4j - 2.4.7 | Cluster Node [akka.tcp://opendaylight-cluster-data@10.0.77.33:2550] - New incarnation of existing member [Member(address = akka.tcp://opendaylight-cluster-data@10.0.97.128:2550, status = Down)] is trying to join. Existing will be removed from the cluster and then new member will be allowed to join.
>
> _______________________________________________
> controller-dev mailing list
> controller-dev@lists.opendaylight.org
> https://lists.opendaylight.org/mailman/listinfo/controller-dev