Thanks.

On 18 Sep 2012, at 19:02, Erik Salter wrote:

> FYI: https://issues.jboss.org/browse/ISPN-2319
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Bela Ban
> Sent: Monday, September 17, 2012 2:42 AM
> To: [email protected]
> Subject: Re: [infinispan-dev] X-Site: Site Unreachable vs. Site Down
>
> I agree that there should be a configuration which determines after how many
> SITE-UNREACHABLE events (combined with a timeout), a site is declared as
> offline. (The count should be reset when there is a successful RPC to the
> remote site).
>
> Once a site is taken offline, then no RPCs would be sent to it, until it is
> taken online again (manually, by a sysadmin), and the state transfer has
> completed.
>
> Example: lon={A,B,C}, sfo={X,Y,Z}.
> - We're in London (lon), sfo acts as the backup site to lon
> - An RPC in lon includes A,B and SiteMaster(sfo) as targets
> - Before the RPC hits X, X crashes
> - JGroups retries X (a few times, timeout < "timeout" configured in
>   <backup site=.../>)
> - Y takes over
> - JGroups re-routes the RPC to Y
> - The caller completes the RPC successfully
>
> - Now connectivity to sfo goes down
> - A caller in lon invokes an RPC on A,C and SiteMaster(sfo)
> - The call fails after 16s
> - Another RPC fails after 16s
> - After N failed RPCs, Infinispan in lon marks sfo as down (offline)
> - The next RPC has B and C as targets, but not SiteMaster(sfo) anymore,
>   until sfo is brought online (manually)
>
>
> On 9/17/12 3:21 AM, Erik Salter wrote:
>> Hi all,
>>
>> For the X-Site pull request, Bela, Mircea and I had a design review. One of
>> the items that came up was the ability to mark a site as being "down" -
>> where a site has been unreachable for a period of time. This mostly applies
>> to the synchronous replication case where the backup failure policy has been
>> configured as "FAIL", i.e:
>>
>> <namedCache name="importantCache">
>>   <sites>
>>     <backups>
>>       <backup site="NYC" strategy="SYNC" backupFailurePolicy="FAIL" timeout="160000"/>
>>     </backups>
>>   </sites>
>> </namedCache>
>>
>> The current implementation would be to fail all requests until a SA realizes
>> the site is offline and mark it through a JMX operation (provided in this
>> release?). Since I cannot afford a 100% failure rate until somebody gets
>> called, I think we need to take it a step further and add an element to mark
>> a site as offline after a period of time. (Note, though, a site can only
>> be brought back online manually.)
>>
>> Mircea talked about adding an element in the configuration for a custom
>> callback implementation. However, I think this is useful enough -- not only
>> for me -- but for other ISPN/JDG users as well. (Not to mention we can't
>> add configuration for callbacks)
>
>
> --
> Bela Ban, JGroups lead (http://www.jgroups.org)
> _______________________________________________
> infinispan-dev mailing list
> [email protected]
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)
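
P.S. For reference, the take-offline rule Bela describes in the quoted message (N failed RPCs combined with a timeout, the count reset on a successful RPC, and the site brought back online only manually) could be sketched roughly as below. This is an illustration under assumptions, not Infinispan code and not the ISPN-2319 design. The class name SiteFailureTracker and all of its methods are hypothetical. Reading "combined with a timeout" as "a minimum time must have passed since the first failure in the current run" is also an assumption. Timestamps are passed in explicitly so the logic can be exercised without a real clock.

```java
/**
 * Hypothetical sketch of a per-backup-site failure tracker.
 * Not part of the Infinispan or JGroups API.
 */
final class SiteFailureTracker {
    private final int maxFailures;    // N consecutive failed RPCs (assumed knob)
    private final long minWaitMillis; // minimum time since the first failure (assumed knob)

    private int failures;             // consecutive failures since the last success
    private long firstFailureMillis;  // timestamp of the first failure in the current run
    private boolean offline;

    SiteFailureTracker(int maxFailures, long minWaitMillis) {
        if (maxFailures < 1 || minWaitMillis < 0) {
            throw new IllegalArgumentException("maxFailures >= 1 and minWaitMillis >= 0 required");
        }
        this.maxFailures = maxFailures;
        this.minWaitMillis = minWaitMillis;
    }

    /** A successful RPC to the remote site resets the failure count. */
    synchronized void rpcSucceeded() {
        failures = 0;
    }

    /** Records a failed RPC; marks the site offline once both thresholds are met. */
    synchronized void rpcFailed(long nowMillis) {
        if (offline) {
            return; // no RPCs should be sent to an offline site anyway
        }
        if (failures == 0) {
            firstFailureMillis = nowMillis;
        }
        failures++;
        if (failures >= maxFailures && nowMillis - firstFailureMillis >= minWaitMillis) {
            offline = true;
        }
    }

    /** Callers would skip SiteMaster(site) as an RPC target while this returns true. */
    synchronized boolean isOffline() {
        return offline;
    }

    /** Manual step, e.g. triggered by a sysadmin once state transfer has completed. */
    synchronized void bringOnline() {
        offline = false;
        failures = 0;
    }
}
```

With maxFailures = 3 and minWaitMillis = 30000, three consecutive 16s timeouts would take the site offline. A single successful RPC in between would restart the count.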
