Hi Mircea,

I think we need a third option for taking a site offline, in addition to the retry interval and the number of attempts: a min-time (or whatever we want to call it).
Say we have retry-interval=1000 and maxRetries=5. This means that if we get a SITE-UNREACHABLE 5 times for a given site, we declare that site offline and cease sending requests to it. However, if we have 5 different threads sending requests to the site, each of them increments the counter, and so we take the site offline after only 1 second!

That's where min-time comes in: we should wait at least min-time before taking any site offline, even if maxRetries has been exceeded.

Example: min-time=60000 (ms), maxRetries=10, retryInterval=1000 (ms)

If we have 20 threads sending requests to site SFO (which is down), then we might have numRetries=20 after 10 seconds, and perhaps numRetries=60 after 50 seconds. But only once 60 seconds have elapsed do we take SFO offline.

The main reason for min-time would be to avoid taking a site offline during a short window in which the site master changes and multiple threads increment numRetries in quick succession.

--
Bela Ban, JGroups lead (http://www.jgroups.org)
_______________________________________________
infinispan-dev mailing list
https://lists.jboss.org/mailman/listinfo/infinispan-dev
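P.S. To make the rule concrete, here is a minimal sketch of the check proposed in this message. It is not existing Infinispan or JGroups code: the class name OfflineTracker, its methods, and the choice to measure min-time from the first failure are all assumptions made for illustration. The caller passes the current time in, so the behaviour is easy to reason about.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper, for illustration only (not an Infinispan/JGroups API).
// A site goes offline only when BOTH conditions hold:
//   1. at least maxRetries SITE-UNREACHABLE responses were counted, and
//   2. at least minTimeMs has elapsed since the first counted failure.
class OfflineTracker {
    private static final long NOT_STARTED = -1L;

    private final int maxRetries;
    private final long minTimeMs;
    private final AtomicInteger numRetries = new AtomicInteger();
    private final AtomicLong firstFailureMs = new AtomicLong(NOT_STARTED);

    OfflineTracker(int maxRetries, long minTimeMs) {
        this.maxRetries = maxRetries;
        this.minTimeMs = minTimeMs;
    }

    // Records one SITE-UNREACHABLE observed at nowMs (any thread may call this).
    // Returns true if the site should now be taken offline.
    boolean siteUnreachable(long nowMs) {
        // Only the first failure sets the start of the min-time window.
        firstFailureMs.compareAndSet(NOT_STARTED, nowMs);
        int retries = numRetries.incrementAndGet();
        long elapsedMs = nowMs - firstFailureMs.get();
        return retries >= maxRetries && elapsedMs >= minTimeMs;
    }

    // Called when a request to the site succeeds again.
    // Note: the two resets are not atomic together; acceptable for a sketch.
    void reset() {
        numRetries.set(0);
        firstFailureMs.set(NOT_STARTED);
    }

    int retries() {
        return numRetries.get();
    }
}
```

With maxRetries=10 and min-time=60000, 20 threads can push the counter to 60 within 50 seconds and siteUnreachable() still returns false; the first failure reported at or after the 60 second mark returns true.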
