[Hadoop Wiki] Update of "ZooKeeper/GSoCFailureDetector" by AbmarBarros

Apache Wiki Mon, 16 Aug 2010 08:21:45 -0700

The "ZooKeeper/GSoCFailureDetector" page has been changed by AbmarBarros.
http://wiki.apache.org/hadoop/ZooKeeper/GSoCFailureDetector?action=diff&rev1=10&rev2=11

--------------------------------------------------

  
  ==== Experimental design ====
  
- ==== Results and conclusions ====
+  * '''First batch of tests''':
+   * 1 client and 1 server connected by a transcontinental link (Campina 
Grande-Brazil / Newark-USA)
+   * link = 1MBps, 250ms
+   * timeout = 5000ms
+   * replication = 5
+   * used the following failure detectors:
+    * Fixed heartbeat
+    * Chen (alpha = 0, 500, 1000, 2000)
+    * Bertier (moderationstep = 0, 250, 500, 1000)
+    * Phi accrual (threshold = .5, 2, 4, 8)
  
+  * '''Second batch of tests''':
+   * 200 clients and 1 server connected over an emulated WAN in Emulab
+   * link = 2MBps, 250ms, message loss probability of 0.1 
+   * timeout = 5000ms
+   * used the following failure detectors with default parameters:
+    * Fixed heartbeat
+    * Chen (alpha = 1250)
+    * Bertier (moderationstep = 1000)
+ 
+ ==== Results ====
+ 
+ ==== Concluding remarks ====
+ 
+ As expected, the fixed heartbeat method works well when ZooKeeper runs in a 
controlled environment, where the network behavior is predictable. In these 
cases, the fixed timeout can be tuned after some network analysis. However, in 
scenarios where the network behavior changes, such as in a WAN, the adaptive 
methods can be a good choice. Below is an overview of each failure detector:
+  * '''Fixed heartbeat''': On average, with default parameters, the fixed 
heartbeat strategy had the highest detection time, but with no false 
suspicions. However, if the timeout is poorly chosen, failures may take a long 
time to be detected, or the false suspicion rate may increase. As noted above, 
this strategy is useful in a controlled environment, in which the network can 
be characterized.
+  * '''Chen''': This strategy requires some assumptions about the network, 
since the administrator needs to define the alpha parameter, the safety margin 
for the estimation. However, with default parameters, the method of Chen et al. 
performed well in a WAN deployment. It managed to decrease the average 
detection time while keeping a low false suspicion rate.
+  * '''Bertier''': Bertier et al. initially proposed a failure detector that 
requires no assumptions about the network apart from a single moderation step, 
which is added to the estimation when a heartbeat is received while the 
monitored process is in a suspected state. With these experiments, we have come 
to the same conclusion as Hayashibara et al.: this failure detector is very 
sensitive to message loss and to fluctuation in the arrival times of 
heartbeats. In this sense, the moderation step turned out to be an important 
parameter for this failure detector. With a moderation step of 1000, Bertier's 
failure detector reached a lower average detection time than Chen's method and 
a higher one than the fixed heartbeat strategy; however, there were no false 
suspicions.
+  * '''Phi-accrual''':
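
To make the role of the alpha safety margin in the Chen bullet above concrete, here is a minimal sketch of the arrival-time estimation described by Chen et al. It is not the code used in these experiments: the class name `ChenEstimator`, its method names, the window size and the heartbeat interval are all assumptions chosen for illustration. The idea is that the expected arrival of the next heartbeat is estimated from a window of past arrivals, and the monitored process is suspected only once that estimate plus alpha has passed.

```python
from collections import deque


class ChenEstimator:
    """Illustrative sketch (hypothetical names) of Chen-style estimation."""

    def __init__(self, interval_ms, alpha_ms, window=1000):
        self.interval_ms = interval_ms      # assumed heartbeat sending interval
        self.alpha_ms = alpha_ms            # safety margin set by the administrator
        self.samples = deque(maxlen=window)  # (sequence number, arrival time)

    def heartbeat(self, seq, arrival_ms):
        """Record the arrival time of heartbeat number `seq`."""
        self.samples.append((seq, arrival_ms))

    def next_timeout(self):
        """Time after which the next heartbeat is considered late."""
        if not self.samples:
            raise ValueError("no heartbeats observed yet")
        n = len(self.samples)
        # Average arrival time, normalised back to sequence number 0.
        base = sum(a - self.interval_ms * s for s, a in self.samples) / n
        next_seq = self.samples[-1][0] + 1
        expected_arrival = base + next_seq * self.interval_ms
        # The safety margin alpha is added on top of the estimate.
        return expected_arrival + self.alpha_ms

    def suspects(self, now_ms):
        """True once the estimated deadline for the next heartbeat has passed."""
        return now_ms > self.next_timeout()


# Made-up arrival times: heartbeats sent every 1000 ms, each delayed by 250 ms.
detector = ChenEstimator(interval_ms=1000, alpha_ms=500)
for seq, arrival in [(1, 1250), (2, 2250), (3, 3250)]:
    detector.heartbeat(seq, arrival)
```

With these made-up samples, a larger alpha pushes the deadline further out, which trades a longer detection time for fewer false suspicions; alpha = 0 suspects as soon as the estimated arrival time is missed.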
  ----
  == Design decisions ==
