Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "ZooKeeper/GSoCFailureDetector" page has been changed by AbmarBarros.
http://wiki.apache.org/hadoop/ZooKeeper/GSoCFailureDetector?action=diff&rev1=10&rev2=11

--------------------------------------------------

  ==== Experimental design ====
- ==== Results and conclusions ====
+  * '''First batch of tests''':
+   * 1 client and 1 server connected by a transcontinental link (Campina Grande-Brazil / Newark-USA)
+   * link = 1MBps, 250ms
+   * timeout = 5000ms
+   * replication = 5
+   * used the following failure detectors:
+    * Fixed heartbeat
+    * Chen (alpha = 0, 500, 1000, 2000)
+    * Bertier (moderationstep = 0, 250, 500, 1000)
+    * Phi accrual (threshold = .5, 2, 4, 8)
+  * '''Second batch of tests''':
+   * 200 clients and 1 server connected in an emulated WAN in emulab
+   * link = 2MBps, 250ms, message loss probability of 0.1
+   * timeout = 5000ms
+   * used the following failure detectors with default parameters:
+    * Fixed heartbeat
+    * Chen (alpha = 1250)
+    * Bertier (moderationstep = 1000)
+
+ ==== Results ====
+
+ ==== Concluding remarks ====
+
+ As expected, we noticed that the fixed heartbeat method works well when we run ZooKeeper in a controlled environment, where the network behavior is predictable. In these cases we can tune the fixed timeout after some network analysis. However, in scenarios where the network behavior changes, such as in a WAN, the adaptive methods can be a good choice. Below is an overview of each failure detector:
+  * '''Fixed heartbeat''': On average, with default parameters, the fixed heartbeat strategy had the highest detection time, but with no false suspicions. However, if the timeout is not well chosen, failures may take a long time to be detected, or the false suspicion rate may increase. As said before, this strategy is useful in a controlled environment, in which the network can be characterized.
+  * '''Chen''': This strategy requires some assumptions about the network, since the administrator needs to define the alpha parameter, the safety margin for the estimation. However, with default parameters, the method of Chen et al. performed well in a WAN deployment. It managed to decrease the average detection time with a low false suspicion rate.
+  * '''Bertier''': Bertier et al. initially proposed a failure detector that requires no assumptions about the network, only a single moderation step that is added to the estimation when a heartbeat is received while the monitored process is in a suspected state. With these experiments, we have come to the same conclusion as Hayashibara et al.: this failure detector is very sensitive to message loss and to fluctuations in the arrival times of heartbeats. In this sense, the moderation step turned out to be an important parameter for this failure detector. With a moderation step of 1000, Bertier's failure detector reached an average detection time lower than Chen's method and higher than the fixed heartbeat strategy, with no false suspicions.
+  * '''Phi-accrual''':

  ----
  == Design decisions ==
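To make the role of Chen's alpha parameter (the safety margin mentioned in the concluding remarks) more concrete, here is a minimal Python sketch of an adaptive timeout in the style of Chen et al. It is not the code used in these experiments or in ZooKeeper. The class name `ChenTimeoutEstimator`, its parameters (`interval_ms`, `alpha_ms`, `window`), the millisecond units, and the assumption of a fixed heartbeat sending interval are all hypothetical choices made for illustration.

```python
from collections import deque


class ChenTimeoutEstimator:
    """Illustrative Chen-style adaptive heartbeat deadline (a sketch only)."""

    def __init__(self, interval_ms, alpha_ms, window=100):
        self.interval_ms = interval_ms        # assumed fixed sending period
        self.alpha_ms = alpha_ms              # safety margin added to the estimate
        self.samples = deque(maxlen=window)   # recent (sequence number, arrival time)

    def heartbeat(self, seq, arrival_ms):
        """Record the arrival time of heartbeat number `seq`."""
        self.samples.append((seq, arrival_ms))

    def next_deadline(self):
        """Estimated arrival of the next heartbeat plus the safety margin."""
        if not self.samples:
            raise ValueError("no heartbeats recorded yet")
        # Average arrival time once the nominal sending offset is removed.
        shift = sum(t - self.interval_ms * s for s, t in self.samples) / len(self.samples)
        next_seq = self.samples[-1][0] + 1
        expected_arrival = shift + next_seq * self.interval_ms
        return expected_arrival + self.alpha_ms

    def suspects(self, now_ms):
        """True if the next heartbeat is overdue at time `now_ms`."""
        return now_ms > self.next_deadline()


detector = ChenTimeoutEstimator(interval_ms=1000, alpha_ms=500)
for seq, arrival in [(1, 1250), (2, 2250), (3, 3250)]:
    detector.heartbeat(seq, arrival)
print(detector.next_deadline())
```

In this sketch a larger alpha pushes the deadline further out, which should reduce false suspicions at the cost of slower detection; with alpha set to 0 the deadline is simply the estimated arrival time.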
