Ramnatthan Alagappan created ZOOKEEPER-2528:
-----------------------------------------------

             Summary: ZooKeeper cluster can become unavailable due to power 
failures
                 Key: ZOOKEEPER-2528
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2528
             Project: ZooKeeper
          Issue Type: Bug
          Components: server
    Affects Versions: 3.4.8
         Environment: A normal ZooKeeper cluster of 3 nodes running on 3 Linux 
machines. 
            Reporter: Ramnatthan Alagappan


A ZooKeeper cluster can become unavailable if power failures happen at certain 
specific points in time. 

Details:

I am running a three-node ZooKeeper cluster. I perform a simple update from a 
client machine. 

When I try to update a value, ZooKeeper may create a new log file (for example, 
when the current log is full). It first creates the file and then appends 
header information to the newly created log. The system call sequence looks 
like this:

creat(log.200000001)
append(log.200000001, offset=0,  count=16)

Now, if a power failure happens just after the creat of the log file but before 
the append of the header information, the log file is left empty. On restart, 
the node crashes with an EOF exception while reading that log. If the same 
problem occurs on two or more nodes in my three-node cluster, the entire 
cluster becomes unavailable because a majority of the servers have crashed.
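
The failure mode can be illustrated outside ZooKeeper with a minimal Java 
sketch. This is not ZooKeeper's actual code: the class name EmptyLogDemo, the 
helper methods, and the placeholder header values are hypothetical, and the 
16-byte header is assumed to be a 4-byte magic, a 4-byte version, and an 8-byte 
id, chosen only to match the count=16 append above. It shows that reading a 
header from a file that was created but never written raises an EOFException, 
while the same read succeeds once the 16 bytes are present.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;

public class EmptyLogDemo {
    // Assumed header layout (illustration only): int magic, int version, long id = 16 bytes.
    static void readHeader(Path log) throws IOException {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(log))) {
            in.readInt();  // magic
            in.readInt();  // version
            in.readLong(); // id
        }
    }

    // Returns true if the header read ran off the end of the file.
    static boolean headerReadHitsEof(Path log) throws IOException {
        try {
            readHeader(log);
            return false;
        } catch (EOFException e) {
            return true;
        }
    }

    // Writes a 16-byte header with placeholder values (hypothetical, not ZooKeeper's).
    static void writeHeader(Path log) throws IOException {
        ByteBuffer header = ByteBuffer.allocate(16);
        header.putInt(0x12345678).putInt(1).putLong(0L);
        Files.write(log, header.array());
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("zk-demo");
        Path log = dir.resolve("log.200000001");

        // State after creat() succeeded but power failed before the append.
        Files.createFile(log);
        System.out.println("empty log hits EOF: " + headerReadHitsEof(log));

        // State after the header append reached the disk.
        writeHeader(log);
        System.out.println("log with header hits EOF: " + headerReadHitsEof(log));
    }
}
```

A server that treats the EOFException from a zero-length log as fatal will 
refuse to start, which is the behavior reported here; whether such a file can 
safely be skipped or deleted on recovery is a design question for the fix.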

Simultaneous power failures across multiple nodes are plausible in 
single-data-center or single-rack deployments. 

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)