Ramnatthan Alagappan created ZOOKEEPER-2528:
-----------------------------------------------
Summary: ZooKeeper cluster can become unavailable due to power
failures
Key: ZOOKEEPER-2528
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2528
Project: ZooKeeper
Issue Type: Bug
Components: server
Affects Versions: 3.4.8
Environment: A normal ZooKeeper cluster of 3 nodes running on 3 Linux
machines.
Reporter: Ramnatthan Alagappan
ZooKeeper cluster can become unavailable if power failures happen at certain
specific points in time.
Details:
I am running a three-node ZooKeeper cluster. I perform a simple update from a
client machine.
When I try to update a value, ZooKeeper creates a new log file (for example,
when the current log is fully utilized). First, it creates the file and appends
some header information to the newly created log. The system call sequence
looks like below:
creat(log.200000001)
append(log.200000001, offset=0, count=16)
Now, if a power failure happens just after the creat of the log file but before
the append of the header information, the node simply crashes with an EOF
exception. If the same problem occurs at two or more nodes in my three-node
cluster, the entire cluster becomes unavailable as the majority of servers have
crashed because of the above problem.
A power failure at the same time across multiple nodes may be possible in
single data center or single rack deployment scenarios.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)