Aishwarya Ganesan created KAFKA-4009:
----------------------------------------

             Summary: Data corruption or EIO leads to data loss
                 Key: KAFKA-4009
                 URL: https://issues.apache.org/jira/browse/KAFKA-4009
             Project: Kafka
          Issue Type: Bug
          Components: log
    Affects Versions: 0.9.0.0
            Reporter: Aishwarya Ganesan


I have a 3 node kafka cluster (N1,N2 and N3) with 
log.flush.interval.messages=1, min.insync.replicas=3 and 
unclean.leader.election.enable=false and a single Zookeeper node. My workload 
inserts few messages and on completion of the workload, the 
recovery-point-offset-checkpoint reflects the latest offset of the messages 
committed. 

I have a small testing tool that drives distributed applications into corner 
cases by simulating possible error conditions like EIO, ENOSPC and EDQUOT that 
can be encountered in all modern file systems such as ext4. The tool also 
simulates on-disk silent data corruption. 

When I introduce silent data corruption in a node (say N1) in the ISR, Kafka is 
able to detect corruption using checksum and ignores the log entry from that 
point onwards. Even though N1 has lost log entries and 
recovery-point-offset-checkpoint file in N1 indicates the latest offsets, N1 is 
allowed to become the leader because it is in the ISR.  Also, the other nodes 
N2 and N3 crash with the following log message:

FATAL [ReplicaFetcherThread-0-1], Halting because log truncation is not allowed 
for topic my-topic1, Current leader 1's latest offset 0 is less than replica 
3's latest offset 1 (kafka.server.ReplicaFetcherThread)

The end result is that a silent data corruption leads to data loss because 
querying the cluster returns only messages before the corrupted entry. Note 
that the cluster at this point has only N1. This situation could have been 
avoided if the node N1 which had to ignore the log entry wasn't allowed to 
become the leader. This scenario wouldn't happen in a majority based leader 
election as other nodes (N2 or N3) would have denied vote for N1 because N1's 
log is not complete compared to N2 or N3's log.

If this scenario happens in any of the followers, it ignores the log entry and 
copies data from the leader and there is no data loss.

Encountering an EIO thrown by the file system for a particular block results in 
the same consequence of data loss on querying the cluster and the remaining two 
nodes crash. An EIO on read could be thrown for a variety of reasons including 
a latent sector error of one or more sectors on disk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to