[ https://issues.apache.org/jira/browse/KAFKA-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14730832#comment-14730832 ]

Flavio Junqueira commented on KAFKA-2510:
-----------------------------------------

The key problem is that of a broker waking up and not finding data on disk.
In such a scenario, has the broker lost its disk state (perhaps through
misconfiguration), or is it starting from scratch?

The solution is to record in a persistent store that the broker has written
something to disk, as soon as it does so. For example, the broker could add a
marker to the partition metadata once it creates the partition directory.
Another approach is to use some form of dbid, a number that identifies a
particular instance of the disk state. The broker writes the dbid to the drive
and immediately afterward to ZK. Upon restart, the dbid on disk must match the
one in ZK. Note that the broker can't simply write the dbid to its registration
znode, which is ephemeral; it must be a persistent znode.
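A minimal Python sketch of the dbid check described above. Everything named here is hypothetical: the `dbid` file name, the `/brokers/dbid/<broker_id>` path, and the `InMemoryPersistentStore` class, which stands in for persistent znodes instead of a real ZooKeeper client. It assumes a single data directory per broker, and treating "dbid on disk but not in ZK" as an interrupted registration is also an assumption of this sketch.

```python
import os
import tempfile
import uuid


class DiskStateMismatch(Exception):
    """The broker's on-disk state disagrees with what ZK remembers."""


class InMemoryPersistentStore:
    """Hypothetical stand-in for persistent (non-ephemeral) znodes.

    Only models that entries survive the broker going away; a real broker
    would talk to ZooKeeper through a client library instead.
    """

    def __init__(self):
        self._nodes = {}

    def get(self, path):
        return self._nodes.get(path)

    def create(self, path, value):
        if path in self._nodes:
            raise KeyError(f"node already exists: {path}")
        self._nodes[path] = value


DBID_FILE = "dbid"  # hypothetical file name inside the data directory


def read_dbid(data_dir):
    try:
        with open(os.path.join(data_dir, DBID_FILE)) as f:
            return f.read().strip()
    except FileNotFoundError:
        return None


def write_dbid(data_dir, dbid):
    with open(os.path.join(data_dir, DBID_FILE), "w") as f:
        f.write(dbid)
        f.flush()
        os.fsync(f.fileno())  # force the dbid onto the drive before ZK is updated


def check_dbid_on_startup(broker_id, data_dir, store):
    """Return the broker's dbid, or raise if the disk state looks wrong."""
    znode = f"/brokers/dbid/{broker_id}"  # hypothetical persistent znode path
    on_disk = read_dbid(data_dir)
    in_zk = store.get(znode)

    if on_disk is None and in_zk is None:
        # Genuinely new broker: write the dbid to the drive first, then to ZK.
        dbid = uuid.uuid4().hex
        write_dbid(data_dir, dbid)
        store.create(znode, dbid)
        return dbid
    if on_disk is None:
        # ZK remembers disk state that is no longer there: lost disk or wrong dir.
        raise DiskStateMismatch(f"broker {broker_id}: expected dbid {in_zk}, none on disk")
    if in_zk is None:
        # Assumption: the broker crashed between the disk write and the ZK write.
        store.create(znode, on_disk)
        return on_disk
    if on_disk != in_zk:
        raise DiskStateMismatch(f"broker {broker_id}: disk has {on_disk}, ZK has {in_zk}")
    return on_disk


if __name__ == "__main__":
    store = InMemoryPersistentStore()
    good_dir = tempfile.mkdtemp()
    dbid = check_dbid_on_startup(1, good_dir, store)  # first start: fresh broker
    assert check_dbid_on_startup(1, good_dir, store) == dbid  # normal restart
    try:
        check_dbid_on_startup(1, tempfile.mkdtemp(), store)  # wrong, empty dir
    except DiskStateMismatch as e:
        print("refusing to start:", e)
```

Writing to the drive before ZK means a crash between the two writes can leave a dbid on disk but not in ZK, never the reverse. That ordering is what makes "present in ZK, missing on disk" a reliable signal of lost or misconfigured storage rather than a fresh start.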

> Prevent broker from re-replicating / losing data due to disk misconfiguration
> -----------------------------------------------------------------------------
>
>                 Key: KAFKA-2510
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2510
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Gwen Shapira
>
> Currently Kafka assumes that whatever it sees in the data directory is the 
> correct state of the data.
> This means that if an admin mistakenly configures Chef to use the wrong data
> directory, one of the following can happen:
> 1. The broker will re-replicate a large number of partitions and saturate the
> network.
> 2. If this happens on enough brokers, entire topics and partitions can be
> lost.
> ZooKeeper already holds information about existing topics, partitions, and
> their ISRs.
> We need a mode in which, if a broker starts, is in the ISR for a partition,
> and has no data or directory for that partition, the broker issues a huge
> ERROR in the log and refuses to do anything for the partition.
> [~fpj] worked on this problem for ZK and has some ideas about what is
> required here.
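The startup check requested in the issue could look roughly like the following Python sketch. The function name and the `isr_by_partition` snapshot (standing in for ISR data read from ZooKeeper) are hypothetical, and treating a missing or empty `<topic>-<partition>` directory as "no data" is a simplifying assumption.

```python
import logging
import os
import tempfile

log = logging.getLogger("broker.startup")


def partitions_to_refuse(broker_id, isr_by_partition, data_dir):
    """Return the (topic, partition) pairs this broker must not act on.

    isr_by_partition is a hypothetical snapshot of ISR membership read from
    ZooKeeper: {(topic, partition): [broker ids in the ISR]}.
    """
    refused = set()
    for (topic, partition), isr in sorted(isr_by_partition.items()):
        if broker_id not in isr:
            continue  # not in the ISR, so starting empty is not suspicious
        part_dir = os.path.join(data_dir, f"{topic}-{partition}")
        # Simplification: a missing or empty partition directory means "no data".
        has_data = os.path.isdir(part_dir) and len(os.listdir(part_dir)) > 0
        if not has_data:
            log.error(
                "Broker %d is in the ISR for %s-%d but has no data in %s; "
                "refusing to serve this partition (possible disk misconfiguration)",
                broker_id, topic, partition, part_dir,
            )
            refused.add((topic, partition))
    return refused


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    data_dir = tempfile.mkdtemp()
    os.makedirs(os.path.join(data_dir, "orders-0"))
    open(os.path.join(data_dir, "orders-0", "00000000000000000000.log"), "w").close()
    isr = {("orders", 0): [1, 2], ("orders", 1): [1, 3], ("payments", 0): [2, 3]}
    print(partitions_to_refuse(1, isr, data_dir))  # {('orders', 1)}
```

Returning the set of refused partitions, rather than raising, lets the broker keep serving its healthy partitions, which matches the issue's "refuse to do anything for the partition" rather than refusing to start altogether.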



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
