[ 
https://issues.apache.org/jira/browse/KAFKA-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075875#comment-14075875
 ] 

saurabh agarwal commented on KAFKA-1555:
----------------------------------------


There are two configuration parameters: dfs.replication and 
dfs.replication.min. The behavior you described above relates to the 
dfs.replication configuration. dfs.replication.min enforces that a minimum 
number of replicas must be written for the write to succeed; otherwise the 
write fails. 
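
As a rough sketch of where these two keys live, using Hadoop's Configuration 
API (the values are only illustrative, and dfs.replication.min is normally 
enforced on the NameNode via hdfs-site.xml rather than set by the client):

import org.apache.hadoop.conf.Configuration;

public class HdfsReplicationSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Target replication factor; client-side, can be set per file.
        // Blocks are eventually copied to this many datanodes.
        conf.setInt("dfs.replication", 3);
        // Minimum replicas that must be written before the write is considered
        // successful; normally set in the cluster's hdfs-site.xml, shown here
        // only to illustrate the pair of keys.
        conf.setInt("dfs.replication.min", 2);
        System.out.println("dfs.replication     = " + conf.getInt("dfs.replication", 3));
        System.out.println("dfs.replication.min = " + conf.getInt("dfs.replication.min", 1));
    }
}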

Here is an excerpt from Tom White's Hadoop book: "It’s possible, but unlikely, 
that multiple datanodes fail while a block is being written. As long as 
dfs.replication.min replicas (which default to one) are written, the write will 
succeed, and the block will be asynchronously replicated across the cluster 
until its target replication factor is reached (dfs.replication, which defaults 
to three)."

As you suggest, we can increase the replication factor, which will reduce the 
possibility of data loss, but it does not guarantee that more than one copy of 
the data exists. "acks=-1" only ensures that the producer receives acks from 
the replicas currently in the ISR. What I am suggesting is that, with a new 
config "min.isr.required", Kafka ensures that the message has been written to 
a minimum number of replicas (which must be in the ISR) before producer.send 
is considered successful.  
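
As a rough sketch with the 0.8 producer API (broker list and topic name are 
placeholders, and min.isr.required is only the proposed, not-yet-existing 
setting):

import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class MinIsrSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "broker1:9092,broker2:9092,broker3:9092"); // placeholders
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        // acks=-1: wait for acks from all replicas currently in the ISR.
        // The ISR itself may have shrunk to a single broker, so this alone
        // does not guarantee multiple copies.
        props.put("request.required.acks", "-1");

        // Proposed (hypothetical) setting: the broker would fail the request
        // unless at least this many ISR replicas have the message, likely as a
        // topic/broker-level config rather than a producer property.
        // min.isr.required=2

        Producer<String, String> producer = new Producer<String, String>(new ProducerConfig(props));
        producer.send(new KeyedMessage<String, String>("my-topic", "key", "message")); // placeholder topic
        producer.close();
    }
}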
 


> provide strong consistency with reasonable availability
> -------------------------------------------------------
>
>                 Key: KAFKA-1555
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1555
>             Project: Kafka
>          Issue Type: Improvement
>          Components: controller
>    Affects Versions: 0.8.1.1
>            Reporter: Jiang Wu
>            Assignee: Neha Narkhede
>
> In a mission-critical application, we expect a Kafka cluster with 3 brokers 
> to satisfy two requirements:
> 1. When 1 broker is down, no message loss or service blocking happens.
> 2. In worse cases, such as when two brokers are down, service can be 
> blocked, but no message loss happens.
> We found that the current Kafka version (0.8.1.1) cannot achieve these 
> requirements due to three behaviors:
> 1. When choosing a new leader from 2 followers in the ISR, the one with 
> fewer messages may be chosen as the leader.
> 2. Even when replica.lag.max.messages=0, a follower can stay in the ISR when 
> it has fewer messages than the leader.
> 3. The ISR can contain only 1 broker, therefore acknowledged messages may be 
> stored on only 1 broker.
> The following is an analytical proof. 
> We consider a cluster with 3 brokers and a topic with 3 replicas, and assume 
> that at the beginning, all 3 replicas, leader A, followers B and C, are in 
> sync, i.e., they have the same messages and are all in ISR.
> According to the value of request.required.acks (acks for short), there are 
> the following cases.
> 1. acks=0, 1, 3. Obviously these settings do not satisfy the requirement.
> 2. acks=2. Producer sends a message m. It's acknowledged by A and B. At this 
> time, although C hasn't received m, C is still in ISR. If A is killed, C can 
> be elected as the new leader, and consumers will miss m.
> 3. acks=-1. B and C restart and are removed from the ISR. Producer sends a 
> message m to A, and receives an acknowledgement. Disk failure happens on A 
> before B and C replicate m. Message m is lost.
> In summary, no existing configuration can satisfy the requirements.
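
To make case 2 above concrete, a toy sketch in plain Java (no Kafka APIs; 
broker names and messages are purely illustrative) of how an in-sync but 
lagging follower can become leader and drop an acknowledged message:

import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AcksTwoScenario {
    public static void main(String[] args) {
        // Messages held by each replica after the producer got acks from A and B (acks=2).
        Map<String, List<String>> replicas = new LinkedHashMap<String, List<String>>();
        replicas.put("A", Arrays.asList("m0", "m1", "m"));  // leader, has m
        replicas.put("B", Arrays.asList("m0", "m1", "m"));  // follower, has m
        replicas.put("C", Arrays.asList("m0", "m1"));       // follower, still in ISR but missing m

        // Leader A fails; the controller may elect any ISR member, including C.
        replicas.remove("A");
        String newLeader = "C";

        // Consumers now read from C and never see the acknowledged message m.
        System.out.println("New leader " + newLeader + " serves: " + replicas.get(newLeader));
    }
}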



--
This message was sent by Atlassian JIRA
(v6.2#6252)
