[ 
https://issues.apache.org/jira/browse/KAFKA-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147943#comment-14147943
 ] 

Joel Koshy commented on KAFKA-1555:
-----------------------------------

I'm +1 on the concept of min.isr and acks -1, 0, 1.

This is a very interesting and important thread. Sorry I missed most of it 
until yesterday, when I spent a couple of hours (!) chewing on these comments. 
Sriram, with regard to your second point: I had a similar concern, and I think 
we talked about it, though I am not sure it is the same issue. At the 
point the leader responds to the producer, it knows how many followers have 
received the messages, so if only min.isr - 1 replicas have been written to, 
the leader would return a NotEnoughReplicas error code. I agree that subsequent 
data loss is possible on unclean leader elections. However, that is sort of 
expected. I think Joe provided a good interpretation of min.isr, i.e., "it 
provides a balance between your tolerance for the probability of data loss for 
stored data and the need of availability of brokers to write to". For a lower 
probability of loss (i.e., a lower probability of unclean leader elections), one 
would use a higher min.isr. Avoiding (or rather reducing) data loss on unclean 
leader elections is, I think, an orthogonal issue that other JIRAs such as 
KAFKA-1211 touch upon.
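A minimal sketch of the leader-side check described above, assuming a simplified model in which the leader answers an acks=-1 request once every current ISR member has the message. The function `leader_response` and the result strings other than NotEnoughReplicas are hypothetical; this is not Kafka's broker code, just an illustration of how min.isr gates the reply.

```python
from typing import Set

# Hypothetical result codes; only "NotEnoughReplicas" is named in the discussion.
OK = "OK"
NOT_ENOUGH_REPLICAS = "NotEnoughReplicas"
PENDING = "Pending"          # leader is still waiting on some ISR follower
NO_RESPONSE = "NoResponse"   # acks=0: the producer never gets a reply


def leader_response(acks: int, isr: Set[str], written_by: Set[str], min_isr: int) -> str:
    """Model the leader's reply to a single produce request.

    isr        -- current in-sync replica set, leader included
    written_by -- replicas (leader included) that have stored the message
    min_isr    -- minimum number of in-sync copies required for acks=-1
    """
    if acks == 0:
        return NO_RESPONSE
    if acks == 1:
        # Only the leader's local write matters; followers are not consulted.
        return OK
    if acks == -1:
        if not isr <= written_by:
            return PENDING
        # Every ISR member has the message, so len(isr) copies exist.
        if len(isr) < min_isr:
            return NOT_ENOUGH_REPLICAS
        return OK
    raise ValueError(f"unsupported acks value: {acks}")
```

With a full ISR of three and `min_isr=2` the write succeeds; if the ISR has shrunk to the leader alone, the same acks=-1 request is answered with NotEnoughReplicas instead of a success that rests on a single copy.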

> provide strong consistency with reasonable availability
> -------------------------------------------------------
>
>                 Key: KAFKA-1555
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1555
>             Project: Kafka
>          Issue Type: Improvement
>          Components: controller
>    Affects Versions: 0.8.1.1
>            Reporter: Jiang Wu
>            Assignee: Gwen Shapira
>             Fix For: 0.8.2
>
>         Attachments: KAFKA-1555.0.patch, KAFKA-1555.1.patch, 
> KAFKA-1555.2.patch, KAFKA-1555.3.patch, KAFKA-1555.4.patch
>
>
> In a mission-critical application, we expect a Kafka cluster with 3 brokers 
> to satisfy two requirements:
> 1. When 1 broker is down, no message loss or service blocking happens.
> 2. In worse cases, such as when two brokers are down, service can be blocked, 
> but no message loss happens.
> We found that the current Kafka version (0.8.1.1) cannot achieve these 
> requirements due to three of its behaviors:
> 1. When choosing a new leader from 2 followers in the ISR, the one with fewer 
> messages may be chosen as the leader.
> 2. Even when replica.lag.max.messages=0, a follower can stay in the ISR when 
> it has fewer messages than the leader.
> 3. The ISR can contain only 1 broker, so acknowledged messages may be 
> stored on only 1 broker.
> The following is an analytical proof. 
> We consider a cluster with 3 brokers and a topic with 3 replicas, and assume 
> that at the beginning, all 3 replicas, leader A, followers B and C, are in 
> sync, i.e., they have the same messages and are all in ISR.
> According to the value of request.required.acks (acks for short), there are 
> the following cases.
> 1. acks=0, 1, 3. Obviously these settings do not satisfy the requirements.
> 2. acks=2. Producer sends a message m. It's acknowledged by A and B. At this 
> time, although C hasn't received m, C is still in ISR. If A is killed, C can 
> be elected as the new leader, and consumers will miss m.
> 3. acks=-1. B and C restart and are removed from ISR. Producer sends a 
> message m to A, and receives an acknowledgement. Disk failure happens in A 
> before B and C replicate m. Message m is lost.
> In summary, no existing configuration can satisfy the requirements.
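The two failure cases in the proof can be replayed in a toy model. Everything here (the `Partition` class, `case_acks_2`, `case_acks_all`) is hypothetical and ignores timing, high watermarks, and the controller. As an assumption, `min_isr=1` stands in for 0.8.1.1, which enforces no minimum ISR size, and `min_isr=2` for the proposed setting.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Partition:
    logs: Dict[str, List[str]]  # broker -> messages it has stored
    isr: Set[str]               # in-sync replica set
    leader: str

    def elect_from_isr(self, dead: str, pick: str) -> None:
        """Drop the dead leader and promote `pick`, which only needs to be in
        the ISR (behavior 1: it may hold fewer messages than the old leader)."""
        self.isr.discard(dead)
        self.logs.pop(dead)
        if pick not in self.isr:
            raise ValueError(f"{pick} is not in the ISR")
        self.leader = pick


def fresh_partition() -> Partition:
    # All three replicas start in sync with identical (empty) logs.
    return Partition(logs={"A": [], "B": [], "C": []}, isr={"A", "B", "C"}, leader="A")


def case_acks_2() -> bool:
    """acks=2: return True if message m is still readable from the new leader."""
    p = fresh_partition()
    p.logs["A"].append("m")
    p.logs["B"].append("m")  # A and B acknowledge, so acks=2 is met
    # C has not replicated m yet but remains in the ISR.
    p.elect_from_isr(dead="A", pick="C")
    return "m" in p.logs[p.leader]


def case_acks_all(min_isr: int) -> str:
    """acks=-1 after B and C restart and leave the ISR."""
    p = fresh_partition()
    p.isr -= {"B", "C"}
    if len(p.isr) < min_isr:
        return "rejected: NotEnoughReplicas"
    p.logs["A"].append("m")  # ISR is {A}, so A alone satisfies acks=-1
    p.logs.pop("A")          # disk failure on A before B and C catch up
    survived = any("m" in log for log in p.logs.values())
    return "acked, m survived" if survived else "acked, m lost"
```

In the acks=-1 case the producer is told the write succeeded even though only one copy existed; with `min_isr=2` the same write is refused up front, which trades availability for not acknowledging data that a single failure can destroy.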



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
