[ 
https://issues.apache.org/jira/browse/KAFKA-1050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761547#comment-13761547
 ] 

Justin SB commented on KAFKA-1050:
----------------------------------

You're definitely right, Jay, that I'm conflating a few ideas.  I may also be 
more deeply confused :-)

#2 is definitely the really unacceptable scenario in my book.  For my use case, 
I can't allow a replica outside the ISR to become the leader, because that is 
certain to involve data loss (by definition, I think).

You're right that #1 is just ensuring that we can still tolerate failures, when 
we are imposing #2. Without it, we'd likely get into a scenario where e.g. only 
one node was alive, and if we allowed it to make progress then we wouldn't be 
able to recover from failure of that node.

I think you're right that I'm really trying to get majority-vote semantics.  I 
don't see why I'd be intentionally failing successful writes though.  Does the 
leader count in "request.required.acks"?  If I can write to 3/5, I do want to 
treat that as a success.  I also want 2/5 to be considered a failure.  I think 
I get that by setting request.required.acks=3, though maybe I need to set 
request.required.acks=2 if the leader is not counted as an ack.  And maybe I'm 
just reading the ack-counting code wrong generally...
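
As a rough sketch of the arithmetic in question (Python, purely illustrative; 
the function names are made up and are not Kafka APIs). Whether the leader 
counts towards request.required.acks is the open question above, so it is left 
as a parameter rather than assumed:

```python
def majority(replicas: int) -> int:
    """Smallest number of copies that forms a strict majority of the replicas."""
    return replicas // 2 + 1


def required_acks(replicas: int, leader_counts_as_ack: bool) -> int:
    """Hypothetical helper: the request.required.acks value for majority semantics.

    If the leader's own write is not counted as an ack, one fewer
    follower ack is needed to reach the same number of copies.
    """
    quorum = majority(replicas)
    return quorum if leader_counts_as_ack else quorum - 1


def write_succeeds(copies_written: int, replicas: int) -> bool:
    """A write is a success only if a majority of replicas hold it."""
    return copies_written >= majority(replicas)


if __name__ == "__main__":
    for n in (3, 5):
        print(n, majority(n), required_acks(n, True), required_acks(n, False))
```

With 5 replicas this gives a quorum of 3, so 3/5 is a success and 2/5 is a 
failure; the acks setting would be 3 or 2 depending on how the leader is 
counted.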

It's also occurred to me that I would probably need to add rollback, as the 
current Kafka model wouldn't ever roll back a write on the leader because of a 
lack of sufficient acks (it would just remove the lagging replicas from the ISR 
instead)?
                
> Support for "no data loss" mode
> -------------------------------
>
>                 Key: KAFKA-1050
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1050
>             Project: Kafka
>          Issue Type: Task
>            Reporter: Justin SB
>
> I'd love to use Apache Kafka, but for my application data loss is not 
> acceptable.  Even at the expense of availability (i.e. I need C not A in CAP).
> I think there are two things that I need to change to get a quorum model:
> 1) Make sure I set request.required.acks to 2 (for a 3 node cluster) or 3 
> (for a 5 node cluster) on every request, so that I can only write if a quorum 
> is active.
> 2) Prevent the behaviour where a non-ISR can become the leader if all ISRs 
> die.  I think this is as easy as tweaking 
> core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala, 
> essentially to throw an exception around line 64 in the "data loss" case.
> I haven't yet implemented / tested this.  I'd love to get some input from the 
> Kafka-experts on whether my plan is:
>  (a) correct - will this work?
>  (b) complete - have I missed any cases?
>  (c) recommended - is this a terrible idea :-)
> Thanks for any pointers!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
