[ https://issues.apache.org/jira/browse/KAFKA-1050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761547#comment-13761547 ]
Justin SB commented on KAFKA-1050:
----------------------------------

You're definitely right Jay that I'm conflating a few ideas. I may also be more deeply confused :-)

#2 is definitely the really unacceptable scenario in my book. For my use case, I can't allow a non-ISR to become the leader, because that is certain to involve data loss (by definition, I think).

You're right that #1 is just ensuring that we can still tolerate failures, when we are imposing #2. Without it, we'd likely get into a scenario where e.g. only one node was alive, and if we allowed it to make progress then we wouldn't be able to recover from failure of that node.

I think you're right, that I'm really trying to get majority vote semantics. I don't see why I'd be intentionally failing successful writes though. Does the leader count in "request.required.acks"? If I can write to 3/5, I do want to treat that as a success. I also want 2/5 to be considered a failure. I think I get that by setting request.required.acks=3, though maybe I need to set request.required.acks=2 if the leader is not counted as an ack. And maybe I'm just reading the ack-counting code wrong generally...

It's also occurred to me that I would probably need to add rollback, as the current Kafka model wouldn't ever rollback a write on the leader because of a lack of sufficient acks (it would just remove the replicas instead)?

> Support for "no data loss" mode
> -------------------------------
>
>                 Key: KAFKA-1050
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1050
>             Project: Kafka
>          Issue Type: Task
>            Reporter: Justin SB
>
> I'd love to use Apache Kafka, but for my application data loss is not
> acceptable. Even at the expense of availability (i.e. I need C not A in CAP).
> I think there are two things that I need to change to get a quorum model:
> 1) Make sure I set request.required.acks to 2 (for a 3 node cluster) or 3
> (for a 5 node cluster) on every request, so that I can only write if a quorum
> is active.
> 2) Prevent the behaviour where a non-ISR can become the leader if all ISRs
> die. I think this is as easy as tweaking
> core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala,
> essentially to throw an exception around line 64 in the "data loss" case.
> I haven't yet implemented / tested this. I'd love to get some input from the
> Kafka-experts on whether my plan is:
> (a) correct - will this work?
> (b) complete - have I missed any cases?
> (c) recommended - is this a terrible idea :-)
> Thanks for any pointers!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
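
For reference, the majority arithmetic discussed in the comment (3/5 is a success, 2/5 is a failure) can be sketched as below. This is only an illustration of the counting, not Kafka code: the function names are hypothetical, and whether the leader counts toward request.required.acks is exactly the open question in the comment, so it is left as a parameter rather than assumed.

```python
def majority(replicas: int) -> int:
    """Smallest number of copies that forms a strict majority of `replicas`."""
    return replicas // 2 + 1


def required_acks(replicas: int, leader_counts: bool) -> int:
    """Ack count to request so that a write succeeds only on a majority.

    `leader_counts` is an assumption supplied by the caller: True if the
    leader's own copy is counted as one ack, False if only followers count.
    """
    quorum = majority(replicas)
    return quorum if leader_counts else quorum - 1


def write_succeeds(copies_written: int, replicas: int) -> bool:
    """True when the write reached a majority of the replicas."""
    return copies_written >= majority(replicas)


if __name__ == "__main__":
    for n in (3, 5):
        print(n, "replicas: majority =", majority(n),
              "| acks if leader counts =", required_acks(n, True),
              "| acks if leader does not count =", required_acks(n, False))
```

Under these assumptions, a 5-replica partition needs 3 copies for a quorum, which matches the request.required.acks=3 setting if the leader is counted and request.required.acks=2 if it is not.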