Thanks for the comments - however, it is not clear to me what your preference is on making NotEnoughReplicasAfterAppend retriable vs non-retriable.
As for me, my preference is to leave it as retriable since it is clear that the produce may succeed on a retry (and may introduce a duplicate). I agree that idempotence will bring full closure to this though. Anyone else have a preference on this? Thanks, Joel On Tue, Feb 10, 2015 at 08:23:08PM -0800, Jay Kreps wrote: > Yeah there are really two concepts here as I think you noted: > 1. Retry safe: we know that the write did not occur > 2. Retry fixable: if you send that again it could work > > (probably there are better names for these). > > Some things we know did not do a write and may be fixed by retrying (no > leader). Some things we know didn't do a write and are not worth retrying > (message too large). Somethings we don't know and are worth retrying > (network error), and probably some things we don't know and aren't worth it > (can't think of one though). > > (I feel like Donald Rumsfeld with the "known unknowns" thing). > > In the current world if you set retries > 0 you are saying "I accept > duplicates but want to ensure my stuff gets written", if you set retries = > 0 you are saying "I can't abide duplicates and am willing to tolerate > loss". So Retryable for us means "retry may succeed". > > Originally I thought of maybe trying to model both concepts. However the > two arguments against it are: > 1. Even if you do this the guarantee remains "at least once delivery" > because: (1) in the network error case you just don't know, (2) consumer > failure. > 2. The proper fix for this is to add idempotence support on the server, > which we should do. > > Doing idempotence support on the server will actually fix all duplicate > problems, including the network error case (because of course the server > knows whether your write went through even though the client doesn't). When > we have that then the client can always just retry anything marked > Retriable (i.e. retry may work) without fear of duplicates. > > This gives exactly once delivery to the log, and a co-operating consumer > can use the offset to dedupe and get it end-to-end. > > So that was why I had just left one type of Retriable and used it to mean > "retry may work" and don't try to flag anything for duplicates. > > -Jay > > > > > On Tue, Feb 10, 2015 at 4:32 PM, Gwen Shapira <[email protected]> wrote: > > > Hi Kafka Devs, > > > > Need your thoughts on retriable exceptions: > > > > If a user configures Kafka with min.isr > 1 and there are not enough > > replicas to safely store the data, there are two possibilities: > > > > 1. The lack of replicas was discovered before the message was written. We > > throw NotEnoughReplicas. > > 2. The lack of replicas was discovered after the message was written to > > leader. In this case, we throw NotEnoughReplicasAfterAppend. > > > > Currently, both errors are Retriable. Which means that the new producer > > will retry multiple times. > > In case of the second exception, this will cause duplicates. > > > > KAFKA-1697 suggests: > > "we probably want to make NotEnoughReplicasAfterAppend a non-retriable > > exception and let the client decide what to do." > > > > I agreed that the client (the one using the Producer) should weight the > > problems duplicates will cause vs. the probability of losing the message > > and do something sensible and made the exception non-retriable. > > > > In the RB (https://reviews.apache.org/r/29647/) Joel raised a good point: > > (Joel, feel free to correct me if I misrepresented your point) > > > > "I think our interpretation of retriable is as follows (but we can discuss > > on the list if that needs to change): if the produce request hits an error, > > and there is absolutely no point in retrying then that is a non-retriable > > error. MessageSizeTooLarge is an example - since unless the producer > > changes the request to make the messages smaller there is no point in > > retrying. > > > > ... > > Duplicates can arise even for other errors (e.g., request timed out). So > > that side-effect is not compelling enough to warrant a change to make this > > non-retriable. " > > > > *(TL;DR; ) Should exceptions where retries can cause duplicates should > > still be * > > *retriable?* > > > > Gwen > >
