Re: [DISCUSSION] KAFKA-1697 - make NotEnoughReplicasAfterAppend a non-retriable exception

Joel Koshy Wed, 11 Feb 2015 05:43:12 -0800

Thanks for the comments - however, it is not clear to me what your
preference is on making NotEnoughReplicasAfterAppend retriable vs
non-retriable.


As for me, my preference is to leave it as retriable since it is clear
that the produce may succeed on a retry (and may introduce a
duplicate). I agree that idempotence will bring full closure to this
though.

Anyone else have a preference on this?

Thanks,

Joel

On Tue, Feb 10, 2015 at 08:23:08PM -0800, Jay Kreps wrote:
> Yeah there are really two concepts here as I think you noted:
> 1. Retry safe: we know that the write did not occur
> 2. Retry fixable: if you send that again it could work
> 
> (probably there are better names for these).
> 
> Some things we know did not do a write and may be fixed by retrying (no
> leader). Some things we know didn't do a write and are not worth retrying
> (message too large). Some things we don't know and are worth retrying
> (network error), and probably some things we don't know and aren't worth it
> (can't think of one though).
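
A rough sketch of the two axes described above, using only the three
examples given in the message. The class and field names are made up for
illustration and are not part of the Kafka client.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorKind:
    name: str
    write_known_absent: bool  # "retry safe": we know nothing was written
    retry_may_succeed: bool   # "retry fixable": sending again could work


# Placement follows the examples in the message above.
KINDS = [
    ErrorKind("no leader", write_known_absent=True, retry_may_succeed=True),
    ErrorKind("message too large", write_known_absent=True, retry_may_succeed=False),
    ErrorKind("network error", write_known_absent=False, retry_may_succeed=True),
]


def may_duplicate_on_retry(kind: ErrorKind) -> bool:
    # A retry can only duplicate when the first attempt might have been written
    # and the second attempt has a chance of being written too.
    return kind.retry_may_succeed and not kind.write_known_absent


risky = [k.name for k in KINDS if may_duplicate_on_retry(k)]
```

Under this toy model only the "don't know" row carries duplicate risk, which
is the case a single Retriable flag cannot distinguish.
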
> 
> (I feel like Donald Rumsfeld with the "known unknowns" thing).
> 
> In the current world if you set retries > 0 you are saying "I accept
> duplicates but want to ensure my stuff gets written", if you set retries =
> 0 you are saying "I can't abide duplicates and am willing to tolerate
> loss". So Retriable for us means "retry may succeed".
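
To make the retries trade-off concrete, here is a minimal simulation. The
broker, its scripted outcomes ("drop", "ack_lost") and the send() helper are
all hypothetical stand-ins, not the real producer.

```python
class ScriptedBroker:
    """Toy broker whose per-request behaviour is scripted (hypothetical)."""

    def __init__(self, script):
        self.script = list(script)  # one outcome per incoming request
        self.log = []

    def produce(self, msg):
        outcome = self.script.pop(0) if self.script else "ok"
        if outcome == "drop":        # request never reached the log
            raise ConnectionError("request lost")
        self.log.append(msg)
        if outcome == "ack_lost":    # written, but the client never hears back
            raise ConnectionError("ack lost")
        return len(self.log) - 1     # offset of the appended message


def send(broker, msg, retries):
    """Try once, then up to `retries` more times; return the offset or None."""
    for _ in range(1 + retries):
        try:
            return broker.produce(msg)
        except ConnectionError:
            continue
    return None


lossy = ScriptedBroker(["drop"])
lost = send(lossy, "m", retries=0)    # gives up: lossy.log stays empty

dup = ScriptedBroker(["ack_lost"])
offset = send(dup, "m", retries=1)    # retry succeeds: dup.log holds "m" twice
```

With retries=0 the dropped request is simply lost; with retries=1 the lost
ack turns into a duplicate, since the client cannot tell the two failures
apart.
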
> 
> Originally I thought of maybe trying to model both concepts. However the
> two arguments against it are:
> 1. Even if you do this the guarantee remains "at least once delivery"
> because: (1) in the network error case you just don't know, (2) consumer
> failure.
> 2. The proper fix for this is to add idempotence support on the server,
> which we should do.
> 
> Doing idempotence support on the server will actually fix all duplicate
> problems, including the network error case (because of course the server
> knows whether your write went through even though the client doesn't). When
> we have that then the client can always just retry anything marked
> Retriable (i.e. retry may work) without fear of duplicates.
> 
> This gives exactly once delivery to the log, and a co-operating consumer
> can use the offset to dedupe and get it end-to-end.
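
One way such server-side idempotence could look, as a sketch only. The
(producer_id, seq) key is an assumption made for this example, and nothing
here describes an actual broker implementation.

```python
class IdempotentLog:
    """Toy log that ignores re-sent messages (hypothetical scheme)."""

    def __init__(self):
        self.log = []
        self.seen = {}  # (producer_id, seq) -> offset of the first append

    def append(self, producer_id, seq, msg):
        key = (producer_id, seq)
        if key in self.seen:          # a retry of something already written
            return self.seen[key]     # answer with the original offset
        self.log.append(msg)
        self.seen[key] = len(self.log) - 1
        return self.seen[key]


def unprocessed(records, last_offset):
    """Consumer side: skip everything up to and including last_offset."""
    return records[last_offset + 1:]


log = IdempotentLog()
first = log.append("p1", 0, "m")   # written, but suppose the ack is lost
retry = log.append("p1", 0, "m")   # client retries with the same sequence
```

Both calls return the same offset and the log holds a single copy, so the
client can retry blindly. The consumer helper shows the other half: tracking
the last processed offset is enough to skip records it has already handled.
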
> 
> So that was why I had just left one type of Retriable and used it to mean
> "retry may work" and don't try to flag anything for duplicates.
> 
> -Jay
> 
> 
> 
> 
> On Tue, Feb 10, 2015 at 4:32 PM, Gwen Shapira wrote:
> 
> > Hi Kafka Devs,
> >
> > Need your thoughts on retriable exceptions:
> >
> > If a user configures Kafka with min.insync.replicas > 1 and there are not
> > enough replicas to safely store the data, there are two possibilities:
> >
> > 1. The lack of replicas was discovered before the message was written. We
> > throw NotEnoughReplicas.
> > 2. The lack of replicas was discovered after the message was written to
> > the leader. In this case, we throw NotEnoughReplicasAfterAppend.
> >
> > Currently, both errors are Retriable, which means that the new producer
> > will retry multiple times.
> > In the case of the second exception, this will cause duplicates.
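
The difference between the two cases can be sketched with a toy leader. The
exception classes below only borrow the names used in this thread, and the
isr_before / isr_after arguments are test knobs invented for the example,
not a real API.

```python
class NotEnoughReplicas(Exception):
    """Raised before anything is appended to the leader's log."""


class NotEnoughReplicasAfterAppend(Exception):
    """Raised after the leader has already appended the message."""


class ToyLeader:
    def __init__(self, min_isr):
        self.min_isr = min_isr
        self.log = []

    def produce(self, msg, isr_before, isr_after):
        # isr_before / isr_after: in-sync replica counts observed before the
        # append and while waiting for acknowledgements.
        if isr_before < self.min_isr:
            raise NotEnoughReplicas()
        self.log.append(msg)
        if isr_after < self.min_isr:
            raise NotEnoughReplicasAfterAppend()
        return len(self.log) - 1


leader = ToyLeader(min_isr=2)
try:
    leader.produce("m", isr_before=2, isr_after=1)
except NotEnoughReplicasAfterAppend:
    pass                                   # "m" is already in leader.log
leader.produce("m", isr_before=2, isr_after=2)  # the retry appends it again
```

After the retry the toy log contains "m" twice, whereas a NotEnoughReplicas
failure would have left it untouched.
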
> >
> > KAFKA-1697 suggests:
> > "we probably want to make NotEnoughReplicasAfterAppend a non-retriable
> > exception and let the client decide what to do."
> >
> > I agreed that the client (the one using the Producer) should weigh the
> > problems duplicates will cause against the probability of losing the
> > message and do something sensible, so I made the exception non-retriable.
> >
> > In the RB (https://reviews.apache.org/r/29647/) Joel raised a good point:
> > (Joel, feel free to correct me if I misrepresented your point)
> >
> > "I think our interpretation of retriable is as follows (but we can discuss
> > on the list if that needs to change): if the produce request hits an error,
> > and there is absolutely no point in retrying then that is a non-retriable
> > error. MessageSizeTooLarge is an example - since unless the producer
> > changes the request to make the messages smaller there is no point in
> > retrying.
> >
> > ...
> > Duplicates can arise even for other errors (e.g., request timed out). So
> > that side-effect is not compelling enough to warrant a change to make this
> > non-retriable."
> >
> > TL;DR: Should exceptions where retries can cause duplicates still be
> > retriable?
> >
> > Gwen
> >
