Re: [DISCUSS] KIP-298: Error Handling in Connect

Ewen Cheslack-Postava Mon, 21 May 2018 11:29:18 -0700

Arjun,

Understood on retries vs tolerance -- though I suspect this will end up
being a bit confusing to users as well. It's two levels of error handling
which is what tripped me up.


One last comment on KIP (which otherwise looks good): for the tolerance
setting, do we want it to be an absolute value or something like a
percentage? Given the current way of setting things, I'm not sure I'd ever
set it to anything but -1 or 0, with maybe 1 as an easy option for
restarting a connector to get it past one bad message, then reverting back
to -1 or 0.

-Ewen

On Mon, May 21, 2018 at 11:01 AM Arjun Satish <arjun.sat...@gmail.com>
wrote:

> Hey Jason,
>
> Thanks for your comments. Please find answers inline:
>
> On Mon, May 21, 2018 at 9:52 AM, Jason Gustafson <ja...@confluent.io>
> wrote:
>
> > Hi Arjun,
> >
> > Thanks for the KIP. Just a few comments/questions:
> >
> > 1. The proposal allows users to configure the number of retries. I
> usually
> > find it easier as a user to work with timeouts since it's difficult to
> know
> > how long a retry might take. Have you considered adding a timeout option
> > which would retry until the timeout expires?
> >
>
> Good point. Updated the KIP.
>
> 2. The configs are named very generically (e.g. errors.retries.limit). Do
> > you think it will be clear to users what operations these configs apply
> to?
> >
>
> As of now, these configs are applicable to all operations in the connector
> pipeline (as mentioned in the proposed changes section). We decided not to
> have per operation limit because of the additional config complexity.
>
>
> > 3. I wasn't entirely clear what messages are stored in the dead letter
> > queue. It sounds like it includes both configs and messages since we have
> > errors.dlq.include.configs? Is there a specific schema you have in mind?
> >
>
> This has been addressed in the KIP. The DLQ will now contain only raw
> messages (no additional context). We are also supporting DLQs only for sink
> connectors now.
>
>
> > 4. I didn't see it mentioned explicitly in the KIP, but I assume the
> > tolerance metrics are reset after every task rebalance?
> >
>
> Great question. Yes, we will reset the tolerance metrics on every
> rebalance.
>
>
> > 5. I wonder if we can do without errors.tolerance.limit. You can get a
> > similar effect using errors.tolerance.rate.limit if you allow longer
> > durations. I'm not sure how useful an absolute counter is in practice.
> >
>
> Yeah, the rate limit does subsume the features offered by the absolute
> counter. Removed it.
>
>
> >
> > Thanks,
> > Jason
> >
> >
>

Re: [DISCUSS] KIP-298: Error Handling in Connect

Reply via email to