Re: [DISCUSS] KIP-610: Error Reporting in Sink Connectors

Serhii Bilyk Tue, 19 May 2020 01:37:11 -0700

Hi,
Please, tell me how to unsubscribe.
Thanks,
Serhii

On Mon, 18 May 2020, 23:44 Aakash Shah, <as...@confluent.io> wrote:


> Hi Chris,
>
> I agree with your point.
>
> Randall, Konstantine, do you guys mind weighing in on any benefit of adding
> asynchronous functionality using a Future in the KIP right now? It seems to
> me that it only provides user control on when the thread will be blocked,
> and if we are going to process all the futures at once in a batch at the
> end, why not support batch processing in a future KIP, since it is not too
> difficult now that we are adding an interface. I'm not sure I see any gain
> beyond some user control that could increase throughput - but at the same
> time, as I mentioned before, I don't think throughput is a factor we need
> to consider much with error reporting. We don't really need or necessarily
> want a higher throughput on error reporting, as ideally, there should not
> be a high volume of errant records.
>
> Thanks,
> Aakash
>
> On Mon, May 18, 2020 at 1:22 PM Chris Egerton <chr...@confluent.io> wrote:
>
> > Hi Aakash,
> >
> > Yep, that's pretty much it. I'd also like to emphasize that we should be
> > identifying practical use cases for whatever API we provide. Giving
> > developers a future that can be made synchronous with little effort seems
> > flexible, but if that's all that developers are going to do with it
> anyway,
> > why make it a future at all? We should have some idea of how people would
> > use a future that doesn't just hinge on them blocking on it immediately,
> > and isn't more easily-addressed by a different API (such as one with
> batch
> > reporting).
> >
> > Cheers,
> >
> > Chris
> >
> > On Mon, May 18, 2020 at 1:17 PM Aakash Shah <as...@confluent.io> wrote:
> >
> > > Hi all,
> > >
> > > Chris, I see your points about whether Futures provide much benefit at
> > all
> > > as they are not truly fully asynchronous.
> > >
> > > Correct me if I am wrong, but I think what you are trying to point out
> is
> > > that if we have the option to add additional functionality later (in a
> > > simpler way too since we are introducing a new interface), we should
> > > provide functionality that we know will provide value immediately and
> not
> > > cause any developer/user burden.
> > >
> > > In that case, I think the main area we have to come to a consensus on
> is
> > -
> > > how much control do we want to provide to the developer/user in this
> KIP
> > > considering that we can add the functionality relatively easily later?
> > >
> > > Randall, Konstantine, what do you think about adding it later vs now?
> > >
> > > Thanks,
> > > Aakash
> > >
> > > On Mon, May 18, 2020 at 12:45 PM Chris Egerton <chr...@confluent.io>
> > > wrote:
> > >
> > > > Hi Aakash,
> > > >
> > > > I asked this earlier about whether futures were the right way to go,
> if
> > > we
> > > > wanted to enable asynchronous behavior at all:
> > > >
> > > > > I'm still unclear on how futures are going to provide any benefit
> to
> > > > developers, though. Blocking on the return of such a future slightly
> > > later
> > > > on in the process of handling records is still blocking, and to be
> done
> > > > truly asynchronously without blocking processing of non-errant
> records,
> > > > would have to be done on a separate thread. It's technically possible
> > for
> > > > users to cache all of these futures and instead of invoking "get" on
> > > them,
> > > > simply check whether they're complete or not via "isDone", but this
> > seems
> > > > like an anti-pattern.
> > > >
> > > > > What is the benefit of wrapping this in a future?
> > > >
> > > > As far as I can tell, there hasn't been a practical example yet where
> > the
> > > > flexibility provided by a future would actually be beneficial in
> > writing
> > > a
> > > > connector. It'd be great if we could find one. One possible use case
> > > might
> > > > be processing records received in "SinkTask::put" without having to
> > block
> > > > for each errant record report before sending non-errant records to
> the
> > > > sink. However, this could also be addressed by allowing for batch
> > > reporting
> > > > of errant records instead of accepting a single record at a time; the
> > > task
> > > > would track errant records as it processes them in "put" and report
> > them
> > > > all en-masse after all non-errant records have been processed.
> > > >
> > > > With regards to the precedent of using futures for asynchronous
> APIs, I
> > > > think we should make sure that whatever API we decide on is actually
> > > useful
> > > > for the cases it serves. There's plenty of precedent for
> callback-based
> > > > asynchronous APIs in Kafka with both "Producer::send" and
> > > > "Consumer::commitAsync"; the question here shouldn't be about what's
> > done
> > > > in different APIs, but what would work for this one in particular.
> > > >
> > > > Finally, it's also been brought up that if we're going to introduce a
> > new
> > > > error reporter interface, we can always modify that interface later
> on
> > to
> > > > go from asynchronous to synchronous behavior, or vice-versa, or even
> to
> > > add
> > > > a callback- or future-based variant that didn't exist before. We have
> > > > plenty of room to maneuver in the future here, so the pressure to get
> > > > everything right the first time and provide maximum flexibility
> doesn't
> > > > seem as pressing, and the goal of minimizing the kind of API that we
> > have
> > > > to support for future versions without making unnecessary additions
> is
> > > > easier to achieve.
> > > >
> > > > Cheers,
> > > >
> > > > Chris
> > > >
> > > >
> > > >
> > > > On Mon, May 18, 2020 at 12:20 PM Aakash Shah <as...@confluent.io>
> > wrote:
> > > >
> > > > > Hi Arjun,
> > > > >
> > > > > Thanks for your feedback.
> > > > >
> > > > > I agree with moving to Future<Void>, those are good points.
> > > > >
> > > > > I believe an earlier point made for asynchronous functionality were
> > > that
> > > > > modern APIs tend to be asynchronous as they result in more
> expressive
> > > and
> > > > > better defined APIs.
> > > > > Additionally, because a lot of Kafka Connect functionality is
> already
> > > > > asynchronous, I am inclined to believe that customers will want an
> > > > > asynchronous solution for this as well. And if is relatively simple
> > to
> > > > > block with future.get() to make it synchronous, would you not say
> > that
> > > > > having an opt-in synchronous functionality rather than synchronous
> > only
> > > > > functionality allows for customer control while maintaining that
> not
> > > too
> > > > > much burden of implementation is placed on the customer?
> > > > > WDYT?
> > > > >
> > > > > Thanks,
> > > > > Aakash
> > > > >
> > > > > On Sun, May 17, 2020 at 2:51 PM Arjun Satish <
> arjun.sat...@gmail.com
> > >
> > > > > wrote:
> > > > >
> > > > > > Thanks for all the feedback, folks.
> > > > > >
> > > > > > re: having a callback as a parameter, I agree that at this point,
> > it
> > > > > might
> > > > > > not add much value to the proposal.
> > > > > >
> > > > > > re: synchronous vs asynchronous, is the motivation
> > performance/higher
> > > > > > throughput? Taking a step back, calling report(..) in the new
> > > interface
> > > > > > does a couple of things:
> > > > > >
> > > > > > 1. at a fundamental level, it is a signal to the framework that a
> > > > failure
> > > > > > occurred when processing records, specifically due to the given
> > > record.
> > > > > > 2. depending on whether errors.log and errors.deadletterqueue has
> > > been
> > > > > set,
> > > > > > some messages are written to zero or more destinations.
> > > > > > 3. depending on the value of errors.tolerance (none or all), the
> > task
> > > > is
> > > > > > failed after reporters have completed.
> > > > > >
> > > > > > for kip-610, the asynchronous method has the advantage of working
> > > with
> > > > > the
> > > > > > internal dead letter queue (which has been transparent to the
> > > developer
> > > > > so
> > > > > > far). but, how does async method help if the DLQ is not enabled?
> in
> > > > this
> > > > > > case RecordMetadata is not very useful, AFAICT? also, if we add
> > more
> > > > > error
> > > > > > reporters in the future (say, for example, a new reporter in a
> > future
> > > > > that
> > > > > > writes to a RDBMS), would the async version return success on all
> > or
> > > > > > nothing, and what about partial successes?
> > > > > >
> > > > > > overall, if we really need async behavior, I'd prefer to just use
> > > > > > Future<Void>. but if we can keep it simple, then let's go with a
> > > > > > synchronous function with the parameters Randall proposed above
> > (with
> > > > > > return type as void, and if any of the reporters fail, the task
> is
> > > > failed
> > > > > > if error.tolerance is none, and kept alive if tolerance is all),
> > and
> > > > > maybe
> > > > > > add asynchronous methods in a future KIP?
> > > > > >
> > > > > > Best,
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-610: Error Reporting in Sink Connectors

Reply via email to