Hi, Please, tell me how to unsubscribe. Thanks, Serhii On Mon, 18 May 2020, 23:44 Aakash Shah, <as...@confluent.io> wrote:
> Hi Chris, > > I agree with your point. > > Randall, Konstantine, do you guys mind weighing in on any benefit of adding > asynchronous functionality using a Future in the KIP right now? It seems to > me that it only provides user control on when the thread will be blocked, > and if we are going to process all the futures at once in a batch at the > end, why not support batch processing in a future KIP, since it is not too > difficult now that we are adding an interface. I'm not sure I see any gain > beyond some user control that could increase throughput - but at the same > time, as I mentioned before, I don't think throughput is a factor we need > to consider much with error reporting. We don't really need or necessarily > want a higher throughput on error reporting, as ideally, there should not > be a high volume of errant records. > > Thanks, > Aakash > > On Mon, May 18, 2020 at 1:22 PM Chris Egerton <chr...@confluent.io> wrote: > > > Hi Aakash, > > > > Yep, that's pretty much it. I'd also like to emphasize that we should be > > identifying practical use cases for whatever API we provide. Giving > > developers a future that can be made synchronous with little effort seems > > flexible, but if that's all that developers are going to do with it > anyway, > > why make it a future at all? We should have some idea of how people would > > use a future that doesn't just hinge on them blocking on it immediately, > > and isn't more easily-addressed by a different API (such as one with > batch > > reporting). > > > > Cheers, > > > > Chris > > > > On Mon, May 18, 2020 at 1:17 PM Aakash Shah <as...@confluent.io> wrote: > > > > > Hi all, > > > > > > Chris, I see your points about whether Futures provide much benefit at > > all > > > as they are not truly fully asynchronous. > > > > > > Correct me if I am wrong, but I think what you are trying to point out > is > > > that if we have the option to add additional functionality later (in a > > > simpler way too since we are introducing a new interface), we should > > > provide functionality that we know will provide value immediately and > not > > > cause any developer/user burden. > > > > > > In that case, I think the main area we have to come to a consensus on > is > > - > > > how much control do we want to provide to the developer/user in this > KIP > > > considering that we can add the functionality relatively easily later? > > > > > > Randall, Konstantine, what do you think about adding it later vs now? > > > > > > Thanks, > > > Aakash > > > > > > On Mon, May 18, 2020 at 12:45 PM Chris Egerton <chr...@confluent.io> > > > wrote: > > > > > > > Hi Aakash, > > > > > > > > I asked this earlier about whether futures were the right way to go, > if > > > we > > > > wanted to enable asynchronous behavior at all: > > > > > > > > > I'm still unclear on how futures are going to provide any benefit > to > > > > developers, though. Blocking on the return of such a future slightly > > > later > > > > on in the process of handling records is still blocking, and to be > done > > > > truly asynchronously without blocking processing of non-errant > records, > > > > would have to be done on a separate thread. It's technically possible > > for > > > > users to cache all of these futures and instead of invoking "get" on > > > them, > > > > simply check whether they're complete or not via "isDone", but this > > seems > > > > like an anti-pattern. > > > > > > > > > What is the benefit of wrapping this in a future? > > > > > > > > As far as I can tell, there hasn't been a practical example yet where > > the > > > > flexibility provided by a future would actually be beneficial in > > writing > > > a > > > > connector. It'd be great if we could find one. One possible use case > > > might > > > > be processing records received in "SinkTask::put" without having to > > block > > > > for each errant record report before sending non-errant records to > the > > > > sink. However, this could also be addressed by allowing for batch > > > reporting > > > > of errant records instead of accepting a single record at a time; the > > > task > > > > would track errant records as it processes them in "put" and report > > them > > > > all en-masse after all non-errant records have been processed. > > > > > > > > With regards to the precedent of using futures for asynchronous > APIs, I > > > > think we should make sure that whatever API we decide on is actually > > > useful > > > > for the cases it serves. There's plenty of precedent for > callback-based > > > > asynchronous APIs in Kafka with both "Producer::send" and > > > > "Consumer::commitAsync"; the question here shouldn't be about what's > > done > > > > in different APIs, but what would work for this one in particular. > > > > > > > > Finally, it's also been brought up that if we're going to introduce a > > new > > > > error reporter interface, we can always modify that interface later > on > > to > > > > go from asynchronous to synchronous behavior, or vice-versa, or even > to > > > add > > > > a callback- or future-based variant that didn't exist before. We have > > > > plenty of room to maneuver in the future here, so the pressure to get > > > > everything right the first time and provide maximum flexibility > doesn't > > > > seem as pressing, and the goal of minimizing the kind of API that we > > have > > > > to support for future versions without making unnecessary additions > is > > > > easier to achieve. > > > > > > > > Cheers, > > > > > > > > Chris > > > > > > > > > > > > > > > > On Mon, May 18, 2020 at 12:20 PM Aakash Shah <as...@confluent.io> > > wrote: > > > > > > > > > Hi Arjun, > > > > > > > > > > Thanks for your feedback. > > > > > > > > > > I agree with moving to Future<Void>, those are good points. > > > > > > > > > > I believe an earlier point made for asynchronous functionality were > > > that > > > > > modern APIs tend to be asynchronous as they result in more > expressive > > > and > > > > > better defined APIs. > > > > > Additionally, because a lot of Kafka Connect functionality is > already > > > > > asynchronous, I am inclined to believe that customers will want an > > > > > asynchronous solution for this as well. And if is relatively simple > > to > > > > > block with future.get() to make it synchronous, would you not say > > that > > > > > having an opt-in synchronous functionality rather than synchronous > > only > > > > > functionality allows for customer control while maintaining that > not > > > too > > > > > much burden of implementation is placed on the customer? > > > > > WDYT? > > > > > > > > > > Thanks, > > > > > Aakash > > > > > > > > > > On Sun, May 17, 2020 at 2:51 PM Arjun Satish < > arjun.sat...@gmail.com > > > > > > > > wrote: > > > > > > > > > > > Thanks for all the feedback, folks. > > > > > > > > > > > > re: having a callback as a parameter, I agree that at this point, > > it > > > > > might > > > > > > not add much value to the proposal. > > > > > > > > > > > > re: synchronous vs asynchronous, is the motivation > > performance/higher > > > > > > throughput? Taking a step back, calling report(..) in the new > > > interface > > > > > > does a couple of things: > > > > > > > > > > > > 1. at a fundamental level, it is a signal to the framework that a > > > > failure > > > > > > occurred when processing records, specifically due to the given > > > record. > > > > > > 2. depending on whether errors.log and errors.deadletterqueue has > > > been > > > > > set, > > > > > > some messages are written to zero or more destinations. > > > > > > 3. depending on the value of errors.tolerance (none or all), the > > task > > > > is > > > > > > failed after reporters have completed. > > > > > > > > > > > > for kip-610, the asynchronous method has the advantage of working > > > with > > > > > the > > > > > > internal dead letter queue (which has been transparent to the > > > developer > > > > > so > > > > > > far). but, how does async method help if the DLQ is not enabled? > in > > > > this > > > > > > case RecordMetadata is not very useful, AFAICT? also, if we add > > more > > > > > error > > > > > > reporters in the future (say, for example, a new reporter in a > > future > > > > > that > > > > > > writes to a RDBMS), would the async version return success on all > > or > > > > > > nothing, and what about partial successes? > > > > > > > > > > > > overall, if we really need async behavior, I'd prefer to just use > > > > > > Future<Void>. but if we can keep it simple, then let's go with a > > > > > > synchronous function with the parameters Randall proposed above > > (with > > > > > > return type as void, and if any of the reporters fail, the task > is > > > > failed > > > > > > if error.tolerance is none, and kept alive if tolerance is all), > > and > > > > > maybe > > > > > > add asynchronous methods in a future KIP? > > > > > > > > > > > > Best, > > > > > > > > > > > > > > > > > > > > >