I guess you would get duplicates if you crash after data was written into the topics but before offsets were committed.
So there is no data loss or reordering in this case, only duplication.

-Matthias

On 1/28/21 11:20 AM, nitin agarwal wrote:
> Hi,
>
> By committing the offsets, I meant tracking the progress of how much data
> is read from the upstream system. In Kafka Connect this is being referred
> to as committing the offsets.
> This is the method I was talking about
> https://github.com/a0x8o/kafka/blob/master/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L462-L567
>
> My doubt is that what if the connector gets restarted or the node on which
> connector is running goes down just before flushing the offsets
> <https://github.com/a0x8o/kafka/blob/master/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L521>
> .
>
> Thank you,
> Nitin
>
>
>
> On Thu, Jan 28, 2021 at 9:54 PM Matthias J. Sax <mj...@apache.org> wrote:
>
>> I don't know all details of Connect...
>>
>> However, not sure what you mean by "committing offsets"?
>>
>> A source connector takes data from an external data source and writes it
>> into a Kafka topic. Thus, there should not be any offsets to be
>> committed. (Committing offsets only applies if you read from a topic.)
>>
>> Instead, the "progress" how much data from the upstream system is read
>> needs to be tracked. If done right (what I assume Connect does -- not
>> sure if there might be a concrete connector dependency?) there should
>> not be out-of-order data.
>>
>> But I hope that some Connect expert can chime in...
>>
>>
>> -Matthias
>>
>> On 1/28/21 12:24 AM, nitin agarwal wrote:
>>> Assuming the configurations are as follows:
>>> max.inflight.requests.per.connection=1
>>> enable.idempotence=false
>>>
>>> Thanks,
>>> Nitin
>>>
>>>
>>> On Thu, Jan 28, 2021 at 1:53 PM nitin agarwal <nitingarg...@gmail.com> wrote:
>>>
>>>> Thanks for the quick reply, I have understood this behaviour now.
>>>> I have another follow up question.
>>>>
>>>> Can the Source connector write out of order messages in a case where there
>>>> is a failure in committing the offset and the connector is restarted at the
>>>> same time?
>>>>
>>>> Thanks,
>>>> Nitin
>>>>
>>>> On Thu, Jan 28, 2021 at 8:06 AM Matthias J. Sax <mj...@apache.org> wrote:
>>>>
>>>>> There should not be any data loss.
>>>>>
>>>>> However, if a request fails and is retried, it may lead to reordering of
>>>>> sends. Thus, records would not be ordered based on the `send()` calls
>>>>> any longer.
>>>>>
>>>>> If you would enable idempotent writes, ordering is guaranteed even with
>>>>> multiple in-flight requests per connection though.
>>>>>
>>>>>
>>>>>
>>>>> -Matthias
>>>>>
>>>>> On 1/27/21 11:35 AM, nitin agarwal wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> I see that max.inflight.requests.per.connection is set to 1 explicitly in
>>>>>> Kafka Connect but there is a way to override it. I want to understand the
>>>>>> impact of setting its value > 1.
>>>>>> As per my understanding, it will lead to data loss in some cases. Is it
>>>>>> correct?
>>>>>>
>>>>>>
>>>>>> Thank you,
>>>>>> Nitin
>>>>>>
>>>>>
>>>>
>>>
>>
>
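
To make the duplication case from the top of this reply concrete, here is a toy Python model of the "write to the topic, then record progress" sequence. It is only a sketch under stated assumptions, not Connect's actual WorkerSourceTask logic: `run_source_task`, `crash_before_commit_at`, and the in-memory lists standing in for the upstream system and the Kafka topic are all hypothetical names made up for illustration.

```python
def run_source_task(upstream, topic, committed, batch_size,
                    crash_before_commit_at=None):
    """Copy records from `upstream` into `topic`, starting at `committed`.

    Returns the last committed position. If `crash_before_commit_at` equals
    the current batch number, that batch is written to the topic but its
    position is never recorded (a simulated crash before the offset flush).
    """
    position = committed
    batch_no = 0
    while position < len(upstream):
        batch = upstream[position:position + batch_size]
        topic.extend(batch)            # records reach the topic first
        if batch_no == crash_before_commit_at:
            return committed           # crash: progress for this batch is lost
        position += len(batch)
        committed = position           # progress is saved only after the write
        batch_no += 1
    return committed


upstream = ["r0", "r1", "r2", "r3", "r4", "r5"]
topic = []

# First run: crash after writing the second batch, before committing it.
committed = run_source_task(upstream, topic, committed=0, batch_size=2,
                            crash_before_commit_at=1)

# Restart: resume from the last committed position and run to completion.
committed = run_source_task(upstream, topic, committed=committed, batch_size=2)

print(topic)
```

In this model the uncommitted batch is read and written a second time after the restart, so every upstream record ends up in the topic at least once and nothing is skipped, but `r2` and `r3` appear twice.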