Just did :) On 7 September 2017 at 17:52, Ismael Juma <ism...@juma.me.uk> wrote:
> Can we please start the vote on this KIP? The KIP must be accepted by next > Wednesday in order to make the cut for 1.0.0. This issue keeps coming up > again and again, and I'd really like to include a fix for 1.0.0. > > Ismael > > On Thu, Sep 7, 2017 at 10:01 PM, Apurva Mehta <apu...@confluent.io> wrote: > > > I agree with what Ismael said. Having both retries and > delivery.timeout.ms > > is confusing, and thus the goal is to not have a retries option at all > once > > idempotence is fully battle tested and has become the entrenched default. > > > > Until that time, it makes sense to expire batch earlier than > > delivery.timeout.ms if retries have been exhausted. > > > > Thanks, > > Apurva > > > > > > On Thu, Sep 7, 2017 at 6:07 AM, Ismael Juma <ism...@juma.me.uk> wrote: > > > > > Good question regarding retries Sumant. A few comments: > > > > > > 1. Defaulting to MAX_INT makes sense in the context of > > delivery.timeout.ms > > > , > > > but introduces the possibility of reordering with the default > > max.in.flight > > > of 5. Personally, I think reordering is better than dropping the > message > > > altogether (if we keep retries=0), but it's worth noting this. > > > > > > 2. I agree that we should expire on whichever of retries and > > > delivery.timeout.ms is exhausted first for 1.0.0. > > > > > > 3. Once KIP-185 lands (post 1.0.0), we should consider deprecating and > > > eventually removing the retries config to simplify things (it won't > have > > > much use then). > > > > > > 4. With regards to the case where the broker replies quickly with an > > error, > > > we need to understand a bit more what the error is. For any kind of > > > connection issue, we now have exponential backoff. For the case where > an > > > error code is returned, it depends on whether the error is retriable or > > > not. For the former, it probably makes sense to keep retrying as it's > > > supposed to be a transient issue. If we think it would make sense to > > apply > > > exponential backoff, we could also consider that. So, I'm not sure > > retries > > > has much use apart from compatibility and the retries=0 case (for now). > > > > > > Ismael > > > > > > On Wed, Sep 6, 2017 at 11:14 PM, Jun Rao <j...@confluent.io> wrote: > > > > > > > Hi, Sumant, > > > > > > > > The diagram in the wiki seems to imply that delivery.timeout.ms > > doesn't > > > > include the batching time. > > > > > > > > For retries, probably we can just default it to MAX_INT? > > > > > > > > Thanks, > > > > > > > > Jun > > > > > > > > > > > > On Wed, Sep 6, 2017 at 10:26 AM, Sumant Tambe <suta...@gmail.com> > > wrote: > > > > > > > > > 120 seconds default sounds good to me. Throwing ConfigException > > instead > > > > of > > > > > WARN is fine. Added clarification that the producer waits the full > > > > > request.timeout.ms for the in-flight request. This implies that > user > > > > might > > > > > be notified of batch expiry while a batch is still in-flight. > > > > > > > > > > I don't recall if we discussed our point of view that existing > > configs > > > > like > > > > > retries become redundant/deprecated with this feature. IMO, retries > > > > config > > > > > becomes meaningless due to the possibility of incorrect configs > like > > > > > delivery.timeout.ms > linger.ms + retries * (request..timeout.ms + > > > > > retry.backoff.ms), retries should be basically interpreted as > > MAX_INT? > > > > > What > > > > > will be the default? > > > > > > > > > > So do we ignore retries config or throw a ConfigException if > > weirdness > > > > like > > > > > above is detected? > > > > > > > > > > -Sumant > > > > > > > > > > > > > > > On 5 September 2017 at 17:34, Ismael Juma <ism...@juma.me.uk> > wrote: > > > > > > > > > > > Thanks for updating the KIP, Sumant. A couple of points: > > > > > > > > > > > > 1. I think the default for delivery.timeout.ms should be higher > > than > > > > 30 > > > > > > seconds given that we previously would reset the clock once the > > batch > > > > was > > > > > > sent. The value should be large enough that batches are not > expired > > > due > > > > > to > > > > > > expected events like a new leader being elected due to broker > > > failure. > > > > > > Would it make sense to use a conservative value like 120 seconds? > > > > > > > > > > > > 2. The producer currently throws an exception for configuration > > > > > > combinations that don't make sense. We should probably do the > same > > > here > > > > > for > > > > > > consistency (the KIP currently proposes a log warning). > > > > > > > > > > > > 3. We should mention that we will not cancel in flight requests > > until > > > > the > > > > > > request timeout even though we'll expire the batch early if > needed. > > > > > > > > > > > > I think we should start the vote tomorrow so that we have a > chance > > of > > > > > > hitting the KIP freeze for 1.0.0. > > > > > > > > > > > > Ismael > > > > > > > > > > > > On Wed, Sep 6, 2017 at 1:03 AM, Sumant Tambe <suta...@gmail.com> > > > > wrote: > > > > > > > > > > > > > I've updated the kip-91 writeup > > > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP- > > > > > > > 91+Provide+Intuitive+User+Timeouts+in+The+Producer> > > > > > > > to capture some of the discussion here. Please confirm if it's > > > > > > sufficiently > > > > > > > accurate. Feel free to edit it if you think some explanation > can > > be > > > > > > better > > > > > > > and has been agreed upon here. > > > > > > > > > > > > > > How do you proceed from here? > > > > > > > > > > > > > > -Sumant > > > > > > > > > > > > > > On 30 August 2017 at 12:59, Jun Rao <j...@confluent.io> wrote: > > > > > > > > > > > > > > > Hi, Jiangjie, > > > > > > > > > > > > > > > > I mis-understood Jason's approach earlier. It does seem to > be a > > > > good > > > > > > one. > > > > > > > > We still need to calculate the selector timeout based on the > > > > > remaining > > > > > > > > delivery.timeout.ms to call the callback on time, but we can > > > > always > > > > > > wait > > > > > > > > for an inflight request based on request.timeout.ms. > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > Jun > > > > > > > > > > > > > > > > On Tue, Aug 29, 2017 at 5:16 PM, Becket Qin < > > > becket....@gmail.com> > > > > > > > wrote: > > > > > > > > > > > > > > > > > Yeah, I think expiring a batch but still wait for the > > response > > > is > > > > > > > > probably > > > > > > > > > reasonable given the result is not guaranteed anyways. > > > > > > > > > > > > > > > > > > @Jun, > > > > > > > > > > > > > > > > > > I think the frequent PID reset may still be possible if we > do > > > not > > > > > > wait > > > > > > > > for > > > > > > > > > the in-flight response to return. Consider two partitions > p0 > > > and > > > > > p1, > > > > > > > the > > > > > > > > > deadline of the batches for p0 are T + 10, T + 30, T + > 50... > > > The > > > > > > > deadline > > > > > > > > > of the batches for p1 are T + 20, T + 40, T + 60... > Assuming > > > each > > > > > > > request > > > > > > > > > takes more than 10 ms to get the response. The following > > > sequence > > > > > may > > > > > > > be > > > > > > > > > possible: > > > > > > > > > > > > > > > > > > T: PID0 send batch0_p0(PID0), batch0_p1(PID0) > > > > > > > > > T + 10: PID0 expires batch0_p0(PID0), without resetting > PID, > > > > sends > > > > > > > > > batch1_p0(PID0) and batch0_p1(PID0, retry) > > > > > > > > > T + 20: PID0 expires batch0_p1(PID0, retry), resets the PID > > to > > > > > PID1, > > > > > > > > sends > > > > > > > > > batch1_p0(PID0, retry) and batch1_p1(PID1) > > > > > > > > > T + 30: PID1 expires batch1_p0(PID0, retry), without > > resetting > > > > PID, > > > > > > > sends > > > > > > > > > batch2_p0(PID1) and batch1_p1(PID1, retry) > > > > > > > > > T + 40: PID1 expires batch1_p1(PID1, retry), resets the PID > > to > > > > > PID2, > > > > > > > > sends > > > > > > > > > batch2_p0(PID1, retry) and sends batch2_p1(PID2) > > > > > > > > > .... > > > > > > > > > > > > > > > > > > In the above example, the producer will reset PID once > every > > > two > > > > > > > > requests. > > > > > > > > > The example did not take retry backoff into consideration, > > but > > > it > > > > > > still > > > > > > > > > seems possible to encounter frequent PID reset if we do not > > > wait > > > > > for > > > > > > > the > > > > > > > > > request to finish. Also, in this case we will have a lot of > > > > retries > > > > > > and > > > > > > > > > mixture of PIDs which seem to be pretty complicated. > > > > > > > > > > > > > > > > > > I think Jason's suggestion will address both concerns, i.e. > > we > > > > fire > > > > > > the > > > > > > > > > callback at exactly delivery.timeout.ms, but we will still > > > wait > > > > > for > > > > > > > the > > > > > > > > > response to be returned before sending the next request. > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Aug 29, 2017 at 4:00 PM, Jun Rao <j...@confluent.io > > > > > > wrote: > > > > > > > > > > > > > > > > > > > Hmm, I thought delivery.timeout.ms bounds the time from > a > > > > > message > > > > > > is > > > > > > > > in > > > > > > > > > > the > > > > > > > > > > accumulator (i.e., when send() returns) to the time when > > the > > > > > > callback > > > > > > > > is > > > > > > > > > > called. If we wait for request.timeout.ms for an > inflight > > > > > request > > > > > > > and > > > > > > > > > the > > > > > > > > > > remaining delivery.timeout.ms is less than > > > request.timeout.ms, > > > > > the > > > > > > > > > > callback > > > > > > > > > > may be called later than delivery.timeout.ms, right? > > > > > > > > > > > > > > > > > > > > Jiangjie's concern on resetting the pid on every expired > > > batch > > > > is > > > > > > > > > probably > > > > > > > > > > not an issue if we only reset the pid when the expired > > > batch's > > > > > pid > > > > > > is > > > > > > > > the > > > > > > > > > > same as the current pid, as Jason suggested. > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > Jun > > > > > > > > > > > > > > > > > > > > On Tue, Aug 29, 2017 at 3:09 PM, Jason Gustafson < > > > > > > ja...@confluent.io > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > I think the semantics of delivery.timeout.ms need to > > allow > > > > for > > > > > > the > > > > > > > > > > > possibility that the record was actually written. > Unless > > we > > > > can > > > > > > > keep > > > > > > > > on > > > > > > > > > > > retrying indefinitely, there's really no way to know > for > > > sure > > > > > > > whether > > > > > > > > > the > > > > > > > > > > > record was written or not. A delivery timeout just > means > > > that > > > > > we > > > > > > > > cannot > > > > > > > > > > > guarantee that the record was delivered. > > > > > > > > > > > > > > > > > > > > > > -Jason > > > > > > > > > > > > > > > > > > > > > > On Tue, Aug 29, 2017 at 2:51 PM, Becket Qin < > > > > > > becket....@gmail.com> > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > Hi Jason, > > > > > > > > > > > > > > > > > > > > > > > > If we expire the batch from user's perspective but > > still > > > > > > waiting > > > > > > > > for > > > > > > > > > > the > > > > > > > > > > > > response, would that mean it is likely that the batch > > > will > > > > be > > > > > > > > > > > successfully > > > > > > > > > > > > appended but the users will receive a > TimeoutException? > > > > That > > > > > > > seems > > > > > > > > a > > > > > > > > > > > little > > > > > > > > > > > > non-intuitive to the users. Arguably it maybe OK > though > > > > > because > > > > > > > > > > currently > > > > > > > > > > > > when TimeoutException is thrown, there is no > guarantee > > > > > whether > > > > > > > the > > > > > > > > > > > messages > > > > > > > > > > > > are delivered or not. > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Aug 29, 2017 at 12:33 PM, Jason Gustafson < > > > > > > > > > ja...@confluent.io> > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > I think I'm with Becket. We should wait for > > > > > > request.timeout.ms > > > > > > > > for > > > > > > > > > > > each > > > > > > > > > > > > > produce request we send. We can still await the > > > response > > > > > > > > internally > > > > > > > > > > for > > > > > > > > > > > > > PID/sequence maintenance even if we expire the > batch > > > from > > > > > the > > > > > > > > > user's > > > > > > > > > > > > > perspective. New sequence numbers would be assigned > > > based > > > > > on > > > > > > > the > > > > > > > > > > > current > > > > > > > > > > > > > PID until the response returns and we find whether > a > > > PID > > > > > > reset > > > > > > > is > > > > > > > > > > > > actually > > > > > > > > > > > > > needed. This makes delivery.timeout.ms a hard > limit > > > > which > > > > > is > > > > > > > > > easier > > > > > > > > > > to > > > > > > > > > > > > > explain. > > > > > > > > > > > > > > > > > > > > > > > > > > -Jason > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Aug 29, 2017 at 11:10 AM, Sumant Tambe < > > > > > > > > suta...@gmail.com> > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > I'm updating the kip-91 writeup. There seems to > be > > > some > > > > > > > > confusion > > > > > > > > > > > about > > > > > > > > > > > > > > expiring an inflight request. An inflight request > > > gets > > > > a > > > > > > full > > > > > > > > > > > > > > delivery.timeout.ms duration from creation, > right? > > > So > > > > it > > > > > > > > should > > > > > > > > > be > > > > > > > > > > > > > > max(remaining delivery.timeout.ms, > > > request.timeout.ms > > > > )? > > > > > > > > > > > > > > > > > > > > > > > > > > > > Jun, we do want to wait for an inflight request > for > > > > > longer > > > > > > > than > > > > > > > > > > > > > > request.timeout.ms. right? > > > > > > > > > > > > > > > > > > > > > > > > > > > > What happens to a batch when retries * ( > > > > > request.timeout.ms > > > > > > + > > > > > > > > > > > > > > retry.backoff.ms) < delivery.timeout.ms and all > > > > retries > > > > > > are > > > > > > > > > > > > > exhausted? I > > > > > > > > > > > > > > remember an internal discussion where we > concluded > > > that > > > > > > > retries > > > > > > > > > can > > > > > > > > > > > be > > > > > > > > > > > > no > > > > > > > > > > > > > > longer relevant (i.e., ignored, which is same as > > > > > > > > > retries=MAX_LONG) > > > > > > > > > > > when > > > > > > > > > > > > > > there's an end-to-end delivery.timeout.ms. Do > you > > > > agree? > > > > > > > > > > > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > > > > > > Sumant > > > > > > > > > > > > > > > > > > > > > > > > > > > > On 27 August 2017 at 12:08, Jun Rao < > > > j...@confluent.io> > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi, Jiangjie, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > If we want to enforce delivery.timeout.ms, we > > need > > > > to > > > > > > take > > > > > > > > the > > > > > > > > > > min > > > > > > > > > > > > > > right? > > > > > > > > > > > > > > > Also, if a user sets a large > delivery.timeout.ms > > , > > > we > > > > > > > > probably > > > > > > > > > > > don't > > > > > > > > > > > > > want > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > wait for an inflight request longer than > > > > > > > request.timeout.ms. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Jun > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Aug 25, 2017 at 5:19 PM, Becket Qin < > > > > > > > > > > becket....@gmail.com> > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Jason, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I see what you mean. That makes sense. So in > > the > > > > > above > > > > > > > case > > > > > > > > > > after > > > > > > > > > > > > the > > > > > > > > > > > > > > > > producer resets PID, when it retry > batch_0_tp1, > > > the > > > > > > batch > > > > > > > > > will > > > > > > > > > > > > still > > > > > > > > > > > > > > have > > > > > > > > > > > > > > > > the old PID even if the producer has already > > got > > > a > > > > > new > > > > > > > PID. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > @Jun, do you mean max(remaining > > > > delivery.timeout.ms, > > > > > > > > > > > > > > request.timeout.ms) > > > > > > > > > > > > > > > > instead of min(remaining delivery.timeout.ms > , > > > > > > > > > > request.timeout.ms > > > > > > > > > > > )? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Aug 25, 2017 at 9:34 AM, Jun Rao < > > > > > > > j...@confluent.io > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi, Becket, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Good point on expiring inflight requests. > > > Perhaps > > > > > we > > > > > > > can > > > > > > > > > > expire > > > > > > > > > > > > an > > > > > > > > > > > > > > > > inflight > > > > > > > > > > > > > > > > > request after min(remaining > > > delivery.timeout.ms, > > > > > > > > > > > > > request.timeout.ms > > > > > > > > > > > > > > ). > > > > > > > > > > > > > > > > This > > > > > > > > > > > > > > > > > way, if a user sets a high > > delivery.timeout.ms > > > , > > > > we > > > > > > can > > > > > > > > > still > > > > > > > > > > > > > recover > > > > > > > > > > > > > > > > from > > > > > > > > > > > > > > > > > broker power outage sooner. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Jun > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, Aug 24, 2017 at 12:52 PM, Becket > Qin > > < > > > > > > > > > > > > becket....@gmail.com > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Jason, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > delivery.timeout.ms sounds good to me. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I was referring to the case that we are > > > > resetting > > > > > > the > > > > > > > > > > > > > PID/sequence > > > > > > > > > > > > > > > > after > > > > > > > > > > > > > > > > > > expire a batch. This is more about the > > > sending > > > > > the > > > > > > > > > batches > > > > > > > > > > > > after > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > expired batch. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The scenario being discussed is expiring > > one > > > of > > > > > the > > > > > > > > > batches > > > > > > > > > > > in > > > > > > > > > > > > a > > > > > > > > > > > > > > > > > in-flight > > > > > > > > > > > > > > > > > > request and retry the other batches in > the > > > that > > > > > > > > in-flight > > > > > > > > > > > > > request. > > > > > > > > > > > > > > So > > > > > > > > > > > > > > > > > > consider the following case: > > > > > > > > > > > > > > > > > > 1. Producer sends request_0 with two > > batches > > > > > > > > (batch_0_tp0 > > > > > > > > > > and > > > > > > > > > > > > > > > > > batch_0_tp1). > > > > > > > > > > > > > > > > > > 2. Broker receives the request enqueued > the > > > > > request > > > > > > > to > > > > > > > > > the > > > > > > > > > > > log. > > > > > > > > > > > > > > > > > > 3. Before the producer receives the > > response > > > > from > > > > > > the > > > > > > > > > > broker, > > > > > > > > > > > > > > > > batch_0_tp0 > > > > > > > > > > > > > > > > > > expires. The producer will expire > > batch_0_tp0 > > > > > > > > > immediately, > > > > > > > > > > > > resets > > > > > > > > > > > > > > > PID, > > > > > > > > > > > > > > > > > and > > > > > > > > > > > > > > > > > > then resend batch_0_tp1, and maybe send > > > > > batch_1_tp0 > > > > > > > > (i.e. > > > > > > > > > > the > > > > > > > > > > > > > next > > > > > > > > > > > > > > > > batch > > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > the expired batch) as well. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > For batch_0_tp1, it is OK to reuse PID > and > > > and > > > > > > > sequence > > > > > > > > > > > number. > > > > > > > > > > > > > The > > > > > > > > > > > > > > > > > problem > > > > > > > > > > > > > > > > > > is for batch_1_tp0, If we reuse the same > > PID > > > > and > > > > > > the > > > > > > > > > broker > > > > > > > > > > > has > > > > > > > > > > > > > > > already > > > > > > > > > > > > > > > > > > appended batch_0_tp0, the broker will > think > > > > > > > batch_1_tp0 > > > > > > > > > is > > > > > > > > > > a > > > > > > > > > > > > > > > duplicate > > > > > > > > > > > > > > > > > with > > > > > > > > > > > > > > > > > > the same sequence number. As a result > > broker > > > > will > > > > > > > drop > > > > > > > > > > > > > batch_0_tp1. > > > > > > > > > > > > > > > > That > > > > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > > > > why we have to either bump up sequence > > number > > > > or > > > > > > > reset > > > > > > > > > PID. > > > > > > > > > > > To > > > > > > > > > > > > > > avoid > > > > > > > > > > > > > > > > this > > > > > > > > > > > > > > > > > > complexity, I was suggesting not expire > the > > > > > > in-flight > > > > > > > > > batch > > > > > > > > > > > > > > > > immediately, > > > > > > > > > > > > > > > > > > but wait for the produce response. If the > > > batch > > > > > has > > > > > > > > been > > > > > > > > > > > > > > successfully > > > > > > > > > > > > > > > > > > appended, we do not expire it. Otherwise, > > we > > > > > expire > > > > > > > it. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, Aug 24, 2017 at 11:26 AM, Jason > > > > > Gustafson < > > > > > > > > > > > > > > > ja...@confluent.io> > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > @Becket > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Good point about unnecessarily > resetting > > > the > > > > > PID > > > > > > in > > > > > > > > > cases > > > > > > > > > > > > where > > > > > > > > > > > > > > we > > > > > > > > > > > > > > > > know > > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > > request has failed. Might be worth > > opening > > > a > > > > > JIRA > > > > > > > to > > > > > > > > > try > > > > > > > > > > > and > > > > > > > > > > > > > > > improve > > > > > > > > > > > > > > > > > > this. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > So if we expire the batch prematurely > and > > > > > resend > > > > > > > all > > > > > > > > > > > > > > > > > > > > the other batches in the same > request, > > > > > chances > > > > > > > are > > > > > > > > > > there > > > > > > > > > > > > will > > > > > > > > > > > > > > be > > > > > > > > > > > > > > > > > > > > duplicates. If we wait for the > response > > > > > > instead, > > > > > > > it > > > > > > > > > is > > > > > > > > > > > less > > > > > > > > > > > > > > > likely > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > > > introduce duplicates, and we may not > > need > > > > to > > > > > > > reset > > > > > > > > > the > > > > > > > > > > > PID. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Not sure I follow this. Are you > assuming > > > that > > > > > we > > > > > > > > change > > > > > > > > > > the > > > > > > > > > > > > > batch > > > > > > > > > > > > > > > > > > > PID/sequence of the retried batches > after > > > > > > resetting > > > > > > > > the > > > > > > > > > > > PID? > > > > > > > > > > > > I > > > > > > > > > > > > > > > think > > > > > > > > > > > > > > > > we > > > > > > > > > > > > > > > > > > > probably need to ensure that when we > > retry > > > a > > > > > > batch, > > > > > > > > we > > > > > > > > > > > always > > > > > > > > > > > > > use > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > same > > > > > > > > > > > > > > > > > > > PID/sequence. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > By the way, as far as naming, ` > > > > > > > > > > > max.message.delivery.wait.ms` > > > > > > > > > > > > > is > > > > > > > > > > > > > > > > quite > > > > > > > > > > > > > > > > > a > > > > > > > > > > > > > > > > > > > mouthful. Could we shorten it? Perhaps > ` > > > > > > > > > > > delivery.timeout.ms > > > > > > > > > > > > `? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -Jason > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Aug 23, 2017 at 8:51 PM, Becket > > > Qin < > > > > > > > > > > > > > > becket....@gmail.com> > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Jun, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > If TCP timeout is longer than > > > > > > request.timeout.ms > > > > > > > , > > > > > > > > > the > > > > > > > > > > > > > producer > > > > > > > > > > > > > > > > will > > > > > > > > > > > > > > > > > > > always > > > > > > > > > > > > > > > > > > > > hit request.timeout.ms before > hitting > > > TCP > > > > > > > timeout, > > > > > > > > > > > right? > > > > > > > > > > > > > That > > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > > > why > > > > > > > > > > > > > > > > > > we > > > > > > > > > > > > > > > > > > > > added request.timeout.ms in the > first > > > > place. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > You are right. Currently we are reset > > the > > > > PID > > > > > > and > > > > > > > > > > resend > > > > > > > > > > > > the > > > > > > > > > > > > > > > > batches > > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > > > avoid OutOfOrderSequenceException > when > > > the > > > > > > > expired > > > > > > > > > > > batches > > > > > > > > > > > > > are > > > > > > > > > > > > > > in > > > > > > > > > > > > > > > > > > retry. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > This does not distinguish the reasons > > > that > > > > > > caused > > > > > > > > the > > > > > > > > > > > > retry. > > > > > > > > > > > > > > > There > > > > > > > > > > > > > > > > > are > > > > > > > > > > > > > > > > > > > two > > > > > > > > > > > > > > > > > > > > cases: > > > > > > > > > > > > > > > > > > > > 1. If the batch was in retry because > it > > > > > > received > > > > > > > an > > > > > > > > > > error > > > > > > > > > > > > > > > response > > > > > > > > > > > > > > > > > > (e.g. > > > > > > > > > > > > > > > > > > > > NotLeaderForPartition), we actually > > don't > > > > > need > > > > > > to > > > > > > > > > reset > > > > > > > > > > > PID > > > > > > > > > > > > > in > > > > > > > > > > > > > > > this > > > > > > > > > > > > > > > > > > case > > > > > > > > > > > > > > > > > > > > because we know that broker did not > > > accept > > > > > it. > > > > > > > > > > > > > > > > > > > > 2. If the batch was in retry because > it > > > > hit a > > > > > > > > timeout > > > > > > > > > > > > > earlier, > > > > > > > > > > > > > > > then > > > > > > > > > > > > > > > > > we > > > > > > > > > > > > > > > > > > > > should reset the PID (or > optimistically > > > > send > > > > > > and > > > > > > > > only > > > > > > > > > > > reset > > > > > > > > > > > > > PID > > > > > > > > > > > > > > > > when > > > > > > > > > > > > > > > > > > > > receive OutOfOrderSequenceException?) > > > > > > > > > > > > > > > > > > > > Case 1 is probably the most common > > case, > > > so > > > > > it > > > > > > > > looks > > > > > > > > > > that > > > > > > > > > > > > we > > > > > > > > > > > > > > are > > > > > > > > > > > > > > > > > > > resetting > > > > > > > > > > > > > > > > > > > > the PID more often than necessary. > But > > > > > because > > > > > > in > > > > > > > > > case > > > > > > > > > > 1 > > > > > > > > > > > > the > > > > > > > > > > > > > > > broker > > > > > > > > > > > > > > > > > > does > > > > > > > > > > > > > > > > > > > > not have the batch, there isn't much > > > impact > > > > > on > > > > > > > > > resting > > > > > > > > > > > PID > > > > > > > > > > > > > and > > > > > > > > > > > > > > > > resend > > > > > > > > > > > > > > > > > > > other > > > > > > > > > > > > > > > > > > > > than the additional round trip. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Now we are introducing another case: > > > > > > > > > > > > > > > > > > > > 3. A batch is in retry because we > > expired > > > > an > > > > > > > > > in-flight > > > > > > > > > > > > > request > > > > > > > > > > > > > > > > before > > > > > > > > > > > > > > > > > > it > > > > > > > > > > > > > > > > > > > > hits request.timeout.ms. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The difference between 2 and 3 is > that > > in > > > > > case > > > > > > 3 > > > > > > > > > likely > > > > > > > > > > > the > > > > > > > > > > > > > > > broker > > > > > > > > > > > > > > > > > has > > > > > > > > > > > > > > > > > > > > appended the messages. So if we > expire > > > the > > > > > > batch > > > > > > > > > > > > prematurely > > > > > > > > > > > > > > and > > > > > > > > > > > > > > > > > resend > > > > > > > > > > > > > > > > > > > all > > > > > > > > > > > > > > > > > > > > the other batches in the same > request, > > > > > chances > > > > > > > are > > > > > > > > > > there > > > > > > > > > > > > will > > > > > > > > > > > > > > be > > > > > > > > > > > > > > > > > > > > duplicates. If we wait for the > response > > > > > > instead, > > > > > > > it > > > > > > > > > is > > > > > > > > > > > less > > > > > > > > > > > > > > > likely > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > > > introduce duplicates, and we may not > > need > > > > to > > > > > > > reset > > > > > > > > > the > > > > > > > > > > > PID. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > That said, given that batch > expiration > > is > > > > > > > probably > > > > > > > > > > > already > > > > > > > > > > > > > rare > > > > > > > > > > > > > > > > > enough, > > > > > > > > > > > > > > > > > > > so > > > > > > > > > > > > > > > > > > > > it may not be necessary to optimize > for > > > > that. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Aug 23, 2017 at 5:01 PM, Jun > > Rao > > > < > > > > > > > > > > > j...@confluent.io > > > > > > > > > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi, Becket, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > If a message expires while it's in > an > > > > > > inflight > > > > > > > > > > produce > > > > > > > > > > > > > > request, > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > > > > producer will get a new PID if > > > idempotent > > > > > is > > > > > > > > > enabled. > > > > > > > > > > > > This > > > > > > > > > > > > > > will > > > > > > > > > > > > > > > > > > prevent > > > > > > > > > > > > > > > > > > > > > subsequent messages from hitting > > > > > > > > > > > > > OutOfOrderSequenceException. > > > > > > > > > > > > > > > The > > > > > > > > > > > > > > > > > > issue > > > > > > > > > > > > > > > > > > > > of > > > > > > > > > > > > > > > > > > > > > not expiring an inflight request is > > > that > > > > > if a > > > > > > > > > broker > > > > > > > > > > > > server > > > > > > > > > > > > > > > goes > > > > > > > > > > > > > > > > > down > > > > > > > > > > > > > > > > > > > > hard > > > > > > > > > > > > > > > > > > > > > (e.g. power outage), the time that > it > > > > takes > > > > > > for > > > > > > > > the > > > > > > > > > > > > client > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > detect > > > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > > > > socket level error (this will be > sth > > > like > > > > > 8+ > > > > > > > > > minutes > > > > > > > > > > > with > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > default > > > > > > > > > > > > > > > > > > > TCP > > > > > > > > > > > > > > > > > > > > > setting) is much longer than the > > > default > > > > > > > > > > > > > request.timeout.ms. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi, Sumant, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > We can probably just default > > > > > > > > > > > > max.message.delivery.wait.ms > > > > > > > > > > > > > to > > > > > > > > > > > > > > > 30 > > > > > > > > > > > > > > > > > > secs, > > > > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > > > > current default for > > request.timeout.ms > > > . > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Jun > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Aug 23, 2017 at 3:38 PM, > > Sumant > > > > > > Tambe < > > > > > > > > > > > > > > > suta...@gmail.com > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > OK. Looks like starting the clock > > > after > > > > > > > closing > > > > > > > > > the > > > > > > > > > > > > batch > > > > > > > > > > > > > > has > > > > > > > > > > > > > > > > > > quite a > > > > > > > > > > > > > > > > > > > > few > > > > > > > > > > > > > > > > > > > > > > pitfalls. I can't think of a way > of > > > to > > > > > work > > > > > > > > > around > > > > > > > > > > it > > > > > > > > > > > > > > without > > > > > > > > > > > > > > > > > > adding > > > > > > > > > > > > > > > > > > > > yet > > > > > > > > > > > > > > > > > > > > > > another config. So I won't > discuss > > > that > > > > > > here. > > > > > > > > > > Anyone? > > > > > > > > > > > > As > > > > > > > > > > > > > I > > > > > > > > > > > > > > > said > > > > > > > > > > > > > > > > > > > > earlier, > > > > > > > > > > > > > > > > > > > > > > I'm not hung up on super-accurate > > > > > > > notification > > > > > > > > > > times. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > If we are going down the > > > > > > > > > > > max.message.delievery.wait.ms > > > > > > > > > > > > > > > route, > > > > > > > > > > > > > > > > > what > > > > > > > > > > > > > > > > > > > > would > > > > > > > > > > > > > > > > > > > > > > be > > > > > > > > > > > > > > > > > > > > > > the default? There seem to be a > few > > > > > > options. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 1. max.message.delievery.wait.ms > = > > > null. > > > > > > > Nothing > > > > > > > > > > > changes > > > > > > > > > > > > > for > > > > > > > > > > > > > > > > those > > > > > > > > > > > > > > > > > > who > > > > > > > > > > > > > > > > > > > > > don't > > > > > > > > > > > > > > > > > > > > > > set it. I.e., batches expire > after > > > > > > > > > > > request.timeout.ms > > > > > > > > > > > > in > > > > > > > > > > > > > > > > > > > accumulator. > > > > > > > > > > > > > > > > > > > > If > > > > > > > > > > > > > > > > > > > > > > they are past the accumulator > > stage, > > > > > > timeout > > > > > > > > > after > > > > > > > > > > > > > > retries*( > > > > > > > > > > > > > > > > > > > > > > request.timeout.ms+backoff). > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2. max.message.delivery.wait.ms > =r > > > > > > > > > equest.timeout.ms > > > > > > > > > > . > > > > > > > > > > > No > > > > > > > > > > > > > > > > obervable > > > > > > > > > > > > > > > > > > > > > > behavioral > > > > > > > > > > > > > > > > > > > > > > change at the accumulator level > as > > > > > timeout > > > > > > > > value > > > > > > > > > is > > > > > > > > > > > > same > > > > > > > > > > > > > as > > > > > > > > > > > > > > > > > before. > > > > > > > > > > > > > > > > > > > > > Retries > > > > > > > > > > > > > > > > > > > > > > will be done if as long as batch > is > > > > under > > > > > > > > > > > > > > > > > > > max.message.delivery.wait.ms > > > > > > > > > > > > > > > > > > > > . > > > > > > > > > > > > > > > > > > > > > > However, a batch can expire just > > > after > > > > > one > > > > > > > try. > > > > > > > > > > > That's > > > > > > > > > > > > ok > > > > > > > > > > > > > > IMO > > > > > > > > > > > > > > > > > > because > > > > > > > > > > > > > > > > > > > > > > request.timeout.ms tend to be > > large > > > > > > (Default > > > > > > > > > > 30000). > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 3. max.message.delivery.wait.ms= > 2* > > > > > > > > > > request.timeout.ms > > > > > > > > > > > . > > > > > > > > > > > > > Give > > > > > > > > > > > > > > > > > > > opportunity > > > > > > > > > > > > > > > > > > > > > for > > > > > > > > > > > > > > > > > > > > > > two retries but warn that retries > > may > > > > not > > > > > > > > happen > > > > > > > > > at > > > > > > > > > > > all > > > > > > > > > > > > > in > > > > > > > > > > > > > > > some > > > > > > > > > > > > > > > > > > rare > > > > > > > > > > > > > > > > > > > > > > cases and a batch could expire > > before > > > > any > > > > > > > > > attempt. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 4. max.message.delivery.wait.ms= > > > > > something > > > > > > > else > > > > > > > > > (a > > > > > > > > > > > > > > constant?) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thoughts? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On 23 August 2017 at 09:01, > Ismael > > > > Juma < > > > > > > > > > > > > > ism...@juma.me.uk > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks Becket, that seems > > > reasonable. > > > > > > > Sumant, > > > > > > > > > > would > > > > > > > > > > > > you > > > > > > > > > > > > > > be > > > > > > > > > > > > > > > > > > willing > > > > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > > > > > > update the KIP based on the > > > > discussion > > > > > or > > > > > > > are > > > > > > > > > you > > > > > > > > > > > > still > > > > > > > > > > > > > > not > > > > > > > > > > > > > > > > > > > > convinced? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Ismael > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Aug 23, 2017 at 6:04 > AM, > > > > Becket > > > > > > > Qin < > > > > > > > > > > > > > > > > > > becket....@gmail.com> > > > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > In general > > > > > > max.message.delivery.wait.ms > > > > > > > > is a > > > > > > > > > > > > cleaner > > > > > > > > > > > > > > > > > approach. > > > > > > > > > > > > > > > > > > > > That > > > > > > > > > > > > > > > > > > > > > > > would > > > > > > > > > > > > > > > > > > > > > > > > make the guarantee clearer. > > That > > > > > said, > > > > > > > > there > > > > > > > > > > seem > > > > > > > > > > > > > > > > subtleties > > > > > > > > > > > > > > > > > in > > > > > > > > > > > > > > > > > > > > some > > > > > > > > > > > > > > > > > > > > > > > > scenarios: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 1. I agree with Sumante that > it > > > is > > > > a > > > > > > > little > > > > > > > > > > weird > > > > > > > > > > > > > that > > > > > > > > > > > > > > a > > > > > > > > > > > > > > > > > > message > > > > > > > > > > > > > > > > > > > > > could > > > > > > > > > > > > > > > > > > > > > > be > > > > > > > > > > > > > > > > > > > > > > > > expired immediately if it > > happens > > > > to > > > > > > > enter > > > > > > > > a > > > > > > > > > > > batch > > > > > > > > > > > > > that > > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > > > > about > > > > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > > > > be > > > > > > > > > > > > > > > > > > > > > > > > expired. But as Jun said, as > > long > > > > as > > > > > we > > > > > > > > have > > > > > > > > > > > > multiple > > > > > > > > > > > > > > > > > messages > > > > > > > > > > > > > > > > > > > in a > > > > > > > > > > > > > > > > > > > > > > > batch, > > > > > > > > > > > > > > > > > > > > > > > > there isn't a cheap way to > > > achieve > > > > a > > > > > > > > precise > > > > > > > > > > > > timeout. > > > > > > > > > > > > > > So > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > > > question > > > > > > > > > > > > > > > > > > > > > > > > actually becomes whether it > is > > > more > > > > > > > > > > user-friendly > > > > > > > > > > > > to > > > > > > > > > > > > > > > expire > > > > > > > > > > > > > > > > > > early > > > > > > > > > > > > > > > > > > > > > > (based > > > > > > > > > > > > > > > > > > > > > > > on > > > > > > > > > > > > > > > > > > > > > > > > the batch creation time) or > > > expire > > > > > late > > > > > > > > > (based > > > > > > > > > > on > > > > > > > > > > > > the > > > > > > > > > > > > > > > batch > > > > > > > > > > > > > > > > > > close > > > > > > > > > > > > > > > > > > > > > > time). > > > > > > > > > > > > > > > > > > > > > > > I > > > > > > > > > > > > > > > > > > > > > > > > think both are acceptable. > > > > > Personally I > > > > > > > > think > > > > > > > > > > > most > > > > > > > > > > > > > > users > > > > > > > > > > > > > > > do > > > > > > > > > > > > > > > > > not > > > > > > > > > > > > > > > > > > > > > really > > > > > > > > > > > > > > > > > > > > > > > care > > > > > > > > > > > > > > > > > > > > > > > > about expire a little late as > > > long > > > > as > > > > > > it > > > > > > > > > > > eventually > > > > > > > > > > > > > > > > expires. > > > > > > > > > > > > > > > > > > So I > > > > > > > > > > > > > > > > > > > > > would > > > > > > > > > > > > > > > > > > > > > > > use > > > > > > > > > > > > > > > > > > > > > > > > batch close time as long as > > there > > > > is > > > > > a > > > > > > > > bound > > > > > > > > > on > > > > > > > > > > > > that. > > > > > > > > > > > > > > But > > > > > > > > > > > > > > > > it > > > > > > > > > > > > > > > > > > > looks > > > > > > > > > > > > > > > > > > > > > that > > > > > > > > > > > > > > > > > > > > > > > we > > > > > > > > > > > > > > > > > > > > > > > > do not really have a bound on > > > when > > > > we > > > > > > > will > > > > > > > > > > close > > > > > > > > > > > a > > > > > > > > > > > > > > batch. > > > > > > > > > > > > > > > > So > > > > > > > > > > > > > > > > > > > > > expiration > > > > > > > > > > > > > > > > > > > > > > > > based on batch create time > may > > be > > > > the > > > > > > > only > > > > > > > > > > option > > > > > > > > > > > > if > > > > > > > > > > > > > we > > > > > > > > > > > > > > > > don't > > > > > > > > > > > > > > > > > > > want > > > > > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > > > > > > > introduce complexity. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2. If we timeout a batch in a > > > > request > > > > > > > when > > > > > > > > it > > > > > > > > > > is > > > > > > > > > > > > > still > > > > > > > > > > > > > > in > > > > > > > > > > > > > > > > > > flight, > > > > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > > > > > end > > > > > > > > > > > > > > > > > > > > > > > > result of that batch is > unclear > > > to > > > > > the > > > > > > > > users. > > > > > > > > > > It > > > > > > > > > > > > > would > > > > > > > > > > > > > > be > > > > > > > > > > > > > > > > > weird > > > > > > > > > > > > > > > > > > > > that > > > > > > > > > > > > > > > > > > > > > > user > > > > > > > > > > > > > > > > > > > > > > > > receive exception saying > those > > > > > messages > > > > > > > are > > > > > > > > > > > expired > > > > > > > > > > > > > > while > > > > > > > > > > > > > > > > > they > > > > > > > > > > > > > > > > > > > > > actually > > > > > > > > > > > > > > > > > > > > > > > > have been sent successfully. > > Also > > > > if > > > > > > > > > > idempotence > > > > > > > > > > > is > > > > > > > > > > > > > set > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > true, > > > > > > > > > > > > > > > > > > > > what > > > > > > > > > > > > > > > > > > > > > > > would > > > > > > > > > > > > > > > > > > > > > > > > the next sequence ID be after > > the > > > > > > expired > > > > > > > > > > batch? > > > > > > > > > > > > > > Reusing > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > same > > > > > > > > > > > > > > > > > > > > > > > sequence > > > > > > > > > > > > > > > > > > > > > > > > Id may result in data loss, > and > > > > > > increment > > > > > > > > the > > > > > > > > > > > > > sequence > > > > > > > > > > > > > > ID > > > > > > > > > > > > > > > > may > > > > > > > > > > > > > > > > > > > cause > > > > > > > > > > > > > > > > > > > > > > > > OutOfOrderSequenceException. > > > > Besides, > > > > > > > > > > extracting > > > > > > > > > > > an > > > > > > > > > > > > > > > expired > > > > > > > > > > > > > > > > > > batch > > > > > > > > > > > > > > > > > > > > > from > > > > > > > > > > > > > > > > > > > > > > a > > > > > > > > > > > > > > > > > > > > > > > > request also introduces some > > > > > > complexity. > > > > > > > > > Again, > > > > > > > > > > > > > > > personally > > > > > > > > > > > > > > > > I > > > > > > > > > > > > > > > > > > > think > > > > > > > > > > > > > > > > > > > > it > > > > > > > > > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > > > > > > > > > > fine to expire a little bit > > late. > > > > So > > > > > > > maybe > > > > > > > > we > > > > > > > > > > > don't > > > > > > > > > > > > > > need > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > > expire > > > > > > > > > > > > > > > > > > > > a > > > > > > > > > > > > > > > > > > > > > > > batch > > > > > > > > > > > > > > > > > > > > > > > > that is already in flight. In > > the > > > > > worst > > > > > > > > case > > > > > > > > > we > > > > > > > > > > > > will > > > > > > > > > > > > > > > expire > > > > > > > > > > > > > > > > > it > > > > > > > > > > > > > > > > > > > with > > > > > > > > > > > > > > > > > > > > > > delay > > > > > > > > > > > > > > > > > > > > > > > > of request.timeout.ms. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Aug 22, 2017 at 3:08 > > AM, > > > > > Ismael > > > > > > > > Juma > > > > > > > > > < > > > > > > > > > > > > > > > > > > ism...@juma.me.uk> > > > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> Hi all, > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > >> The discussion has been > going > > on > > > > > for a > > > > > > > > > while, > > > > > > > > > > > > would > > > > > > > > > > > > > it > > > > > > > > > > > > > > > > help > > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > > > have > > > > > > > > > > > > > > > > > > > > > a > > > > > > > > > > > > > > > > > > > > > > > >> call to discuss this? I'd > like > > > to > > > > > > start > > > > > > > a > > > > > > > > > vote > > > > > > > > > > > > > soonish > > > > > > > > > > > > > > > so > > > > > > > > > > > > > > > > > that > > > > > > > > > > > > > > > > > > > we > > > > > > > > > > > > > > > > > > > > > can > > > > > > > > > > > > > > > > > > > > > > > >> include this in 1.0.0. I > > > > personally > > > > > > > prefer > > > > > > > > > > > > > > > > > > > > > > max.message.delivery.wait.ms > > > > > > > > > > > > > > > > > > > > > > > . > > > > > > > > > > > > > > > > > > > > > > > >> It seems like Jun, Apurva > and > > > > Jason > > > > > > also > > > > > > > > > > prefer > > > > > > > > > > > > > that. > > > > > > > > > > > > > > > > > Sumant, > > > > > > > > > > > > > > > > > > it > > > > > > > > > > > > > > > > > > > > > seems > > > > > > > > > > > > > > > > > > > > > > > like > > > > > > > > > > > > > > > > > > > > > > > >> you still prefer a > > > > batch.expiry.ms, > > > > > > is > > > > > > > > that > > > > > > > > > > > > right? > > > > > > > > > > > > > > What > > > > > > > > > > > > > > > > are > > > > > > > > > > > > > > > > > > > your > > > > > > > > > > > > > > > > > > > > > > > >> thoughts Joel and Becket? > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > >> Ismael > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > >> On Wed, Aug 16, 2017 at 6:34 > > PM, > > > > Jun > > > > > > > Rao < > > > > > > > > > > > > > > > > j...@confluent.io> > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > >>> Hi, Sumant, > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > >>> The semantics of linger.ms > > is > > > a > > > > > bit > > > > > > > > > subtle. > > > > > > > > > > > The > > > > > > > > > > > > > > > > reasoning > > > > > > > > > > > > > > > > > > for > > > > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > > > > > > >>> current > > > > > > > > > > > > > > > > > > > > > > > >>> implementation is the > > > following. > > > > > > Let's > > > > > > > > say > > > > > > > > > > one > > > > > > > > > > > > sets > > > > > > > > > > > > > > > > > > linger.ms > > > > > > > > > > > > > > > > > > > > to 0 > > > > > > > > > > > > > > > > > > > > > > > (our > > > > > > > > > > > > > > > > > > > > > > > >>> current default value). > > > Creating > > > > a > > > > > > > batch > > > > > > > > > for > > > > > > > > > > > > every > > > > > > > > > > > > > > > > message > > > > > > > > > > > > > > > > > > will > > > > > > > > > > > > > > > > > > > > be > > > > > > > > > > > > > > > > > > > > > > bad > > > > > > > > > > > > > > > > > > > > > > > >>> for > > > > > > > > > > > > > > > > > > > > > > > >>> throughput. Instead, the > > > current > > > > > > > > > > implementation > > > > > > > > > > > > > only > > > > > > > > > > > > > > > > forms > > > > > > > > > > > > > > > > > a > > > > > > > > > > > > > > > > > > > > batch > > > > > > > > > > > > > > > > > > > > > > when > > > > > > > > > > > > > > > > > > > > > > > >>> the > > > > > > > > > > > > > > > > > > > > > > > >>> batch is sendable (i.e., > > broker > > > > is > > > > > > > > > available, > > > > > > > > > > > > > > inflight > > > > > > > > > > > > > > > > > > request > > > > > > > > > > > > > > > > > > > > > limit > > > > > > > > > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > > > > > > > > > >>> not > > > > > > > > > > > > > > > > > > > > > > > >>> exceeded, etc). That way, > the > > > > > > producer > > > > > > > > has > > > > > > > > > > more > > > > > > > > > > > > > > chance > > > > > > > > > > > > > > > > for > > > > > > > > > > > > > > > > > > > > > batching. > > > > > > > > > > > > > > > > > > > > > > > The > > > > > > > > > > > > > > > > > > > > > > > >>> implication is that a batch > > > could > > > > > be > > > > > > > > closed > > > > > > > > > > > > longer > > > > > > > > > > > > > > than > > > > > > > > > > > > > > > > > > > > linger.ms. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > >>> Now, on your concern about > > not > > > > > > having a > > > > > > > > > > precise > > > > > > > > > > > > way > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > control > > > > > > > > > > > > > > > > > > > > > delay > > > > > > > > > > > > > > > > > > > > > > in > > > > > > > > > > > > > > > > > > > > > > > >>> the > > > > > > > > > > > > > > > > > > > > > > > >>> accumulator. It seems the > > > > > > > > batch.expiry.ms > > > > > > > > > > > > approach > > > > > > > > > > > > > > > will > > > > > > > > > > > > > > > > > have > > > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > > > > > same > > > > > > > > > > > > > > > > > > > > > > > >>> issue. If you start the > clock > > > > when > > > > > a > > > > > > > > batch > > > > > > > > > is > > > > > > > > > > > > > > > > initialized, > > > > > > > > > > > > > > > > > > you > > > > > > > > > > > > > > > > > > > > may > > > > > > > > > > > > > > > > > > > > > > > expire > > > > > > > > > > > > > > > > > > > > > > > >>> some messages in the same > > batch > > > > > early > > > > > > > > than > > > > > > > > > > > > > > > > batch.expiry.ms > > > > > > > > > > > > > > > > > . > > > > > > > > > > > > > > > > > > If > > > > > > > > > > > > > > > > > > > > you > > > > > > > > > > > > > > > > > > > > > > > start > > > > > > > > > > > > > > > > > > > > > > > >>> the clock when the batch is > > > > closed, > > > > > > the > > > > > > > > > > > > expiration > > > > > > > > > > > > > > time > > > > > > > > > > > > > > > > > could > > > > > > > > > > > > > > > > > > > be > > > > > > > > > > > > > > > > > > > > > > > >>> unbounded > > > > > > > > > > > > > > > > > > > > > > > >>> because of the linger.ms > > > > > > > implementation > > > > > > > > > > > > described > > > > > > > > > > > > > > > above. > > > > > > > > > > > > > > > > > > > > Starting > > > > > > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > > > > > > >>> expiration clock on batch > > > > > > > initialization > > > > > > > > > will > > > > > > > > > > > at > > > > > > > > > > > > > > least > > > > > > > > > > > > > > > > > > > guarantee > > > > > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > > > > > > time > > > > > > > > > > > > > > > > > > > > > > > >>> to expire the first message > > is > > > > > > precise, > > > > > > > > > which > > > > > > > > > > > is > > > > > > > > > > > > > > > probably > > > > > > > > > > > > > > > > > > good > > > > > > > > > > > > > > > > > > > > > > enough. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > >>> Thanks, > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > >>> Jun > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > >>> On Tue, Aug 15, 2017 at > 3:46 > > > PM, > > > > > > Sumant > > > > > > > > > > Tambe < > > > > > > > > > > > > > > > > > > > suta...@gmail.com > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > >>> > Question about "the > closing > > > of > > > > a > > > > > > > batch > > > > > > > > > can > > > > > > > > > > be > > > > > > > > > > > > > > delayed > > > > > > > > > > > > > > > > > > longer > > > > > > > > > > > > > > > > > > > > than > > > > > > > > > > > > > > > > > > > > > > > >>> > linger.ms": > > > > > > > > > > > > > > > > > > > > > > > >>> > Is it possible to cause > an > > > > > > indefinite > > > > > > > > > > delay? > > > > > > > > > > > At > > > > > > > > > > > > > > some > > > > > > > > > > > > > > > > > point > > > > > > > > > > > > > > > > > > > > bytes > > > > > > > > > > > > > > > > > > > > > > > limit > > > > > > > > > > > > > > > > > > > > > > > >>> > might kick in. Also, why > is > > > > > closing > > > > > > > of > > > > > > > > a > > > > > > > > > > > batch > > > > > > > > > > > > > > > coupled > > > > > > > > > > > > > > > > > with > > > > > > > > > > > > > > > > > > > > > > > >>> availability of > > > > > > > > > > > > > > > > > > > > > > > >>> > its destination? In this > > > > > approach a > > > > > > > > batch > > > > > > > > > > > > chosen > > > > > > > > > > > > > > for > > > > > > > > > > > > > > > > > > eviction > > > > > > > > > > > > > > > > > > > > due > > > > > > > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > > > > > > >>> delay > > > > > > > > > > > > > > > > > > > > > > > >>> > needs to "close" anyway, > > > right > > > > > > > (without > > > > > > > > > > > regards > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > > destination > > > > > > > > > > > > > > > > > > > > > > > >>> > availability)? > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > >>> > I'm not too worried about > > > > > notifying > > > > > > > at > > > > > > > > > > > > > super-exact > > > > > > > > > > > > > > > time > > > > > > > > > > > > > > > > > > > > specified > > > > > > > > > > > > > > > > > > > > > > in > > > > > > > > > > > > > > > > > > > > > > > >>> the > > > > > > > > > > > > > > > > > > > > > > > >>> > configs. But expiring > > before > > > > the > > > > > > full > > > > > > > > > > > wait-span > > > > > > > > > > > > > has > > > > > > > > > > > > > > > > > elapsed > > > > > > > > > > > > > > > > > > > > > sounds > > > > > > > > > > > > > > > > > > > > > > a > > > > > > > > > > > > > > > > > > > > > > > >>> little > > > > > > > > > > > > > > > > > > > > > > > >>> > weird. So expiration time > > > has a > > > > > +/- > > > > > > > > > spread. > > > > > > > > > > > It > > > > > > > > > > > > > > works > > > > > > > > > > > > > > > > more > > > > > > > > > > > > > > > > > > > like > > > > > > > > > > > > > > > > > > > > a > > > > > > > > > > > > > > > > > > > > > > hint > > > > > > > > > > > > > > > > > > > > > > > >>> than > > > > > > > > > > > > > > > > > > > > > > > >>> > max. So why not > > > > > > > > > > > message.delivery.wait.hint.ms? > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > >>> > Yeah, cancellable future > > will > > > > be > > > > > > > > similar > > > > > > > > > in > > > > > > > > > > > > > > > complexity. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > >>> > I'm unsure if > > > > > > > > > max.message.delivery.wait.ms > > > > > > > > > > > > will > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > final > > > > > > > > > > > > > > > > > > > nail > > > > > > > > > > > > > > > > > > > > > for > > > > > > > > > > > > > > > > > > > > > > > >>> > producer > > > > > > > > > > > > > > > > > > > > > > > >>> > timeouts. We still won't > > > have a > > > > > > > precise > > > > > > > > > way > > > > > > > > > > > to > > > > > > > > > > > > > > > control > > > > > > > > > > > > > > > > > > delay > > > > > > > > > > > > > > > > > > > in > > > > > > > > > > > > > > > > > > > > > > just > > > > > > > > > > > > > > > > > > > > > > > >>> the > > > > > > > > > > > > > > > > > > > > > > > >>> > accumulator segment. > > > > > > batch.expiry.ms > > > > > > > > > does > > > > > > > > > > > not > > > > > > > > > > > > > try > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > > abstract. > > > > > > > > > > > > > > > > > > > > > > It's > > > > > > > > > > > > > > > > > > > > > > > >>> very > > > > > > > > > > > > > > > > > > > > > > > >>> > specific. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > >>> > My biggest concern at the > > > > moment > > > > > is > > > > > > > > > > > > > implementation > > > > > > > > > > > > > > > > > > > complexity. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > >>> > At this state, I would > like > > > to > > > > > > > > encourage > > > > > > > > > > > other > > > > > > > > > > > > > > > > > independent > > > > > > > > > > > > > > > > > > > > > > opinions. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > >>> > Regards, > > > > > > > > > > > > > > > > > > > > > > > >>> > Sumant > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > >>> > On 11 August 2017 at > 17:35, > > > Jun > > > > > > Rao < > > > > > > > > > > > > > > > j...@confluent.io> > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > >>> > > Hi, Sumant, > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > >>> > > 1. Yes, it's probably > > > > > reasonable > > > > > > to > > > > > > > > > > require > > > > > > > > > > > > > > > > > > > > > > > >>> > max.message.delivery.wait.ms > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > >>> > > linger.ms. As for > > retries, > > > > > > perhaps > > > > > > > > we > > > > > > > > > > can > > > > > > > > > > > > set > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > default > > > > > > > > > > > > > > > > > > > > > > retries > > > > > > > > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > > > > > > >>> > > infinite or just ignore > > it. > > > > > Then > > > > > > > the > > > > > > > > > > > latency > > > > > > > > > > > > > will > > > > > > > > > > > > > > > be > > > > > > > > > > > > > > > > > > > bounded > > > > > > > > > > > > > > > > > > > > by > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > max.message.delivery.wait.ms > > > > . > > > > > > > > > > > > > request.timeout.ms > > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > > max > > > > > > > > > > > > > > > > > > > > > time > > > > > > > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > > > > > > >>> > > request will be > spending > > on > > > > the > > > > > > > > server. > > > > > > > > > > The > > > > > > > > > > > > > > client > > > > > > > > > > > > > > > > can > > > > > > > > > > > > > > > > > > > expire > > > > > > > > > > > > > > > > > > > > > an > > > > > > > > > > > > > > > > > > > > > > > >>> inflight > > > > > > > > > > > > > > > > > > > > > > > >>> > > request early if > needed. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > >>> > > 2. Well, since > > > > > > > > > > > max.message.delivery.wait.ms > > > > > > > > > > > > > > > > specifies > > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > > > max, > > > > > > > > > > > > > > > > > > > > > > > >>> calling > > > > > > > > > > > > > > > > > > > > > > > >>> > the > > > > > > > > > > > > > > > > > > > > > > > >>> > > callback a bit early > may > > be > > > > ok? > > > > > > > Note > > > > > > > > > that > > > > > > > > > > > > > > > > > > > > > > > >>> > max.message.delivery.wait.ms > > > > > > > > > > > > > > > > > > > > > > > >>> > > only > > > > > > > > > > > > > > > > > > > > > > > >>> > > comes into play in the > > rare > > > > > error > > > > > > > > case. > > > > > > > > > > > So, I > > > > > > > > > > > > > am > > > > > > > > > > > > > > > not > > > > > > > > > > > > > > > > > sure > > > > > > > > > > > > > > > > > > > if > > > > > > > > > > > > > > > > > > > > we > > > > > > > > > > > > > > > > > > > > > > > need > > > > > > > > > > > > > > > > > > > > > > > >>> to > > > > > > > > > > > > > > > > > > > > > > > >>> > be > > > > > > > > > > > > > > > > > > > > > > > >>> > > very precise. The issue > > > with > > > > > > > starting > > > > > > > > > the > > > > > > > > > > > > clock > > > > > > > > > > > > > > on > > > > > > > > > > > > > > > > > > closing > > > > > > > > > > > > > > > > > > > a > > > > > > > > > > > > > > > > > > > > > > batch > > > > > > > > > > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > > > > > > > > > >>> > that > > > > > > > > > > > > > > > > > > > > > > > >>> > > currently if the leader > > is > > > > not > > > > > > > > > available, > > > > > > > > > > > the > > > > > > > > > > > > > > > closing > > > > > > > > > > > > > > > > > of > > > > > > > > > > > > > > > > > > a > > > > > > > > > > > > > > > > > > > > > batch > > > > > > > > > > > > > > > > > > > > > > > can > > > > > > > > > > > > > > > > > > > > > > > >>> be > > > > > > > > > > > > > > > > > > > > > > > >>> > > delayed longer than > > > > linger.ms. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > >>> > > 4. As you said, > > > > > > future.get(timeout) > > > > > > > > > > itself > > > > > > > > > > > > > > doesn't > > > > > > > > > > > > > > > > > solve > > > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > > > > > > problem > > > > > > > > > > > > > > > > > > > > > > > >>> > since > > > > > > > > > > > > > > > > > > > > > > > >>> > > you still need a way to > > > > expire > > > > > > the > > > > > > > > > record > > > > > > > > > > > in > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > sender. > > > > > > > > > > > > > > > > > > > The > > > > > > > > > > > > > > > > > > > > > > amount > > > > > > > > > > > > > > > > > > > > > > > >>> of > > > > > > > > > > > > > > > > > > > > > > > >>> > work > > > > > > > > > > > > > > > > > > > > > > > >>> > > to implement a > > cancellable > > > > > future > > > > > > > is > > > > > > > > > > > probably > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > same? > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > >>> > > Overall, my concern > with > > > > patch > > > > > > work > > > > > > > > is > > > > > > > > > > that > > > > > > > > > > > > we > > > > > > > > > > > > > > have > > > > > > > > > > > > > > > > > > > iterated > > > > > > > > > > > > > > > > > > > > on > > > > > > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > > > > > > >>> > produce > > > > > > > > > > > > > > > > > > > > > > > >>> > > request timeout > multiple > > > > times > > > > > > and > > > > > > > > new > > > > > > > > > > > issues > > > > > > > > > > > > > > keep > > > > > > > > > > > > > > > > > coming > > > > > > > > > > > > > > > > > > > > back. > > > > > > > > > > > > > > > > > > > > > > > >>> Ideally, > > > > > > > > > > > > > > > > > > > > > > > >>> > > this time, we want to > > have > > > a > > > > > > > solution > > > > > > > > > > that > > > > > > > > > > > > > covers > > > > > > > > > > > > > > > all > > > > > > > > > > > > > > > > > > > cases, > > > > > > > > > > > > > > > > > > > > > even > > > > > > > > > > > > > > > > > > > > > > > >>> though > > > > > > > > > > > > > > > > > > > > > > > >>> > > that requires a bit > more > > > > work. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > >>> > > Thanks, > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > >>> > > Jun > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > >>> > > On Fri, Aug 11, 2017 at > > > 12:30 > > > > > PM, > > > > > > > > > Sumant > > > > > > > > > > > > Tambe > > > > > > > > > > > > > < > > > > > > > > > > > > > > > > > > > > > > suta...@gmail.com> > > > > > > > > > > > > > > > > > > > > > > > >>> > wrote: > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > >>> > > > Hi Jun, > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > > >>> > > > Thanks for looking > into > > > it. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > > >>> > > > Yes, we did consider > > this > > > > > > > > > message-level > > > > > > > > > > > > > timeout > > > > > > > > > > > > > > > > > > approach > > > > > > > > > > > > > > > > > > > > and > > > > > > > > > > > > > > > > > > > > > > > >>> expiring > > > > > > > > > > > > > > > > > > > > > > > >>> > > > batches selectively > in > > a > > > > > > request > > > > > > > > but > > > > > > > > > > > > rejected > > > > > > > > > > > > > > it > > > > > > > > > > > > > > > > due > > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > > > > > > >>> reasons of > > > > > > > > > > > > > > > > > > > > > > > >>> > > > added complexity > > without > > > a > > > > > > strong > > > > > > > > > > benefit > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > > counter-weigh > > > > > > > > > > > > > > > > > > > > > > that. > > > > > > > > > > > > > > > > > > > > > > > >>> Your > > > > > > > > > > > > > > > > > > > > > > > >>> > > > proposal is a slight > > > > > variation > > > > > > so > > > > > > > > > I'll > > > > > > > > > > > > > mention > > > > > > > > > > > > > > > some > > > > > > > > > > > > > > > > > > > issues > > > > > > > > > > > > > > > > > > > > > > here. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > > >>> > > > 1. It sounds like > > > > > > > > > > > > > max.message.delivery.wait.ms > > > > > > > > > > > > > > > > will > > > > > > > > > > > > > > > > > > > > overlap > > > > > > > > > > > > > > > > > > > > > > with > > > > > > > > > > > > > > > > > > > > > > > >>> "time > > > > > > > > > > > > > > > > > > > > > > > >>> > > > segments" of both > > > > linger.ms > > > > > > and > > > > > > > > > > retries > > > > > > > > > > > * > > > > > > > > > > > > ( > > > > > > > > > > > > > > > > > > > > > request.timeout.ms > > > > > > > > > > > > > > > > > > > > > > + > > > > > > > > > > > > > > > > > > > > > > > >>> > > > retry.backoff.ms). > In > > > that > > > > > > case, > > > > > > > > > which > > > > > > > > > > > > > config > > > > > > > > > > > > > > > set > > > > > > > > > > > > > > > > > > takes > > > > > > > > > > > > > > > > > > > > > > > >>> precedence? It > > > > > > > > > > > > > > > > > > > > > > > >>> > > > would not make sense > to > > > > > > configure > > > > > > > > > > configs > > > > > > > > > > > > > from > > > > > > > > > > > > > > > both > > > > > > > > > > > > > > > > > > sets. > > > > > > > > > > > > > > > > > > > > > > > >>> Especially, > > > > > > > > > > > > > > > > > > > > > > > >>> > we > > > > > > > > > > > > > > > > > > > > > > > >>> > > > discussed > exhaustively > > > > > > internally > > > > > > > > > that > > > > > > > > > > > > > retries > > > > > > > > > > > > > > > and > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > max.message.delivery.wait.ms > > > > > > > > can't / > > > > > > > > > > > > > shouldn't > > > > > > > > > > > > > > > be > > > > > > > > > > > > > > > > > > > > configured > > > > > > > > > > > > > > > > > > > > > > > >>> together. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > Retires become moot > as > > > you > > > > > > > already > > > > > > > > > > > > mention. I > > > > > > > > > > > > > > > think > > > > > > > > > > > > > > > > > > > that's > > > > > > > > > > > > > > > > > > > > > > going > > > > > > > > > > > > > > > > > > > > > > > >>> to be > > > > > > > > > > > > > > > > > > > > > > > >>> > > > surprising to anyone > > > > wanting > > > > > to > > > > > > > use > > > > > > > > > > > > > > > > > > > > > > max.message.delivery.wait.ms > > > > > > > > > > > > > > > > > > > > > > > . > > > > > > > > > > > > > > > > > > > > > > > >>> We > > > > > > > > > > > > > > > > > > > > > > > >>> > > > probably need > > > > > > > > > > > max.message.delivery.wait.ms > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > linger.ms > > > > > > > > > > > > > > > > > > > or > > > > > > > > > > > > > > > > > > > > > > > >>> something > > > > > > > > > > > > > > > > > > > > > > > >>> > like > > > > > > > > > > > > > > > > > > > > > > > >>> > > > that. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > > >>> > > > 2. If clock starts > > when a > > > > > batch > > > > > > > is > > > > > > > > > > > created > > > > > > > > > > > > > and > > > > > > > > > > > > > > > > expire > > > > > > > > > > > > > > > > > > > when > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > max.message.delivery.wait.ms > > > > > > is > > > > > > > > over > > > > > > > > > > in > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > accumulator, > > > > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > > > > > > last > > > > > > > > > > > > > > > > > > > > > > > >>> few > > > > > > > > > > > > > > > > > > > > > > > >>> > > > messages in the > > expiring > > > > > batch > > > > > > > may > > > > > > > > > not > > > > > > > > > > > have > > > > > > > > > > > > > > lived > > > > > > > > > > > > > > > > > long > > > > > > > > > > > > > > > > > > > > > enough. > > > > > > > > > > > > > > > > > > > > > > As > > > > > > > > > > > > > > > > > > > > > > > >>> the > > > > > > > > > > > > > > > > > > > > > > > >>> > > > config seems to > > suggests > > > > > > > > per-message > > > > > > > > > > > > timeout, > > > > > > > > > > > > > > > it's > > > > > > > > > > > > > > > > > > > > incorrect > > > > > > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > > > > > > >>> expire > > > > > > > > > > > > > > > > > > > > > > > >>> > > > messages prematurely. > > On > > > > the > > > > > > > other > > > > > > > > > hand > > > > > > > > > > > if > > > > > > > > > > > > > > clock > > > > > > > > > > > > > > > > > starts > > > > > > > > > > > > > > > > > > > > after > > > > > > > > > > > > > > > > > > > > > > > >>> batch is > > > > > > > > > > > > > > > > > > > > > > > >>> > > > closed (which also > > > implies > > > > > that > > > > > > > > > > > linger.ms > > > > > > > > > > > > is > > > > > > > > > > > > > > not > > > > > > > > > > > > > > > > > > covered > > > > > > > > > > > > > > > > > > > > by > > > > > > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > max.message.delivery.wait.ms > > > > > > > > > config), > > > > > > > > > > no > > > > > > > > > > > > > > message > > > > > > > > > > > > > > > > > would > > > > > > > > > > > > > > > > > > > be > > > > > > > > > > > > > > > > > > > > be > > > > > > > > > > > > > > > > > > > > > > > >>> expired > > > > > > > > > > > > > > > > > > > > > > > >>> > too > > > > > > > > > > > > > > > > > > > > > > > >>> > > > soon. Yeah, > expiration > > > may > > > > be > > > > > > > > little > > > > > > > > > > bit > > > > > > > > > > > > too > > > > > > > > > > > > > > late > > > > > > > > > > > > > > > > but > > > > > > > > > > > > > > > > > > > hey, > > > > > > > > > > > > > > > > > > > > > this > > > > > > > > > > > > > > > > > > > > > > > >>> ain't > > > > > > > > > > > > > > > > > > > > > > > >>> > > > real-time service. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > > >>> > > > 3. I agree that steps > > #3, > > > > #4, > > > > > > > (and > > > > > > > > > #5) > > > > > > > > > > > are > > > > > > > > > > > > > > > complex > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > > > > > implement. > > > > > > > > > > > > > > > > > > > > > > > >>> On the > > > > > > > > > > > > > > > > > > > > > > > >>> > > > other hand, > > > > batch.expiry.ms > > > > > is > > > > > > > > next > > > > > > > > > to > > > > > > > > > > > > > trivial > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > > > implement. > > > > > > > > > > > > > > > > > > > > > > We > > > > > > > > > > > > > > > > > > > > > > > >>> just > > > > > > > > > > > > > > > > > > > > > > > >>> > > pass > > > > > > > > > > > > > > > > > > > > > > > >>> > > > the config all the > way > > > down > > > > > to > > > > > > > > > > > > > > > > > > ProducerBatch.maybeExpire > > > > > > > > > > > > > > > > > > > > and > > > > > > > > > > > > > > > > > > > > > be > > > > > > > > > > > > > > > > > > > > > > > >>> done > > > > > > > > > > > > > > > > > > > > > > > >>> > with > > > > > > > > > > > > > > > > > > > > > > > >>> > > > it. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > > >>> > > > 4. Do you think the > > > effect > > > > of > > > > > > > > > > > > > > > > > > > max.message.delivery.wait.ms > > > > > > > > > > > > > > > > > > > > > can > > > > > > > > > > > > > > > > > > > > > > > be > > > > > > > > > > > > > > > > > > > > > > > >>> > > > simulated > > > > > > > > > > > > > > > > > > > > > > > >>> > > > with > > future.get(timeout) > > > > > > method? > > > > > > > > > > Copying > > > > > > > > > > > > > > excerpt > > > > > > > > > > > > > > > > from > > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > > > > > kip-91: > > > > > > > > > > > > > > > > > > > > > > > >>> An > > > > > > > > > > > > > > > > > > > > > > > >>> > > > end-to-end timeout > may > > be > > > > > > > partially > > > > > > > > > > > > emulated > > > > > > > > > > > > > > > using > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > > > > > > >>> > > future.get(timeout). > > > > > > > > > > > > > > > > > > > > > > > >>> > > > The timeout must be > > > greater > > > > > > than > > > > > > > ( > > > > > > > > > > > > > > > batch.expiry.ms > > > > > > > > > > > > > > > > + > > > > > > > > > > > > > > > > > > > > nRetries > > > > > > > > > > > > > > > > > > > > > > * ( > > > > > > > > > > > > > > > > > > > > > > > >>> > > > request.timeout.ms + > > > > > > > > > retry.backoff.ms > > > > > > > > > > )). > > > > > > > > > > > > > Note > > > > > > > > > > > > > > > that > > > > > > > > > > > > > > > > > > when > > > > > > > > > > > > > > > > > > > > > future > > > > > > > > > > > > > > > > > > > > > > > >>> times > > > > > > > > > > > > > > > > > > > > > > > >>> > > out, > > > > > > > > > > > > > > > > > > > > > > > >>> > > > Sender may continue > to > > > send > > > > > the > > > > > > > > > records > > > > > > > > > > > in > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > > background. > > > > > > > > > > > > > > > > > > > > To > > > > > > > > > > > > > > > > > > > > > > > avoid > > > > > > > > > > > > > > > > > > > > > > > >>> > that, > > > > > > > > > > > > > > > > > > > > > > > >>> > > > implementing a > > > cancellable > > > > > > future > > > > > > > > is > > > > > > > > > a > > > > > > > > > > > > > > > possibility. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > > >>> > > > For simplicity, we > > could > > > > just > > > > > > > > > > implement a > > > > > > > > > > > > > > trivial > > > > > > > > > > > > > > > > > > method > > > > > > > > > > > > > > > > > > > in > > > > > > > > > > > > > > > > > > > > > > > >>> producer > > > > > > > > > > > > > > > > > > > > > > > >>> > > > ProducerConfigs. > > > > > > > > > > > maxMessageDeliveryWaitMs() > > > > > > > > > > > > > and > > > > > > > > > > > > > > > > > return > > > > > > > > > > > > > > > > > > a > > > > > > > > > > > > > > > > > > > > > number > > > > > > > > > > > > > > > > > > > > > > > >>> based > > > > > > > > > > > > > > > > > > > > > > > >>> > on > > > > > > > > > > > > > > > > > > > > > > > >>> > > > this formula? Users > of > > > > > > future.get > > > > > > > > can > > > > > > > > > > use > > > > > > > > > > > > > this > > > > > > > > > > > > > > > > > timeout > > > > > > > > > > > > > > > > > > > > value. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > > >>> > > > Thoughts? > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > > >>> > > > Regards, > > > > > > > > > > > > > > > > > > > > > > > >>> > > > Sumant > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > > >>> > > > On 11 August 2017 at > > > 07:50, > > > > > > > Sumant > > > > > > > > > > Tambe > > > > > > > > > > > < > > > > > > > > > > > > > > > > > > > > suta...@gmail.com> > > > > > > > > > > > > > > > > > > > > > > > >>> wrote: > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > Thanks for the KIP. > > > Nice > > > > > > > > > > documentation > > > > > > > > > > > on > > > > > > > > > > > > > all > > > > > > > > > > > > > > > > > current > > > > > > > > > > > > > > > > > > > > > issues > > > > > > > > > > > > > > > > > > > > > > > >>> with the > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> timeout. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > For the KIP > writeup, > > > all > > > > > > credit > > > > > > > > > goes > > > > > > > > > > to > > > > > > > > > > > > > Joel > > > > > > > > > > > > > > > > Koshy. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > I'll follow up on > > your > > > > > > > comments a > > > > > > > > > > > little > > > > > > > > > > > > > > later. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> You also brought > up > > a > > > > good > > > > > > use > > > > > > > > > case > > > > > > > > > > > for > > > > > > > > > > > > > > timing > > > > > > > > > > > > > > > > > out a > > > > > > > > > > > > > > > > > > > > > > message. > > > > > > > > > > > > > > > > > > > > > > > >>> For > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> applications that > > > > collect > > > > > > and > > > > > > > > send > > > > > > > > > > > > sensor > > > > > > > > > > > > > > data > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > > Kafka, > > > > > > > > > > > > > > > > > > > > > if > > > > > > > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > > > > > > >>> data > > > > > > > > > > > > > > > > > > > > > > > >>> > > > can't > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> be sent to Kafka > for > > > > some > > > > > > > > reason, > > > > > > > > > > the > > > > > > > > > > > > > > > > application > > > > > > > > > > > > > > > > > > may > > > > > > > > > > > > > > > > > > > > > prefer > > > > > > > > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > > > > > > >>> > buffer > > > > > > > > > > > > > > > > > > > > > > > >>> > > > the > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> more recent data > in > > > the > > > > > > > > > accumulator. > > > > > > > > > > > > > > Without a > > > > > > > > > > > > > > > > > > > timeout, > > > > > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > > > > > > >>> > > accumulator > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> will be filled > with > > > old > > > > > > > records > > > > > > > > > and > > > > > > > > > > > new > > > > > > > > > > > > > > > records > > > > > > > > > > > > > > > > > > can't > > > > > > > > > > > > > > > > > > > be > > > > > > > > > > > > > > > > > > > > > > > added. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> Your proposal > makes > > > > sense > > > > > > for > > > > > > > a > > > > > > > > > > > > developer > > > > > > > > > > > > > > who > > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > > > > > > familiar > > > > > > > > > > > > > > > > > > > > > > with > > > > > > > > > > > > > > > > > > > > > > > >>> how > > > > > > > > > > > > > > > > > > > > > > > >>> > the > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> producer works. I > am > > > not > > > > > > sure > > > > > > > if > > > > > > > > > > this > > > > > > > > > > > is > > > > > > > > > > > > > > very > > > > > > > > > > > > > > > > > > > intuitive > > > > > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > > > > > > >>> users > > > > > > > > > > > > > > > > > > > > > > > >>> > > > since > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> it may not be very > > > easy > > > > > for > > > > > > > them > > > > > > > > > to > > > > > > > > > > > > figure > > > > > > > > > > > > > > out > > > > > > > > > > > > > > > > how > > > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > > > > > > configure > > > > > > > > > > > > > > > > > > > > > > > >>> the > > > > > > > > > > > > > > > > > > > > > > > >>> > > new > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> knob to bound the > > > amount > > > > > of > > > > > > > the > > > > > > > > > time > > > > > > > > > > > > when > > > > > > > > > > > > > a > > > > > > > > > > > > > > > > > message > > > > > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > > > > > > > > > >>> completed. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> From users' > > > perspective, > > > > > > > > Apurva's > > > > > > > > > > > > > suggestion > > > > > > > > > > > > > > > of > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > > > > max.message.delivery.wait.ms > > > > > > > > > (which > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> bounds the time > > when a > > > > > > message > > > > > > > > is > > > > > > > > > in > > > > > > > > > > > the > > > > > > > > > > > > > > > > > accumulator > > > > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > > > > > > time > > > > > > > > > > > > > > > > > > > > > > > >>> > when > > > > > > > > > > > > > > > > > > > > > > > >>> > > > the > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> callback is > called) > > > > seems > > > > > > more > > > > > > > > > > > > intuition. > > > > > > > > > > > > > > You > > > > > > > > > > > > > > > > > listed > > > > > > > > > > > > > > > > > > > > this > > > > > > > > > > > > > > > > > > > > > in > > > > > > > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > > > > > > >>> > > > rejected > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> section since it > > > > requires > > > > > > > > > additional > > > > > > > > > > > > logic > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > rebatch > > > > > > > > > > > > > > > > > > > > > when a > > > > > > > > > > > > > > > > > > > > > > > >>> produce > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> request expires. > > > > However, > > > > > > this > > > > > > > > may > > > > > > > > > > not > > > > > > > > > > > > be > > > > > > > > > > > > > > too > > > > > > > > > > > > > > > > bad. > > > > > > > > > > > > > > > > > > The > > > > > > > > > > > > > > > > > > > > > > > >>> following are > > > > > > > > > > > > > > > > > > > > > > > >>> > > the > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> things that we > have > > to > > > > do. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> 1. The clock > starts > > > > when a > > > > > > > batch > > > > > > > > > is > > > > > > > > > > > > > created. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> 2. If the batch > > can't > > > be > > > > > > > drained > > > > > > > > > > > within > > > > > > > > > > > > > > > > > > > > > > > >>> > > > max.message.delivery.wait.ms > > > , > > > > > > > > > > > > > > > > > > > > > > > >>> > > > all > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> messages in the > > batch > > > > will > > > > > > > fail > > > > > > > > > and > > > > > > > > > > > the > > > > > > > > > > > > > > > callback > > > > > > > > > > > > > > > > > > will > > > > > > > > > > > > > > > > > > > be > > > > > > > > > > > > > > > > > > > > > > > called. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> 3. When sending a > > > > produce > > > > > > > > request, > > > > > > > > > > we > > > > > > > > > > > > > > > calculate > > > > > > > > > > > > > > > > an > > > > > > > > > > > > > > > > > > > > > > expireTime > > > > > > > > > > > > > > > > > > > > > > > >>> for > > > > > > > > > > > > > > > > > > > > > > > >>> > the > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> request that > equals > > to > > > > the > > > > > > > > > remaining > > > > > > > > > > > > > > > expiration > > > > > > > > > > > > > > > > > time > > > > > > > > > > > > > > > > > > > for > > > > > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > > > > > > >>> oldest > > > > > > > > > > > > > > > > > > > > > > > >>> > > > batch > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> in the request. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> 4. We set the > > minimum > > > of > > > > > the > > > > > > > > > > > expireTime > > > > > > > > > > > > of > > > > > > > > > > > > > > all > > > > > > > > > > > > > > > > > > > inflight > > > > > > > > > > > > > > > > > > > > > > > >>> requests as > > > > > > > > > > > > > > > > > > > > > > > >>> > > the > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> timeout in the > > > selector > > > > > poll > > > > > > > > call > > > > > > > > > > (so > > > > > > > > > > > > that > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > > selector > > > > > > > > > > > > > > > > > > > > > can > > > > > > > > > > > > > > > > > > > > > > > >>> wake up > > > > > > > > > > > > > > > > > > > > > > > >>> > > > before > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> the expiration > > time). > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> 5. If the produce > > > > response > > > > > > > can't > > > > > > > > > be > > > > > > > > > > > > > received > > > > > > > > > > > > > > > > > within > > > > > > > > > > > > > > > > > > > > > > > expireTime, > > > > > > > > > > > > > > > > > > > > > > > >>> we > > > > > > > > > > > > > > > > > > > > > > > >>> > > > expire > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> all batches in the > > > > produce > > > > > > > > request > > > > > > > > > > > whose > > > > > > > > > > > > > > > > > expiration > > > > > > > > > > > > > > > > > > > time > > > > > > > > > > > > > > > > > > > > > has > > > > > > > > > > > > > > > > > > > > > > > >>> been > > > > > > > > > > > > > > > > > > > > > > > >>> > > > reached. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> For the rest of > the > > > > > batches, > > > > > > > we > > > > > > > > > > resend > > > > > > > > > > > > > them > > > > > > > > > > > > > > > in a > > > > > > > > > > > > > > > > > new > > > > > > > > > > > > > > > > > > > > > produce > > > > > > > > > > > > > > > > > > > > > > > >>> > request. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> 6. If the producer > > > > > response > > > > > > > has > > > > > > > > a > > > > > > > > > > > > > retriable > > > > > > > > > > > > > > > > error, > > > > > > > > > > > > > > > > > > we > > > > > > > > > > > > > > > > > > > > just > > > > > > > > > > > > > > > > > > > > > > > >>> backoff a > > > > > > > > > > > > > > > > > > > > > > > >>> > > bit > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> and then retry the > > > > produce > > > > > > > > request > > > > > > > > > > as > > > > > > > > > > > > > today. > > > > > > > > > > > > > > > The > > > > > > > > > > > > > > > > > > > number > > > > > > > > > > > > > > > > > > > > of > > > > > > > > > > > > > > > > > > > > > > > >>> retries > > > > > > > > > > > > > > > > > > > > > > > >>> > > > doesn't > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> really matter now. > > We > > > > just > > > > > > > keep > > > > > > > > > > > retrying > > > > > > > > > > > > > > until > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > > > > > expiration > > > > > > > > > > > > > > > > > > > > > > > >>> time > > > > > > > > > > > > > > > > > > > > > > > >>> > is > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> reached. It's > > possible > > > > > that > > > > > > a > > > > > > > > > > produce > > > > > > > > > > > > > > request > > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > > > > never > > > > > > > > > > > > > > > > > > > > > > retried > > > > > > > > > > > > > > > > > > > > > > > >>> due > > > > > > > > > > > > > > > > > > > > > > > >>> > to > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> expiration. > However, > > > > this > > > > > > > seems > > > > > > > > > the > > > > > > > > > > > > right > > > > > > > > > > > > > > > thing > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > do > > > > > > > > > > > > > > > > > > > > > since > > > > > > > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > > > > > > >>> > users > > > > > > > > > > > > > > > > > > > > > > > >>> > > > want > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> to timeout the > > message > > > > at > > > > > > this > > > > > > > > > time. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> Implementation > wise, > > > > there > > > > > > > will > > > > > > > > > be a > > > > > > > > > > > bit > > > > > > > > > > > > > > more > > > > > > > > > > > > > > > > > > > complexity > > > > > > > > > > > > > > > > > > > > > in > > > > > > > > > > > > > > > > > > > > > > > >>> step 3 > > > > > > > > > > > > > > > > > > > > > > > >>> > and > > > > > > > > > > > > > > > > > > > > > > > >>> > > > 4, > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> but probably not > too > > > > bad. > > > > > > The > > > > > > > > > > benefit > > > > > > > > > > > is > > > > > > > > > > > > > > that > > > > > > > > > > > > > > > > this > > > > > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > > > > > > more > > > > > > > > > > > > > > > > > > > > > > > >>> intuitive > > > > > > > > > > > > > > > > > > > > > > > >>> > > to > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> the > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> end user. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> Does that sound > > > > reasonable > > > > > > to > > > > > > > > you? > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> Thanks, > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> Jun > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> On Wed, Aug 9, > 2017 > > at > > > > > 10:03 > > > > > > > PM, > > > > > > > > > > > Sumant > > > > > > > > > > > > > > Tambe > > > > > > > > > > > > > > > < > > > > > > > > > > > > > > > > > > > > > > > >>> suta...@gmail.com> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > On Wed, Aug 9, > > 2017 > > > at > > > > > > 1:28 > > > > > > > PM > > > > > > > > > > > Apurva > > > > > > > > > > > > > > Mehta > > > > > > > > > > > > > > > < > > > > > > > > > > > > > > > > > > > > > > > >>> apu...@confluent.io> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> wrote: > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > > There > seems > > to > > > > be > > > > > no > > > > > > > > > > > > relationship > > > > > > > > > > > > > > with > > > > > > > > > > > > > > > > > > cluster > > > > > > > > > > > > > > > > > > > > > > > metadata > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> availability > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > or > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > > staleness. > > > > Expiry > > > > > is > > > > > > > > just > > > > > > > > > > > based > > > > > > > > > > > > on > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > time > > > > > > > > > > > > > > > > > > > > since > > > > > > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > > > > > > >>> batch > > > > > > > > > > > > > > > > > > > > > > > >>> > > has > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> been > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > ready. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > > Please > > correct > > > > me > > > > > > if I > > > > > > > > am > > > > > > > > > > > wrong. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > I was not > very > > > > > > specific > > > > > > > > > about > > > > > > > > > > > > where > > > > > > > > > > > > > we > > > > > > > > > > > > > > > do > > > > > > > > > > > > > > > > > > > > > expiration. > > > > > > > > > > > > > > > > > > > > > > I > > > > > > > > > > > > > > > > > > > > > > > >>> > glossed > > > > > > > > > > > > > > > > > > > > > > > >>> > > > over > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > some > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > details > > because > > > > > > (again) > > > > > > > > > we've > > > > > > > > > > > > other > > > > > > > > > > > > > > > > > mechanisms > > > > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > > > > > detect > > > > > > > > > > > > > > > > > > > > > > > >>> non > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> progress. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > The > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > condition > > > > > > > > > (!muted.contains(tp) > > > > > > > > > > > && > > > > > > > > > > > > > > > > > > > (isMetadataStale > > > > > > > > > > > > > > > > > > > > > || > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > > > > > > > > cluster.leaderFor(tp) > > > > > > > == > > > > > > > > > > > null)) > > > > > > > > > > > > is > > > > > > > > > > > > > > > used > > > > > > > > > > > > > > > > in > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > > > > RecordAccumualtor. > > > > > > > > > > > expiredBatches: > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > > > > > > > > > https://github.com/apache/ > > > > > > > > > > > > > > > > > > > > > > kafka/blob/trunk/clients/src/ > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > > > > > > > > > main/java/org/apache/kafka/ > > > > > > > > > > > > > > > > > > > > > > clients/producer/internals/ > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > > > > > > > > > RecordAccumulator.java#L443 > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > Effectively, > > we > > > > > expire > > > > > > > in > > > > > > > > > all > > > > > > > > > > > the > > > > > > > > > > > > > > > > following > > > > > > > > > > > > > > > > > > > cases > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > 1) producer > is > > > > > > > partitioned > > > > > > > > > > from > > > > > > > > > > > > the > > > > > > > > > > > > > > > > brokers. > > > > > > > > > > > > > > > > > > > When > > > > > > > > > > > > > > > > > > > > > > > >>> metadata age > > > > > > > > > > > > > > > > > > > > > > > >>> > > > grows > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > beyond > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > 3x it's max > > > value. > > > > > > It's > > > > > > > > safe > > > > > > > > > > to > > > > > > > > > > > > say > > > > > > > > > > > > > > that > > > > > > > > > > > > > > > > > we're > > > > > > > > > > > > > > > > > > > not > > > > > > > > > > > > > > > > > > > > > > > >>> talking to > > > > > > > > > > > > > > > > > > > > > > > >>> > > the > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > brokers > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > at all. > > Report. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > 2) fresh > > > metadata > > > > && > > > > > > > > leader > > > > > > > > > > for > > > > > > > > > > > a > > > > > > > > > > > > > > > > partition > > > > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > > > > > not > > > > > > > > > > > > > > > > > > > > > > known > > > > > > > > > > > > > > > > > > > > > > > >>> && a > > > > > > > > > > > > > > > > > > > > > > > >>> > > > batch > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> is > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > sitting > there > > > for > > > > > > longer > > > > > > > > > than > > > > > > > > > > > > > > > > > > > request.timeout.ms. > > > > > > > > > > > > > > > > > > > > > > This > > > > > > > > > > > > > > > > > > > > > > > >>> is one > > > > > > > > > > > > > > > > > > > > > > > >>> > > > case > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> we > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > would > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > like to > > improve > > > > and > > > > > > use > > > > > > > > > > > > > > batch.expiry.ms > > > > > > > > > > > > > > > > > > because > > > > > > > > > > > > > > > > > > > > > > > >>> > > > request.timeout.ms > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> is > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > too > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > small. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > 3) fresh > > > metadata > > > > && > > > > > > > > leader > > > > > > > > > > for > > > > > > > > > > > a > > > > > > > > > > > > > > > > partition > > > > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > > > > > > known > > > > > > > > > > > > > > > > > > > > > > && > > > > > > > > > > > > > > > > > > > > > > > >>> batch > > > > > > > > > > > > > > > > > > > > > > > >>> > is > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > sitting > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > there for > > longer > > > > > than > > > > > > > > > > > > > batch.expiry.ms > > > > > > > > > > > > > > . > > > > > > > > > > > > > > > > This > > > > > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > > > > > a > > > > > > > > > > > > > > > > > > > > > new > > > > > > > > > > > > > > > > > > > > > > > case > > > > > > > > > > > > > > > > > > > > > > > >>> > that > > > > > > > > > > > > > > > > > > > > > > > >>> > > is > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > different > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > from #2. > This > > is > > > > the > > > > > > > > > catch-up > > > > > > > > > > > mode > > > > > > > > > > > > > > case. > > > > > > > > > > > > > > > > > > Things > > > > > > > > > > > > > > > > > > > > are > > > > > > > > > > > > > > > > > > > > > > > >>> moving too > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> slowly. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > Pipeline > SLAs > > > are > > > > > > > broken. > > > > > > > > > > Report > > > > > > > > > > > > and > > > > > > > > > > > > > > > > > shutdown > > > > > > > > > > > > > > > > > > > kmm. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > The second > and > > > the > > > > > > third > > > > > > > > > cases > > > > > > > > > > > are > > > > > > > > > > > > > > > useful > > > > > > > > > > > > > > > > > to a > > > > > > > > > > > > > > > > > > > > > > real-time > > > > > > > > > > > > > > > > > > > > > > > >>> app > > > > > > > > > > > > > > > > > > > > > > > >>> > > for a > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > completely > > > > different > > > > > > > > reason. > > > > > > > > > > > > Report, > > > > > > > > > > > > > > > > forget > > > > > > > > > > > > > > > > > > > about > > > > > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > > > > > > >>> batch, > > > > > > > > > > > > > > > > > > > > > > > >>> > and > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> just > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > move > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > on (without > > > > shutting > > > > > > > > down). > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > If I > understand > > > > > > correctly, > > > > > > > > you > > > > > > > > > > are > > > > > > > > > > > > > > talking > > > > > > > > > > > > > > > > > > about a > > > > > > > > > > > > > > > > > > > > > fork > > > > > > > > > > > > > > > > > > > > > > of > > > > > > > > > > > > > > > > > > > > > > > >>> > apache > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> kafka > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > which has > these > > > > > > additional > > > > > > > > > > > > conditions? > > > > > > > > > > > > > > > > Because > > > > > > > > > > > > > > > > > > > that > > > > > > > > > > > > > > > > > > > > > > check > > > > > > > > > > > > > > > > > > > > > > > >>> > doesn't > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> exist > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > on > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > trunk today. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > Right. It is our > > > > > internal > > > > > > > > > release > > > > > > > > > > in > > > > > > > > > > > > > > > LinkedIn. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > Or are you > > proposing > > > > to > > > > > > > change > > > > > > > > > the > > > > > > > > > > > > > > behavior > > > > > > > > > > > > > > > of > > > > > > > > > > > > > > > > > > > expiry > > > > > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > account for > > stale > > > > > > metadata > > > > > > > > and > > > > > > > > > > > > > > partitioned > > > > > > > > > > > > > > > > > > > producers > > > > > > > > > > > > > > > > > > > > > as > > > > > > > > > > > > > > > > > > > > > > > >>> part of > > > > > > > > > > > > > > > > > > > > > > > >>> > > this > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> KIP? > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > No. It's our > > > temporary > > > > > > > > solution > > > > > > > > > in > > > > > > > > > > > the > > > > > > > > > > > > > > > absence > > > > > > > > > > > > > > > > > of > > > > > > > > > > > > > > > > > > > > > kip-91. > > > > > > > > > > > > > > > > > > > > > > > Note > > > > > > > > > > > > > > > > > > > > > > > >>> > that > > > > > > > > > > > > > > > > > > > > > > > >>> > > we > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> dont > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > like increasing > > > > > > > > > > request.timeout.ms. > > > > > > > > > > > > > > Without > > > > > > > > > > > > > > > > our > > > > > > > > > > > > > > > > > > > extra > > > > > > > > > > > > > > > > > > > > > > > >>> conditions > > > > > > > > > > > > > > > > > > > > > > > >>> > > our > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > batches expire > too > > > > > soon--a > > > > > > > > > problem > > > > > > > > > > > in > > > > > > > > > > > > > kmm > > > > > > > > > > > > > > > > > catchup > > > > > > > > > > > > > > > > > > > > mode. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > If we get > > > > > batch.expiry.ms > > > > > > , > > > > > > > we > > > > > > > > > > will > > > > > > > > > > > > > > > configure > > > > > > > > > > > > > > > > it > > > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > > 20 > > > > > > > > > > > > > > > > > > > > > > mins. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > maybeExpire > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > will use the > > config > > > > > > instead > > > > > > > of > > > > > > > > > > > r.t.ms > > > > > > > > > > > > . > > > > > > > > > > > > > > The > > > > > > > > > > > > > > > > > extra > > > > > > > > > > > > > > > > > > > > > > conditions > > > > > > > > > > > > > > > > > > > > > > > >>> will > > > > > > > > > > > > > > > > > > > > > > > >>> > be > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > unnecessary. All > > > three > > > > > > cases > > > > > > > > > shall > > > > > > > > > > > be > > > > > > > > > > > > > > > covered > > > > > > > > > > > > > > > > > via > > > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > > > > > > >>> batch.expiry > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> timeout. > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >