Just did :)

On 7 September 2017 at 17:52, Ismael Juma <ism...@juma.me.uk> wrote:

> Can we please start the vote on this KIP? The KIP must be accepted by next
> Wednesday in order to make the cut for 1.0.0. This issue keeps coming up
> again and again, and I'd really like to include a fix for 1.0.0.
>
> Ismael
>
> On Thu, Sep 7, 2017 at 10:01 PM, Apurva Mehta <apu...@confluent.io> wrote:
>
> > I agree with what Ismael said. Having both retries and
> > delivery.timeout.ms is confusing, and thus the goal is to not have a
> > retries option at all once idempotence is fully battle tested and has
> > become the entrenched default.
> >
> > Until that time, it makes sense to expire a batch earlier than
> > delivery.timeout.ms if retries have been exhausted.
> >
> > Thanks,
> > Apurva
> >
> >
> > On Thu, Sep 7, 2017 at 6:07 AM, Ismael Juma <ism...@juma.me.uk> wrote:
> >
> > > Good question regarding retries Sumant. A few comments:
> > >
> > > 1. Defaulting to MAX_INT makes sense in the context of
> > > delivery.timeout.ms, but introduces the possibility of reordering with
> > > the default max.in.flight of 5. Personally, I think reordering is better
> > > than dropping the message altogether (if we keep retries=0), but it's
> > > worth noting this.
> > >
> > > 2. I agree that we should expire on whichever of retries and
> > > delivery.timeout.ms is exhausted first for 1.0.0.
> > >
> > > 3. Once KIP-185 lands (post 1.0.0), we should consider deprecating and
> > > eventually removing the retries config to simplify things (it won't have
> > > much use then).
> > >
> > > 4. With regards to the case where the broker replies quickly with an
> > > error, we need to understand a bit more what the error is. For any kind
> > > of connection issue, we now have exponential backoff. For the case where
> > > an error code is returned, it depends on whether the error is retriable
> > > or not. For the former, it probably makes sense to keep retrying as it's
> > > supposed to be a transient issue. If we think it would make sense to
> > > apply exponential backoff, we could also consider that. So, I'm not sure
> > > retries has much use apart from compatibility and the retries=0 case
> > > (for now).
> > >
> > > Ismael
> > >
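
For illustration only, point 2 above ("expire on whichever of retries and delivery.timeout.ms is exhausted first") could be sketched roughly as follows. This is not the producer's actual code: `should_expire` and its `failed_attempts` counter are hypothetical names, and `MAX_INT` just stands in for Java's Integer.MAX_VALUE.

```python
MAX_INT = 2**31 - 1  # stand-in for Java's Integer.MAX_VALUE


def should_expire(failed_attempts, retries, elapsed_ms, delivery_timeout_ms):
    """Hypothetical helper: expire a batch once either limit is exhausted.

    failed_attempts counts sends that have already failed (the first send
    plus any retries), so `retries` is exhausted after retries + 1 failures.
    elapsed_ms is the time since the batch was created.
    """
    retries_exhausted = failed_attempts > retries
    deadline_passed = elapsed_ms >= delivery_timeout_ms
    return retries_exhausted or deadline_passed
```

With retries=0 a single failure expires the batch right away; with retries=MAX_INT only the delivery.timeout.ms check can realistically trigger, which is the behaviour discussed in point 1.
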
> > > On Wed, Sep 6, 2017 at 11:14 PM, Jun Rao <j...@confluent.io> wrote:
> > >
> > > > Hi, Sumant,
> > > >
> > > > The diagram in the wiki seems to imply that delivery.timeout.ms
> > > > doesn't include the batching time.
> > > >
> > > > For retries, probably we can just default it to MAX_INT?
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Wed, Sep 6, 2017 at 10:26 AM, Sumant Tambe <suta...@gmail.com> wrote:
> > > >
> > > > > 120 seconds default sounds good to me. Throwing ConfigException
> > > > > instead of WARN is fine. Added clarification that the producer waits
> > > > > the full request.timeout.ms for the in-flight request. This implies
> > > > > that the user might be notified of batch expiry while a batch is
> > > > > still in-flight.
> > > > >
> > > > > I don't recall if we discussed our point of view that existing
> > > > > configs like retries become redundant/deprecated with this feature.
> > > > > IMO, the retries config becomes meaningless due to the possibility of
> > > > > incorrect configs like delivery.timeout.ms > linger.ms + retries *
> > > > > (request.timeout.ms + retry.backoff.ms); should retries basically be
> > > > > interpreted as MAX_INT? What will be the default?
> > > > >
> > > > > So do we ignore the retries config or throw a ConfigException if
> > > > > weirdness like the above is detected?
> > > > >
> > > > > -Sumant
> > > > >
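
A minimal sketch of the arithmetic in the message above, to make the "weird" combination concrete. The function names are hypothetical and the numbers in the usage note are example values only, not a statement of the producer's defaults.

```python
def retry_budget_ms(linger_ms, retries, request_timeout_ms, retry_backoff_ms):
    """Worst-case time the configured retries can cover, per the thread's formula."""
    return linger_ms + retries * (request_timeout_ms + retry_backoff_ms)


def first_limit_hit(delivery_timeout_ms, linger_ms, retries,
                    request_timeout_ms, retry_backoff_ms):
    """Hypothetical check: which limit would end delivery attempts first?"""
    budget = retry_budget_ms(linger_ms, retries,
                             request_timeout_ms, retry_backoff_ms)
    # delivery.timeout.ms > budget is the combination flagged in the thread:
    # retries can run out before the delivery timeout is reached.
    return "retries" if delivery_timeout_ms > budget else "delivery.timeout.ms"
```

For example, with delivery.timeout.ms=120000, linger.ms=0, retries=3, request.timeout.ms=30000 and retry.backoff.ms=100, the retry budget is 90300 ms, so retries would be exhausted first. With retries set to MAX_INT the budget is effectively unbounded and delivery.timeout.ms becomes the only limit.
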
> > > > >
> > > > > On 5 September 2017 at 17:34, Ismael Juma <ism...@juma.me.uk> wrote:
> > > > >
> > > > > > Thanks for updating the KIP, Sumant. A couple of points:
> > > > > >
> > > > > > 1. I think the default for delivery.timeout.ms should be higher than
> > > > > > 30 seconds given that we previously would reset the clock once the
> > > > > > batch was sent. The value should be large enough that batches are
> > > > > > not expired due to expected events like a new leader being elected
> > > > > > due to broker failure. Would it make sense to use a conservative
> > > > > > value like 120 seconds?
> > > > > >
> > > > > > 2. The producer currently throws an exception for configuration
> > > > > > combinations that don't make sense. We should probably do the same
> > > > > > here for consistency (the KIP currently proposes a log warning).
> > > > > >
> > > > > > 3. We should mention that we will not cancel in flight requests
> > > > > > until the request timeout even though we'll expire the batch early
> > > > > > if needed.
> > > > > >
> > > > > > I think we should start the vote tomorrow so that we have a chance
> > > > > > of hitting the KIP freeze for 1.0.0.
> > > > > >
> > > > > > Ismael
> > > > > >
> > > > > > On Wed, Sep 6, 2017 at 1:03 AM, Sumant Tambe <suta...@gmail.com> wrote:
> > > > > >
> > > > > > > I've updated the kip-91 writeup
> > > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-91+Provide+Intuitive+User+Timeouts+in+The+Producer>
> > > > > > > to capture some of the discussion here. Please confirm if it's
> > > > > > > sufficiently accurate. Feel free to edit it if you think some
> > > > > > > explanation can be better and has been agreed upon here.
> > > > > > >
> > > > > > > How do you proceed from here?
> > > > > > >
> > > > > > > -Sumant
> > > > > > >
> > > > > > > On 30 August 2017 at 12:59, Jun Rao <j...@confluent.io> wrote:
> > > > > > >
> > > > > > > > Hi, Jiangjie,
> > > > > > > >
> > > > > > > > I misunderstood Jason's approach earlier. It does seem to be a
> > > > > > > > good one. We still need to calculate the selector timeout based
> > > > > > > > on the remaining delivery.timeout.ms to call the callback on
> > > > > > > > time, but we can always wait for an inflight request based on
> > > > > > > > request.timeout.ms.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Jun
> > > > > > > >
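
One way to picture the selector-timeout calculation described in the message above, as a rough sketch under stated assumptions: `next_wakeup_ms` is a hypothetical helper (not a Kafka API), batch deadlines are absolute timestamps (creation time plus delivery.timeout.ms), and each in-flight request is tracked only by the time it was sent.

```python
def next_wakeup_ms(now_ms, batch_deadlines_ms, inflight_sent_ms, request_timeout_ms):
    """Hypothetical: how long the sender could block before something is due.

    batch_deadlines_ms: absolute delivery deadlines, used to fire callbacks
        on time.
    inflight_sent_ms: send timestamps of in-flight requests, each of which is
        still given the full request.timeout.ms.
    Returns None when nothing is pending.
    """
    due = list(batch_deadlines_ms)
    due += [sent + request_timeout_ms for sent in inflight_sent_ms]
    if not due:
        return None
    # Never return a negative timeout; 0 means "something is already due".
    return max(0, min(due) - now_ms)
```

The two lists are deliberately independent: a batch's callback can become due (delivery deadline) while its request is still being awaited (request timeout), which is the separation the thread converges on.
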
> > > > > > > > On Tue, Aug 29, 2017 at 5:16 PM, Becket Qin <becket....@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Yeah, I think expiring a batch but still waiting for the
> > > > > > > > > response is probably reasonable given the result is not
> > > > > > > > > guaranteed anyway.
> > > > > > > > >
> > > > > > > > > @Jun,
> > > > > > > > >
> > > > > > > > > I think the frequent PID reset may still be possible if we do
> > > > > > > > > not wait for the in-flight response to return. Consider two
> > > > > > > > > partitions p0 and p1. The deadlines of the batches for p0 are
> > > > > > > > > T + 10, T + 30, T + 50... The deadlines of the batches for p1
> > > > > > > > > are T + 20, T + 40, T + 60... Assume each request takes more
> > > > > > > > > than 10 ms to get the response. The following sequence may be
> > > > > > > > > possible:
> > > > > > > > >
> > > > > > > > > T: PID0 sends batch0_p0(PID0), batch0_p1(PID0)
> > > > > > > > > T + 10: PID0 expires batch0_p0(PID0), without resetting PID,
> > > > > > > > > sends batch1_p0(PID0) and batch0_p1(PID0, retry)
> > > > > > > > > T + 20: PID0 expires batch0_p1(PID0, retry), resets the PID to
> > > > > > > > > PID1, sends batch1_p0(PID0, retry) and batch1_p1(PID1)
> > > > > > > > > T + 30: PID1 expires batch1_p0(PID0, retry), without resetting
> > > > > > > > > PID, sends batch2_p0(PID1) and batch1_p1(PID1, retry)
> > > > > > > > > T + 40: PID1 expires batch1_p1(PID1, retry), resets the PID to
> > > > > > > > > PID2, sends batch2_p0(PID1, retry) and batch2_p1(PID2)
> > > > > > > > > ....
> > > > > > > > >
> > > > > > > > > In the above example, the producer will reset PID once every
> > > > > > > > > two requests. The example did not take retry backoff into
> > > > > > > > > consideration, but it still seems possible to encounter
> > > > > > > > > frequent PID resets if we do not wait for the request to
> > > > > > > > > finish. Also, in this case we will have a lot of retries and a
> > > > > > > > > mixture of PIDs, which seems to be pretty complicated.
> > > > > > > > >
> > > > > > > > > I think Jason's suggestion will address both concerns, i.e. we
> > > > > > > > > fire the callback at exactly delivery.timeout.ms, but we will
> > > > > > > > > still wait for the response to be returned before sending the
> > > > > > > > > next request.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Tue, Aug 29, 2017 at 4:00 PM, Jun Rao <j...@confluent.io> wrote:
> > > > > > > > >
> > > > > > > > > > Hmm, I thought delivery.timeout.ms bounds the time from when
> > > > > > > > > > a message is in the accumulator (i.e., when send() returns)
> > > > > > > > > > to the time when the callback is called. If we wait for
> > > > > > > > > > request.timeout.ms for an inflight request and the remaining
> > > > > > > > > > delivery.timeout.ms is less than request.timeout.ms, the
> > > > > > > > > > callback may be called later than delivery.timeout.ms, right?
> > > > > > > > > >
> > > > > > > > > > Jiangjie's concern on resetting the pid on every expired
> > > > > > > > > > batch is probably not an issue if we only reset the pid when
> > > > > > > > > > the expired batch's pid is the same as the current pid, as
> > > > > > > > > > Jason suggested.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > >
> > > > > > > > > > Jun
> > > > > > > > > >
> > > > > > > > > > On Tue, Aug 29, 2017 at 3:09 PM, Jason Gustafson <ja...@confluent.io> wrote:
> > > > > > > > > >
> > > > > > > > > > > I think the semantics of delivery.timeout.ms need to allow
> > > > > > > > > > > for the possibility that the record was actually written.
> > > > > > > > > > > Unless we can keep on retrying indefinitely, there's really
> > > > > > > > > > > no way to know for sure whether the record was written or
> > > > > > > > > > > not. A delivery timeout just means that we cannot guarantee
> > > > > > > > > > > that the record was delivered.
> > > > > > > > > > >
> > > > > > > > > > > -Jason
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Aug 29, 2017 at 2:51 PM, Becket Qin <becket....@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Jason,
> > > > > > > > > > > >
> > > > > > > > > > > > If we expire the batch from the user's perspective but are
> > > > > > > > > > > > still waiting for the response, would that mean it is
> > > > > > > > > > > > likely that the batch will be successfully appended but
> > > > > > > > > > > > the users will receive a TimeoutException? That seems a
> > > > > > > > > > > > little non-intuitive to the users. Arguably it may be OK
> > > > > > > > > > > > though, because currently when TimeoutException is thrown,
> > > > > > > > > > > > there is no guarantee whether the messages are delivered
> > > > > > > > > > > > or not.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > >
> > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Aug 29, 2017 at 12:33 PM, Jason Gustafson <ja...@confluent.io> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > I think I'm with Becket. We should wait for
> > > > > > > > > > > > > request.timeout.ms for each produce request we send. We
> > > > > > > > > > > > > can still await the response internally for PID/sequence
> > > > > > > > > > > > > maintenance even if we expire the batch from the user's
> > > > > > > > > > > > > perspective. New sequence numbers would be assigned
> > > > > > > > > > > > > based on the current PID until the response returns and
> > > > > > > > > > > > > we find whether a PID reset is actually needed. This
> > > > > > > > > > > > > makes delivery.timeout.ms a hard limit which is easier
> > > > > > > > > > > > > to explain.
> > > > > > > > > > > > >
> > > > > > > > > > > > > -Jason
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, Aug 29, 2017 at 11:10 AM, Sumant Tambe <suta...@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > I'm updating the kip-91 writeup. There seems to be some
> > > > > > > > > > > > > > confusion about expiring an inflight request. An
> > > > > > > > > > > > > > inflight request gets a full delivery.timeout.ms
> > > > > > > > > > > > > > duration from creation, right? So it should be
> > > > > > > > > > > > > > max(remaining delivery.timeout.ms, request.timeout.ms)?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Jun, we do want to wait for an inflight request for
> > > > > > > > > > > > > > longer than request.timeout.ms, right?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > What happens to a batch when retries *
> > > > > > > > > > > > > > (request.timeout.ms + retry.backoff.ms) <
> > > > > > > > > > > > > > delivery.timeout.ms and all retries are exhausted? I
> > > > > > > > > > > > > > remember an internal discussion where we concluded that
> > > > > > > > > > > > > > retries can be no longer relevant (i.e., ignored, which
> > > > > > > > > > > > > > is the same as retries=MAX_LONG) when there's an
> > > > > > > > > > > > > > end-to-end delivery.timeout.ms. Do you agree?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > > Sumant
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On 27 August 2017 at 12:08, Jun Rao <j...@confluent.io> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi, Jiangjie,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > If we want to enforce delivery.timeout.ms, we need to
> > > > > > > > > > > > > > > take the min, right? Also, if a user sets a large
> > > > > > > > > > > > > > > delivery.timeout.ms, we probably don't want to wait
> > > > > > > > > > > > > > > for an inflight request longer than
> > > > > > > > > > > > > > > request.timeout.ms.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Jun
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Fri, Aug 25, 2017 at 5:19 PM, Becket Qin <becket....@gmail.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi Jason,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I see what you mean. That makes sense. So in the
> > > > > > > > > > > > > > > > above case after the producer resets PID, when it
> > > > > > > > > > > > > > > > retries batch_0_tp1, the batch will still have the
> > > > > > > > > > > > > > > > old PID even if the producer has already got a new
> > > > > > > > > > > > > > > > PID.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > @Jun, do you mean max(remaining delivery.timeout.ms,
> > > > > > > > > > > > > > > > request.timeout.ms) instead of min(remaining
> > > > > > > > > > > > > > > > delivery.timeout.ms, request.timeout.ms)?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Fri, Aug 25, 2017 at 9:34 AM, Jun Rao <j...@confluent.io> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hi, Becket,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Good point on expiring inflight requests. Perhaps we
> > > > > > > > > > > > > > > > > can expire an inflight request after min(remaining
> > > > > > > > > > > > > > > > > delivery.timeout.ms, request.timeout.ms). This way,
> > > > > > > > > > > > > > > > > if a user sets a high delivery.timeout.ms, we can
> > > > > > > > > > > > > > > > > still recover from broker power outage sooner.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Jun
> > > > > > > > > > > > > > > > >
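
To make the min-versus-max question concrete, here is a small sketch. `inflight_expiry_ms` and its `policy` argument are hypothetical names used only for illustration; "min" is the proposal in the message above and "max" is the alternative raised in the replies.

```python
def inflight_expiry_ms(remaining_delivery_ms, request_timeout_ms, policy="min"):
    """Hypothetical: how long to keep waiting on an in-flight request."""
    if policy == "min":
        # Never wait past the delivery deadline, and never past
        # request.timeout.ms even if the delivery deadline is far away.
        return min(remaining_delivery_ms, request_timeout_ms)
    if policy == "max":
        # Always give the request at least request.timeout.ms, even if that
        # overshoots the delivery deadline.
        return max(remaining_delivery_ms, request_timeout_ms)
    raise ValueError("policy must be 'min' or 'max'")
```

With 600000 ms of delivery time remaining and request.timeout.ms=30000, "min" gives up on the request after 30 seconds, which is the faster recovery from a broker outage mentioned above. With only 5000 ms remaining, "min" waits 5 seconds while "max" waits the full 30.
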
> > > > > > > > > > > > > > > > > On Thu, Aug 24, 2017 at 12:52 PM, Becket Qin <becket....@gmail.com> wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Hi Jason,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > delivery.timeout.ms sounds good to me.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I was referring to the case that we are resetting
> > > > > > > > > > > > > > > > > > the PID/sequence after expiring a batch. This is
> > > > > > > > > > > > > > > > > > more about sending the batches after the expired
> > > > > > > > > > > > > > > > > > batch.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > The scenario being discussed is expiring one of the
> > > > > > > > > > > > > > > > > > batches in an in-flight request and retrying the
> > > > > > > > > > > > > > > > > > other batches in that in-flight request. So
> > > > > > > > > > > > > > > > > > consider the following case:
> > > > > > > > > > > > > > > > > > 1. Producer sends request_0 with two batches
> > > > > > > > > > > > > > > > > > (batch_0_tp0 and batch_0_tp1).
> > > > > > > > > > > > > > > > > > 2. Broker receives the request and enqueues the
> > > > > > > > > > > > > > > > > > request to the log.
> > > > > > > > > > > > > > > > > > 3. Before the producer receives the response from
> > > > > > > > > > > > > > > > > > the broker, batch_0_tp0 expires. The producer will
> > > > > > > > > > > > > > > > > > expire batch_0_tp0 immediately, reset the PID, and
> > > > > > > > > > > > > > > > > > then resend batch_0_tp1, and maybe send batch_1_tp0
> > > > > > > > > > > > > > > > > > (i.e. the next batch after the expired batch) as
> > > > > > > > > > > > > > > > > > well.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > For batch_0_tp1, it is OK to reuse the PID and
> > > > > > > > > > > > > > > > > > sequence number. The problem is batch_1_tp0. If we
> > > > > > > > > > > > > > > > > > reuse the same PID and the broker has already
> > > > > > > > > > > > > > > > > > appended batch_0_tp0, the broker will think
> > > > > > > > > > > > > > > > > > batch_1_tp0 is a duplicate with the same sequence
> > > > > > > > > > > > > > > > > > number. As a result the broker will drop
> > > > > > > > > > > > > > > > > > batch_1_tp0. That is why we have to either bump up
> > > > > > > > > > > > > > > > > > the sequence number or reset the PID. To avoid this
> > > > > > > > > > > > > > > > > > complexity, I was suggesting that we not expire the
> > > > > > > > > > > > > > > > > > in-flight batch immediately, but wait for the
> > > > > > > > > > > > > > > > > > produce response. If the batch has been
> > > > > > > > > > > > > > > > > > successfully appended, we do not expire it.
> > > > > > > > > > > > > > > > > > Otherwise, we expire it.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Thu, Aug 24, 2017 at 11:26 AM, Jason Gustafson <ja...@confluent.io> wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > @Becket
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Good point about unnecessarily resetting the PID in
> > > > > > > > > > > > > > > > > > > cases where we know the request has failed. Might
> > > > > > > > > > > > > > > > > > > be worth opening a JIRA to try and improve this.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > So if we expire the batch prematurely and resend
> > > > > > > > > > > > > > > > > > > > > all the other batches in the same request,
> > > > > > > > > > > > > > > > > > > > > chances are there will be duplicates. If we wait
> > > > > > > > > > > > > > > > > > > > > for the response instead, it is less likely to
> > > > > > > > > > > > > > > > > > > > > introduce duplicates, and we may not need to
> > > > > > > > > > > > > > > > > > > > > reset the PID.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Not sure I follow this. Are you assuming that we
> > > > > > > > > > > > > > > > > > > change the batch PID/sequence of the retried
> > > > > > > > > > > > > > > > > > > batches after resetting the PID? I think we
> > > > > > > > > > > > > > > > > > > probably need to ensure that when we retry a batch,
> > > > > > > > > > > > > > > > > > > we always use the same PID/sequence.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > By the way, as far as naming,
> > > > > > > > > > > > > > > > > > > `max.message.delivery.wait.ms` is quite a mouthful.
> > > > > > > > > > > > > > > > > > > Could we shorten it? Perhaps `delivery.timeout.ms`?
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > -Jason
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Wed, Aug 23, 2017 at 8:51 PM, Becket Qin <becket....@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Hi Jun,
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > If TCP timeout is longer than request.timeout.ms,
> > > > > > > > > > > > > > > > > > > > the producer will always hit request.timeout.ms
> > > > > > > > > > > > > > > > > > > > before hitting TCP timeout, right? That is why we
> > > > > > > > > > > > > > > > > > > > added request.timeout.ms in the first place.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > You are right. Currently we reset the PID and
> > > > > > > > > > > > > > > > > > > > resend the batches to avoid
> > > > > > > > > > > > > > > > > > > > OutOfOrderSequenceException when the expired
> > > > > > > > > > > > > > > > > > > > batches are in retry.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > This does not distinguish the reasons that caused
> > > > > > > > > > > > > > > > > > > > the retry. There are two cases:
> > > > > > > > > > > > > > > > > > > > 1. If the batch was in retry because it received
> > > > > > > > > > > > > > > > > > > > an error response (e.g. NotLeaderForPartition),
> > > > > > > > > > > > > > > > > > > > we actually don't need to reset the PID in this
> > > > > > > > > > > > > > > > > > > > case because we know that the broker did not
> > > > > > > > > > > > > > > > > > > > accept it.
> > > > > > > > > > > > > > > > > > > > 2. If the batch was in retry because it hit a
> > > > > > > > > > > > > > > > > > > > timeout earlier, then we should reset the PID (or
> > > > > > > > > > > > > > > > > > > > optimistically send and only reset the PID when
> > > > > > > > > > > > > > > > > > > > we receive OutOfOrderSequenceException?)
> > > > > > > > > > > > > > > > > > > > Case 1 is probably the most common case, so it
> > > > > > > > > > > > > > > > > > > > looks like we are resetting the PID more often
> > > > > > > > > > > > > > > > > > > > than necessary. But because in case 1 the broker
> > > > > > > > > > > > > > > > > > > > does not have the batch, there isn't much impact
> > > > > > > > > > > > > > > > > > > > from resetting the PID and resending other than
> > > > > > > > > > > > > > > > > > > > the additional round trip.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Now we are introducing another case:
> > > > > > > > > > > > > > > > > > > > 3. A batch is in retry because we expired an
> > > > > > > > > > > > > > > > > > > > in-flight request before it hits
> > > > > > > > > > > > > > > > > > > > request.timeout.ms.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > The difference between 2 and 3 is that in case 3
> > > > > > > > > > > > > > > > > > > > the broker has likely appended the messages. So if
> > > > > > > > > > > > > > > > > > > > we expire the batch prematurely and resend all the
> > > > > > > > > > > > > > > > > > > > other batches in the same request, chances are
> > > > > > > > > > > > > > > > > > > > there will be duplicates. If we wait for the
> > > > > > > > > > > > > > > > > > > > response instead, it is less likely to introduce
> > > > > > > > > > > > > > > > > > > > duplicates, and we may not need to reset the PID.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > That said, given that batch expiration is probably
> > > > > > > > > > > > > > > > > > > > already rare enough, it may not be necessary to
> > > > > > > > > > > > > > > > > > > > optimize for that.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > > > > > >
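
The three retry cases in the message above can be summarised in a small lookup. This is only one reading of the message, not producer code: `RetryCause` and `summarize` are hypothetical names, and the "broker has the batch" column condenses the message's wording ("did not accept it", "likely ... appended") into short labels.

```python
from enum import Enum


class RetryCause(Enum):
    ERROR_RESPONSE = 1     # case 1: e.g. NotLeaderForPartition
    REQUEST_TIMEOUT = 2    # case 2: the batch hit a timeout earlier
    EXPIRED_IN_FLIGHT = 3  # case 3: expired before request.timeout.ms


# (PID reset needed when the batch expires?, does the broker have the batch?)
_SUMMARY = {
    RetryCause.ERROR_RESPONSE: (False, "no"),
    RetryCause.REQUEST_TIMEOUT: (True, "unknown"),
    RetryCause.EXPIRED_IN_FLIGHT: (True, "likely"),
}


def summarize(cause):
    """Return (pid_reset_needed, broker_has_batch) for a retry cause."""
    return _SUMMARY[cause]
```

Case 3 is the one where resending risks duplicates, which is why the message suggests waiting for the produce response instead of expiring the in-flight batch immediately.
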
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Wed, Aug 23, 2017 at 5:01 PM, Jun
> > Rao
> > > <
> > > > > > > > > > > j...@confluent.io
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Hi, Becket,
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > If a message expires while it's in
> an
> > > > > > inflight
> > > > > > > > > > produce
> > > > > > > > > > > > > > request,
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > producer will get a new PID if
> > > idempotence
> > > > > is
> > > > > > > > > enabled.
> > > > > > > > > > > > This
> > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > > > prevent
> > > > > > > > > > > > > > > > > > > > > subsequent messages from hitting
> > > > > > > > > > > > > OutOfOrderSequenceException.
> > > > > > > > > > > > > > > The
> > > > > > > > > > > > > > > > > > issue
> > > > > > > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > > > > > not expiring an inflight request is
> > > that
> > > > > if a
> > > > > > > > > broker
> > > > > > > > > > > > server
> > > > > > > > > > > > > > > goes
> > > > > > > > > > > > > > > > > down
> > > > > > > > > > > > > > > > > > > > hard
> > > > > > > > > > > > > > > > > > > > > (e.g. power outage), the time that
> it
> > > > takes
> > > > > > for
> > > > > > > > the
> > > > > > > > > > > > client
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > detect
> > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > socket level error (this will be
> something
> > > like
> > > > > 8+
> > > > > > > > > minutes
> > > > > > > > > > > with
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > default
> > > > > > > > > > > > > > > > > > > TCP
> > > > > > > > > > > > > > > > > > > > > setting) is much longer than the
> > > default
> > > > > > > > > > > > > request.timeout.ms.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Hi, Sumant,
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > We can probably just default
> > > > > > > > > > > > max.message.delivery.wait.ms
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > > 30
> > > > > > > > > > > > > > > > > > secs,
> > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > current default for
> > request.timeout.ms
> > > .
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Jun
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > On Wed, Aug 23, 2017 at 3:38 PM,
> > Sumant
> > > > > > Tambe <
> > > > > > > > > > > > > > > suta...@gmail.com
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > OK. Looks like starting the clock
> > > after
> > > > > > > closing
> > > > > > > > > the
> > > > > > > > > > > > batch
> > > > > > > > > > > > > > has
> > > > > > > > > > > > > > > > > > quite a
> > > > > > > > > > > > > > > > > > > > few
> > > > > > > > > > > > > > > > > > > > > > > > pitfalls. I can't think of a way
> > > to
> > > > > work
> > > > > > > > > around
> > > > > > > > > > it
> > > > > > > > > > > > > > without
> > > > > > > > > > > > > > > > > > adding
> > > > > > > > > > > > > > > > > > > > yet
> > > > > > > > > > > > > > > > > > > > > > another config. So I won't
> discuss
> > > that
> > > > > > here.
> > > > > > > > > > Anyone?
> > > > > > > > > > > > As
> > > > > > > > > > > > > I
> > > > > > > > > > > > > > > said
> > > > > > > > > > > > > > > > > > > > earlier,
> > > > > > > > > > > > > > > > > > > > > > I'm not hung up on super-accurate
> > > > > > > notification
> > > > > > > > > > times.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > If we are going down the
> > > > > > > > > > > > max.message.delivery.wait.ms
> > > > > > > > > > > > > > > route,
> > > > > > > > > > > > > > > > > what
> > > > > > > > > > > > > > > > > > > > would
> > > > > > > > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > > > > the default? There seem to be a
> few
> > > > > > options.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > 1. max.message.delivery.wait.ms
> =
> > > null.
> > > > > > > Nothing
> > > > > > > > > > > changes
> > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > those
> > > > > > > > > > > > > > > > > > who
> > > > > > > > > > > > > > > > > > > > > don't
> > > > > > > > > > > > > > > > > > > > > > set it. I.e., batches expire
> after
> > > > > > > > > > > request.timeout.ms
> > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > accumulator.
> > > > > > > > > > > > > > > > > > > > If
> > > > > > > > > > > > > > > > > > > > > > they are past the accumulator
> > stage,
> > > > > > timeout
> > > > > > > > > after
> > > > > > > > > > > > > > retries*(
> > > > > > > > > > > > > > > > > > > > > > request.timeout.ms+backoff).
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > 2. max.message.delivery.wait.ms
> =
> > > > > > > > > > request.timeout.ms
> > > > > > > > > > .
> > > > > > > > > > > No
> > > > > > > > > > > > > > > > > observable
> > > > > > > > > > > > > > > > > > > > > > behavioral
> > > > > > > > > > > > > > > > > > > > > > change at the accumulator level
> as
> > > > > timeout
> > > > > > > > value
> > > > > > > > > is
> > > > > > > > > > > > same
> > > > > > > > > > > > > as
> > > > > > > > > > > > > > > > > before.
> > > > > > > > > > > > > > > > > > > > > Retries
> > > > > > > > > > > > > > > > > > > > > > > > will be done as long as the batch
> is
> > > > under
> > > > > > > > > > > > > > > > > > > max.message.delivery.wait.ms
> > > > > > > > > > > > > > > > > > > > .
> > > > > > > > > > > > > > > > > > > > > > However, a batch can expire just
> > > after
> > > > > one
> > > > > > > try.
> > > > > > > > > > > That's
> > > > > > > > > > > > ok
> > > > > > > > > > > > > > IMO
> > > > > > > > > > > > > > > > > > because
> > > > > > > > > > > > > > > > > > > > > > > > request.timeout.ms tends to be
> > large
> > > > > > (Default
> > > > > > > > > > 30000).
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > 3. max.message.delivery.wait.ms=
> 2*
> > > > > > > > > > request.timeout.ms
> > > > > > > > > > > .
> > > > > > > > > > > > > Give
> > > > > > > > > > > > > > > > > > > opportunity
> > > > > > > > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > > > > > two retries but warn that retries
> > may
> > > > not
> > > > > > > > happen
> > > > > > > > > at
> > > > > > > > > > > all
> > > > > > > > > > > > > in
> > > > > > > > > > > > > > > some
> > > > > > > > > > > > > > > > > > rare
> > > > > > > > > > > > > > > > > > > > > > cases and a batch could expire
> > before
> > > > any
> > > > > > > > > attempt.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > 4. max.message.delivery.wait.ms=
> > > > > something
> > > > > > > else
> > > > > > > > > (a
> > > > > > > > > > > > > > constant?)
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Thoughts?
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > On 23 August 2017 at 09:01,
> Ismael
> > > > Juma <
> > > > > > > > > > > > > ism...@juma.me.uk
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Thanks Becket, that seems
> > > reasonable.
> > > > > > > Sumant,
> > > > > > > > > > would
> > > > > > > > > > > > you
> > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > willing
> > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > > update the KIP based on the
> > > > discussion
> > > > > or
> > > > > > > are
> > > > > > > > > you
> > > > > > > > > > > > still
> > > > > > > > > > > > > > not
> > > > > > > > > > > > > > > > > > > > convinced?
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Ismael
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > On Wed, Aug 23, 2017 at 6:04
> AM,
> > > > Becket
> > > > > > > Qin <
> > > > > > > > > > > > > > > > > > becket....@gmail.com>
> > > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > In general
> > > > > > max.message.delivery.wait.ms
> > > > > > > > is a
> > > > > > > > > > > > cleaner
> > > > > > > > > > > > > > > > > approach.
> > > > > > > > > > > > > > > > > > > > That
> > > > > > > > > > > > > > > > > > > > > > > would
> > > > > > > > > > > > > > > > > > > > > > > > make the guarantee clearer.
> > That
> > > > > said,
> > > > > > > > there
> > > > > > > > > > seem to be
> > > > > > > > > > > > > > > > subtleties
> > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > > some
> > > > > > > > > > > > > > > > > > > > > > > > scenarios:
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > 1. I agree with Sumant that
> it
> > > is
> > > > a
> > > > > > > little
> > > > > > > > > > weird
> > > > > > > > > > > > > that
> > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > > message
> > > > > > > > > > > > > > > > > > > > > could
> > > > > > > > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > > > > > > expired immediately if it
> > happens
> > > > to
> > > > > > > enter
> > > > > > > > a
> > > > > > > > > > > batch
> > > > > > > > > > > > > that
> > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > about
> > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > > > > > > expired. But as Jun said, as
> > long
> > > > as
> > > > > we
> > > > > > > > have
> > > > > > > > > > > > multiple
> > > > > > > > > > > > > > > > > messages
> > > > > > > > > > > > > > > > > > > in a
> > > > > > > > > > > > > > > > > > > > > > > batch,
> > > > > > > > > > > > > > > > > > > > > > > > there isn't a cheap way to
> > > achieve
> > > > a
> > > > > > > > precise
> > > > > > > > > > > > timeout.
> > > > > > > > > > > > > > So
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > question
> > > > > > > > > > > > > > > > > > > > > > > > actually becomes whether it
> is
> > > more
> > > > > > > > > > user-friendly
> > > > > > > > > > > > to
> > > > > > > > > > > > > > > expire
> > > > > > > > > > > > > > > > > > early
> > > > > > > > > > > > > > > > > > > > > > (based
> > > > > > > > > > > > > > > > > > > > > > > on
> > > > > > > > > > > > > > > > > > > > > > > > the batch creation time) or
> > > expire
> > > > > late
> > > > > > > > > (based
> > > > > > > > > > on
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > batch
> > > > > > > > > > > > > > > > > > close
> > > > > > > > > > > > > > > > > > > > > > time).
> > > > > > > > > > > > > > > > > > > > > > > I
> > > > > > > > > > > > > > > > > > > > > > > > think both are acceptable.
> > > > > Personally I
> > > > > > > > think
> > > > > > > > > > > most
> > > > > > > > > > > > > > users
> > > > > > > > > > > > > > > do
> > > > > > > > > > > > > > > > > not
> > > > > > > > > > > > > > > > > > > > > really
> > > > > > > > > > > > > > > > > > > > > > > care
> > > > > > > > > > > > > > > > > > > > > > > > > > about expiring a little late as
> > > long
> > > > as
> > > > > > it
> > > > > > > > > > > eventually
> > > > > > > > > > > > > > > > expires.
> > > > > > > > > > > > > > > > > > So I
> > > > > > > > > > > > > > > > > > > > > would
> > > > > > > > > > > > > > > > > > > > > > > use
> > > > > > > > > > > > > > > > > > > > > > > > batch close time as long as
> > there
> > > > is
> > > > > a
> > > > > > > > bound
> > > > > > > > > on
> > > > > > > > > > > > that.
> > > > > > > > > > > > > > But
> > > > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > > > > looks
> > > > > > > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > > > > > > > > do not really have a bound on
> > > when
> > > > we
> > > > > > > will
> > > > > > > > > > close
> > > > > > > > > > > a
> > > > > > > > > > > > > > batch.
> > > > > > > > > > > > > > > > So
> > > > > > > > > > > > > > > > > > > > > expiration
> > > > > > > > > > > > > > > > > > > > > > > > based on batch create time
> may
> > be
> > > > the
> > > > > > > only
> > > > > > > > > > option
> > > > > > > > > > > > if
> > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > don't
> > > > > > > > > > > > > > > > > > > want
> > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > > > introduce complexity.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > 2. If we timeout a batch in a
> > > > request
> > > > > > > when
> > > > > > > > it
> > > > > > > > > > is
> > > > > > > > > > > > > still
> > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > flight,
> > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > end
> > > > > > > > > > > > > > > > > > > > > > > > result of that batch is
> unclear
> > > to
> > > > > the
> > > > > > > > users.
> > > > > > > > > > It
> > > > > > > > > > > > > would
> > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > weird
> > > > > > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > > > > > > users
> > > > > > > > > > > > > > > > > > > > > > > > > > receive an exception saying
> those
> > > > > messages
> > > > > > > are
> > > > > > > > > > > expired
> > > > > > > > > > > > > > while
> > > > > > > > > > > > > > > > > they
> > > > > > > > > > > > > > > > > > > > > actually
> > > > > > > > > > > > > > > > > > > > > > > > have been sent successfully.
> > Also
> > > > if
> > > > > > > > > > idempotence
> > > > > > > > > > > is
> > > > > > > > > > > > > set
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > true,
> > > > > > > > > > > > > > > > > > > > what
> > > > > > > > > > > > > > > > > > > > > > > would
> > > > > > > > > > > > > > > > > > > > > > > > the next sequence ID be after
> > the
> > > > > > expired
> > > > > > > > > > batch?
> > > > > > > > > > > > > > Reusing
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > same
> > > > > > > > > > > > > > > > > > > > > > > sequence
> > > > > > > > > > > > > > > > > > > > > > > > > > ID may result in data loss,
> and
> > > > > > incrementing
> > > > > > > > the
> > > > > > > > > > > > > sequence
> > > > > > > > > > > > > > ID
> > > > > > > > > > > > > > > > may
> > > > > > > > > > > > > > > > > > > cause
> > > > > > > > > > > > > > > > > > > > > > > > OutOfOrderSequenceException.
> > > > Besides,
> > > > > > > > > > extracting
> > > > > > > > > > > an
> > > > > > > > > > > > > > > expired
> > > > > > > > > > > > > > > > > > batch
> > > > > > > > > > > > > > > > > > > > > from
> > > > > > > > > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > > > > > > > > request also introduces some
> > > > > > complexity.
> > > > > > > > > Again,
> > > > > > > > > > > > > > > personally
> > > > > > > > > > > > > > > > I
> > > > > > > > > > > > > > > > > > > think
> > > > > > > > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > > > > > > fine to expire a little bit
> > late.
> > > > So
> > > > > > > maybe
> > > > > > > > we
> > > > > > > > > > > don't
> > > > > > > > > > > > > > need
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > expire
> > > > > > > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > > > > > > > batch
> > > > > > > > > > > > > > > > > > > > > > > > that is already in flight. In
> > the
> > > > > worst
> > > > > > > > case
> > > > > > > > > we
> > > > > > > > > > > > will
> > > > > > > > > > > > > > > expire
> > > > > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > > > > with
> > > > > > > > > > > > > > > > > > > > > > delay
> > > > > > > > > > > > > > > > > > > > > > > > of request.timeout.ms.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > On Tue, Aug 22, 2017 at 3:08
> > AM,
> > > > > Ismael
> > > > > > > > Juma
> > > > > > > > > <
> > > > > > > > > > > > > > > > > > ism...@juma.me.uk>
> > > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > >> Hi all,
> > > > > > > > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > > > > > > > >> The discussion has been
> going
> > on
> > > > > for a
> > > > > > > > > while,
> > > > > > > > > > > > would
> > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > help
> > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > have
> > > > > > > > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > > > > > > > >> call to discuss this? I'd
> like
> > > to
> > > > > > start
> > > > > > > a
> > > > > > > > > vote
> > > > > > > > > > > > > soonish
> > > > > > > > > > > > > > > so
> > > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > > > > > can
> > > > > > > > > > > > > > > > > > > > > > > >> include this in 1.0.0. I
> > > > personally
> > > > > > > prefer
> > > > > > > > > > > > > > > > > > > > > > max.message.delivery.wait.ms
> > > > > > > > > > > > > > > > > > > > > > > .
> > > > > > > > > > > > > > > > > > > > > > > >> It seems like Jun, Apurva
> and
> > > > Jason
> > > > > > also
> > > > > > > > > > prefer
> > > > > > > > > > > > > that.
> > > > > > > > > > > > > > > > > Sumant,
> > > > > > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > > > > > > seems
> > > > > > > > > > > > > > > > > > > > > > > like
> > > > > > > > > > > > > > > > > > > > > > > >> you still prefer a
> > > > batch.expiry.ms,
> > > > > > is
> > > > > > > > that
> > > > > > > > > > > > right?
> > > > > > > > > > > > > > What
> > > > > > > > > > > > > > > > are
> > > > > > > > > > > > > > > > > > > your
> > > > > > > > > > > > > > > > > > > > > > > >> thoughts Joel and Becket?
> > > > > > > > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > > > > > > > >> Ismael
> > > > > > > > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > > > > > > > >> On Wed, Aug 16, 2017 at 6:34
> > PM,
> > > > Jun
> > > > > > > Rao <
> > > > > > > > > > > > > > > > j...@confluent.io>
> > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > > > > > > > > >>> Hi, Sumant,
> > > > > > > > > > > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > > > > > > > > > > >>> The semantics of linger.ms
> > is
> > > a
> > > > > bit
> > > > > > > > > subtle.
> > > > > > > > > > > The
> > > > > > > > > > > > > > > > reasoning
> > > > > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > >>> current
> > > > > > > > > > > > > > > > > > > > > > > >>> implementation is the
> > > following.
> > > > > > Let's
> > > > > > > > say
> > > > > > > > > > one
> > > > > > > > > > > > sets
> > > > > > > > > > > > > > > > > > linger.ms
> > > > > > > > > > > > > > > > > > > > to 0
> > > > > > > > > > > > > > > > > > > > > > > (our
> > > > > > > > > > > > > > > > > > > > > > > >>> current default value).
> > > Creating
> > > > a
> > > > > > > batch
> > > > > > > > > for
> > > > > > > > > > > > every
> > > > > > > > > > > > > > > > message
> > > > > > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > > > > bad
> > > > > > > > > > > > > > > > > > > > > > > >>> for
> > > > > > > > > > > > > > > > > > > > > > > >>> throughput. Instead, the
> > > current
> > > > > > > > > > implementation
> > > > > > > > > > > > > only
> > > > > > > > > > > > > > > > forms
> > > > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > > > > batch
> > > > > > > > > > > > > > > > > > > > > > when
> > > > > > > > > > > > > > > > > > > > > > > >>> the
> > > > > > > > > > > > > > > > > > > > > > > >>> batch is sendable (i.e.,
> > broker
> > > > is
> > > > > > > > > available,
> > > > > > > > > > > > > > inflight
> > > > > > > > > > > > > > > > > > request
> > > > > > > > > > > > > > > > > > > > > limit
> > > > > > > > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > > > > > >>> not
> > > > > > > > > > > > > > > > > > > > > > > >>> exceeded, etc). That way,
> the
> > > > > > producer
> > > > > > > > has
> > > > > > > > > > more
> > > > > > > > > > > > > > chance
> > > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > > > > batching.
> > > > > > > > > > > > > > > > > > > > > > > The
> > > > > > > > > > > > > > > > > > > > > > > >>> implication is that a batch
> > > could
> > > > > be
> > > > > > > > closed
> > > > > > > > > > > > > later
> > > > > > > > > > > > > > than
> > > > > > > > > > > > > > > > > > > > linger.ms.
> > > > > > > > > > > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > > > > > > > > > > >>> Now, on your concern about
> > not
> > > > > > having a
> > > > > > > > > > precise
> > > > > > > > > > > > way
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > control
> > > > > > > > > > > > > > > > > > > > > delay
> > > > > > > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > > > > > >>> the
> > > > > > > > > > > > > > > > > > > > > > > >>> accumulator. It seems the
> > > > > > > > batch.expiry.ms
> > > > > > > > > > > > approach
> > > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > > have
> > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > same
> > > > > > > > > > > > > > > > > > > > > > > >>> issue. If you start the
> clock
> > > > when
> > > > > a
> > > > > > > > batch
> > > > > > > > > is
> > > > > > > > > > > > > > > > initialized,
> > > > > > > > > > > > > > > > > > you
> > > > > > > > > > > > > > > > > > > > may
> > > > > > > > > > > > > > > > > > > > > > > expire
> > > > > > > > > > > > > > > > > > > > > > > >>> some messages in the same
> > batch
> > > > > earlier
> > > > > > > > than
> > > > > > > > > > > > > > > > batch.expiry.ms
> > > > > > > > > > > > > > > > > .
> > > > > > > > > > > > > > > > > > If
> > > > > > > > > > > > > > > > > > > > you
> > > > > > > > > > > > > > > > > > > > > > > start
> > > > > > > > > > > > > > > > > > > > > > > >>> the clock when the batch is
> > > > closed,
> > > > > > the
> > > > > > > > > > > > expiration
> > > > > > > > > > > > > > time
> > > > > > > > > > > > > > > > > could
> > > > > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > > > > > >>> unbounded
> > > > > > > > > > > > > > > > > > > > > > > >>> because of the linger.ms
> > > > > > > implementation
> > > > > > > > > > > > described
> > > > > > > > > > > > > > > above.
> > > > > > > > > > > > > > > > > > > > Starting
> > > > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > >>> expiration clock on batch
> > > > > > > initialization
> > > > > > > > > will
> > > > > > > > > > > at
> > > > > > > > > > > > > > least
> > > > > > > > > > > > > > > > > > > guarantee
> > > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > time
> > > > > > > > > > > > > > > > > > > > > > > >>> to expire the first message
> > is
> > > > > > precise,
> > > > > > > > > which
> > > > > > > > > > > is
> > > > > > > > > > > > > > > probably
> > > > > > > > > > > > > > > > > > good
> > > > > > > > > > > > > > > > > > > > > > enough.
> > > > > > > > > > > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > > > > > > > > > > >>> Thanks,
> > > > > > > > > > > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > > > > > > > > > > >>> Jun
> > > > > > > > > > > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > > > > > > > > > > >>> On Tue, Aug 15, 2017 at
> 3:46
> > > PM,
> > > > > > Sumant
> > > > > > > > > > Tambe <
> > > > > > > > > > > > > > > > > > > suta...@gmail.com
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > > > > > > > > > > >>> > Question about "the
> closing
> > > of
> > > > a
> > > > > > > batch
> > > > > > > > > can
> > > > > > > > > > be
> > > > > > > > > > > > > > delayed
> > > > > > > > > > > > > > > > > > longer
> > > > > > > > > > > > > > > > > > > > than
> > > > > > > > > > > > > > > > > > > > > > > >>> > linger.ms":
> > > > > > > > > > > > > > > > > > > > > > > >>> > Is it possible to cause
> an
> > > > > > indefinite
> > > > > > > > > > delay?
> > > > > > > > > > > At
> > > > > > > > > > > > > > some
> > > > > > > > > > > > > > > > > point
> > > > > > > > > > > > > > > > > > > > bytes
> > > > > > > > > > > > > > > > > > > > > > > limit
> > > > > > > > > > > > > > > > > > > > > > > >>> > might kick in. Also, why
> is
> > > > > closing
> > > > > > > of
> > > > > > > > a
> > > > > > > > > > > batch
> > > > > > > > > > > > > > > coupled
> > > > > > > > > > > > > > > > > with
> > > > > > > > > > > > > > > > > > > > > > > >>> availability of
> > > > > > > > > > > > > > > > > > > > > > > >>> > its destination? In this
> > > > > approach a
> > > > > > > > batch
> > > > > > > > > > > > chosen
> > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > eviction
> > > > > > > > > > > > > > > > > > > > due
> > > > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > > >>> delay
> > > > > > > > > > > > > > > > > > > > > > > >>> > needs to "close" anyway,
> > > right
> > > > > > > (without
> > > > > > > > > > > regards
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > destination
> > > > > > > > > > > > > > > > > > > > > > > >>> > availability)?
> > > > > > > > > > > > > > > > > > > > > > > >>> >
> > > > > > > > > > > > > > > > > > > > > > > >>> > I'm not too worried about
> > > > > notifying
> > > > > > > at
> > > > > > > > > > > > > super-exact
> > > > > > > > > > > > > > > time
> > > > > > > > > > > > > > > > > > > > specified
> > > > > > > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > > > > > >>> the
> > > > > > > > > > > > > > > > > > > > > > > >>> > configs. But expiring
> > before
> > > > the
> > > > > > full
> > > > > > > > > > > wait-span
> > > > > > > > > > > > > has
> > > > > > > > > > > > > > > > > elapsed
> > > > > > > > > > > > > > > > > > > > > sounds
> > > > > > > > > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > > > > > > > >>> little
> > > > > > > > > > > > > > > > > > > > > > > >>> > weird. So expiration time
> > > has a
> > > > > +/-
> > > > > > > > > spread.
> > > > > > > > > > > It
> > > > > > > > > > > > > > works
> > > > > > > > > > > > > > > > more
> > > > > > > > > > > > > > > > > > > like
> > > > > > > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > > > > > > hint
> > > > > > > > > > > > > > > > > > > > > > > >>> than
> > > > > > > > > > > > > > > > > > > > > > > >>> > max. So why not
> > > > > > > > > > > message.delivery.wait.hint.ms?
> > > > > > > > > > > > > > > > > > > > > > > >>> >
> > > > > > > > > > > > > > > > > > > > > > > >>> > Yeah, cancellable future
> > will
> > > > be
> > > > > > > > similar
> > > > > > > > > in
> > > > > > > > > > > > > > > complexity.
> > > > > > > > > > > > > > > > > > > > > > > >>> >
> > > > > > > > > > > > > > > > > > > > > > > >>> > I'm unsure if
> > > > > > > > > max.message.delivery.wait.ms
> > > > > > > > > > > > will be
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > final
> > > > > > > > > > > > > > > > > > > nail
> > > > > > > > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > > > > > > >>> > producer
> > > > > > > > > > > > > > > > > > > > > > > >>> > timeouts. We still won't
> > > have a
> > > > > > > precise
> > > > > > > > > way
> > > > > > > > > > > to
> > > > > > > > > > > > > > > control
> > > > > > > > > > > > > > > > > > delay
> > > > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > > > > just
> > > > > > > > > > > > > > > > > > > > > > > >>> the
> > > > > > > > > > > > > > > > > > > > > > > >>> > accumulator segment.
> > > > > > batch.expiry.ms
> > > > > > > > > does
> > > > > > > > > > > not
> > > > > > > > > > > > > try
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > abstract.
> > > > > > > > > > > > > > > > > > > > > > It's
> > > > > > > > > > > > > > > > > > > > > > > >>> very
> > > > > > > > > > > > > > > > > > > > > > > >>> > specific.
> > > > > > > > > > > > > > > > > > > > > > > >>> >
> > > > > > > > > > > > > > > > > > > > > > > >>> > My biggest concern at the
> > > > moment
> > > > > is
> > > > > > > > > > > > > implementation
> > > > > > > > > > > > > > > > > > > complexity.
> > > > > > > > > > > > > > > > > > > > > > > >>> >
> > > > > > > > > > > > > > > > > > > > > > > > >>> > At this stage, I would
> like
> > > to
> > > > > > > > encourage
> > > > > > > > > > > other
> > > > > > > > > > > > > > > > > independent
> > > > > > > > > > > > > > > > > > > > > > opinions.
> > > > > > > > > > > > > > > > > > > > > > > >>> >
> > > > > > > > > > > > > > > > > > > > > > > >>> > Regards,
> > > > > > > > > > > > > > > > > > > > > > > >>> > Sumant
> > > > > > > > > > > > > > > > > > > > > > > >>> >
> > > > > > > > > > > > > > > > > > > > > > > >>> > On 11 August 2017 at
> 17:35,
> > > Jun
> > > > > > Rao <
> > > > > > > > > > > > > > > j...@confluent.io>
> > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > > >>> >
> > > > > > > > > > > > > > > > > > > > > > > >>> > > Hi, Sumant,
> > > > > > > > > > > > > > > > > > > > > > > >>> > >
> > > > > > > > > > > > > > > > > > > > > > > >>> > > 1. Yes, it's probably
> > > > > reasonable
> > > > > > to
> > > > > > > > > > require
> > > > > > > > > > > > > > > > > > > > > > > >>>
> max.message.delivery.wait.ms
> > > > > > > > > > > > > > > > > > > > > > > >>> > >
> > > > > > > > > > > > > > > > > > > > > > > >>> > > linger.ms. As for
> > retries,
> > > > > > perhaps
> > > > > > > > we
> > > > > > > > > > can
> > > > > > > > > > > > set
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > default
> > > > > > > > > > > > > > > > > > > > > > retries
> > > > > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > > >>> > > infinite or just ignore
> > it.
> > > > > Then
> > > > > > > the
> > > > > > > > > > > latency
> > > > > > > > > > > > > will
> > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > bounded
> > > > > > > > > > > > > > > > > > > > by
> > > > > > > > > > > > > > > > > > > > > > > >>> > >
> > > max.message.delivery.wait.ms
> > > > .
> > > > > > > > > > > > > request.timeout.ms
> > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > max
> > > > > > > > > > > > > > > > > > > > > time
> > > > > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > >>> > > request will be
> spending
> > on
> > > > the
> > > > > > > > server.
> > > > > > > > > > The
> > > > > > > > > > > > > > client
> > > > > > > > > > > > > > > > can
> > > > > > > > > > > > > > > > > > > expire
> > > > > > > > > > > > > > > > > > > > > an
> > > > > > > > > > > > > > > > > > > > > > > >>> inflight
> > > > > > > > > > > > > > > > > > > > > > > >>> > > request early if
> needed.
> > > > > > > > > > > > > > > > > > > > > > > >>> > >
> > > > > > > > > > > > > > > > > > > > > > > >>> > > 2. Well, since
> > > > > > > > > > > max.message.delivery.wait.ms
> > > > > > > > > > > > > > > > specifies
> > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > max,
> > > > > > > > > > > > > > > > > > > > > > > >>> calling
> > > > > > > > > > > > > > > > > > > > > > > >>> > the
> > > > > > > > > > > > > > > > > > > > > > > >>> > > callback a bit early
> may
> > be
> > > > ok?
> > > > > > > Note
> > > > > > > > > that
> > > > > > > > > > > > > > > > > > > > > > > >>>
> max.message.delivery.wait.ms
> > > > > > > > > > > > > > > > > > > > > > > >>> > > only
> > > > > > > > > > > > > > > > > > > > > > > >>> > > comes into play in the
> > rare
> > > > > error
> > > > > > > > case.
> > > > > > > > > > > So, I
> > > > > > > > > > > > > am
> > > > > > > > > > > > > > > not
> > > > > > > > > > > > > > > > > sure
> > > > > > > > > > > > > > > > > > > if
> > > > > > > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > > > > > > > need
> > > > > > > > > > > > > > > > > > > > > > > >>> to
> > > > > > > > > > > > > > > > > > > > > > > >>> > be
> > > > > > > > > > > > > > > > > > > > > > > >>> > > very precise. The issue
> > > with
> > > > > > > starting
> > > > > > > > > the
> > > > > > > > > > > > clock
> > > > > > > > > > > > > > on
> > > > > > > > > > > > > > > > > > closing
> > > > > > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > > > > > > batch
> > > > > > > > > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > > > > > >>> > that
> > > > > > > > > > > > > > > > > > > > > > > >>> > > currently if the leader
> > is
> > > > not
> > > > > > > > > available,
> > > > > > > > > > > the
> > > > > > > > > > > > > > > closing
> > > > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > > > > > batch
> > > > > > > > > > > > > > > > > > > > > > > can
> > > > > > > > > > > > > > > > > > > > > > > >>> be
> > > > > > > > > > > > > > > > > > > > > > > >>> > > delayed longer than
> > > > linger.ms.
> > > > > > > > > > > > > > > > > > > > > > > >>> > >
> > > > > > > > > > > > > > > > > > > > > > > >>> > > 4. As you said,
> > > > > > future.get(timeout)
> > > > > > > > > > itself
> > > > > > > > > > > > > > doesn't
> > > > > > > > > > > > > > > > > solve
> > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > problem
> > > > > > > > > > > > > > > > > > > > > > > >>> > since
> > > > > > > > > > > > > > > > > > > > > > > >>> > > you still need a way to
> > > > expire
> > > > > > the
> > > > > > > > > record
> > > > > > > > > > > in
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > sender.
> > > > > > > > > > > > > > > > > > > The
> > > > > > > > > > > > > > > > > > > > > > amount
> > > > > > > > > > > > > > > > > > > > > > > >>> of
> > > > > > > > > > > > > > > > > > > > > > > >>> > work
> > > > > > > > > > > > > > > > > > > > > > > >>> > > to implement a
> > cancellable
> > > > > future
> > > > > > > is
> > > > > > > > > > > probably
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > same?
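[Inline note] Jun's point 1 above (ignore retries, let max.message.delivery.wait.ms bound the latency, and expire an in-flight request early if needed) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the class, method, and parameter names are hypothetical, time is simulated rather than measured, and nothing here is taken from the actual producer code.

```java
// Hypothetical simulation, not Kafka code: retry until a delivery deadline,
// ignoring any retry count.
final class DeadlineBoundedRetry {

    /** Result of one simulated send. */
    static final class Outcome {
        final boolean delivered;
        final int attempts;
        final long elapsedMs;

        Outcome(boolean delivered, int attempts, long elapsedMs) {
            this.delivered = delivered;
            this.attempts = attempts;
            this.elapsedMs = elapsedMs;
        }
    }

    /**
     * deliveryWaitMs        total time budget for the record (assumed config value)
     * attemptMs             simulated duration of each produce attempt
     * backoffMs             simulated pause after a retriable failure
     * failuresBeforeSuccess number of attempts that fail before one succeeds
     */
    static Outcome send(long deliveryWaitMs, long attemptMs, long backoffMs,
                        int failuresBeforeSuccess) {
        long elapsed = 0;
        int attempts = 0;
        while (elapsed < deliveryWaitMs) {
            attempts++;
            long remaining = deliveryWaitMs - elapsed;
            if (attemptMs > remaining) {
                // The in-flight request is expired early, exactly at the deadline.
                return new Outcome(false, attempts, deliveryWaitMs);
            }
            elapsed += attemptMs;
            if (attempts > failuresBeforeSuccess) {
                return new Outcome(true, attempts, elapsed);
            }
            elapsed += backoffMs; // retriable error: back off, then try again
        }
        // The deadline passed while backing off.
        return new Outcome(false, attempts, deliveryWaitMs);
    }

    public static void main(String[] args) {
        Outcome o = send(1000, 100, 50, 2);
        System.out.println(o.delivered + " after " + o.attempts
                + " attempts, " + o.elapsedMs + " ms");
    }
}
```

In this sketch the number of attempts falls out of the time budget, which is the sense in which the retry count stops mattering.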
> > > > > > > > > > > > > > > > > > > > > > > >>> > >
> > > > > > > > > > > > > > > > > > > > > > > >>> > > Overall, my concern
> with
> > > > patch
> > > > > > work
> > > > > > > > is
> > > > > > > > > > that
> > > > > > > > > > > > we
> > > > > > > > > > > > > > have
> > > > > > > > > > > > > > > > > > > iterated
> > > > > > > > > > > > > > > > > > > > on
> > > > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > >>> > produce
> > > > > > > > > > > > > > > > > > > > > > > >>> > > request timeout
> multiple
> > > > times
> > > > > > and
> > > > > > > > new
> > > > > > > > > > > issues
> > > > > > > > > > > > > > keep
> > > > > > > > > > > > > > > > > coming
> > > > > > > > > > > > > > > > > > > > back.
> > > > > > > > > > > > > > > > > > > > > > > >>> Ideally,
> > > > > > > > > > > > > > > > > > > > > > > >>> > > this time, we want to
> > have
> > > a
> > > > > > > solution
> > > > > > > > > > that
> > > > > > > > > > > > > covers
> > > > > > > > > > > > > > > all
> > > > > > > > > > > > > > > > > > > cases,
> > > > > > > > > > > > > > > > > > > > > even
> > > > > > > > > > > > > > > > > > > > > > > >>> though
> > > > > > > > > > > > > > > > > > > > > > > >>> > > that requires a bit
> more
> > > > work.
> > > > > > > > > > > > > > > > > > > > > > > >>> > >
> > > > > > > > > > > > > > > > > > > > > > > >>> > > Thanks,
> > > > > > > > > > > > > > > > > > > > > > > >>> > >
> > > > > > > > > > > > > > > > > > > > > > > >>> > > Jun
> > > > > > > > > > > > > > > > > > > > > > > >>> > >
> > > > > > > > > > > > > > > > > > > > > > > >>> > >
> > > > > > > > > > > > > > > > > > > > > > > >>> > > On Fri, Aug 11, 2017 at
> > > 12:30
> > > > > PM,
> > > > > > > > > Sumant
> > > > > > > > > > > > Tambe
> > > > > > > > > > > > > <
> > > > > > > > > > > > > > > > > > > > > > suta...@gmail.com>
> > > > > > > > > > > > > > > > > > > > > > > >>> > wrote:
> > > > > > > > > > > > > > > > > > > > > > > >>> > >
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > Hi Jun,
> > > > > > > > > > > > > > > > > > > > > > > >>> > > >
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > Thanks for looking
> into
> > > it.
> > > > > > > > > > > > > > > > > > > > > > > >>> > > >
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > Yes, we did consider
> > this
> > > > > > > > > message-level
> > > > > > > > > > > > > timeout
> > > > > > > > > > > > > > > > > > approach
> > > > > > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > > > > > > >>> expiring
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > batches selectively
> in
> > a
> > > > > > request
> > > > > > > > but
> > > > > > > > > > > > rejected
> > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > due
> > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > >>> reasons of
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > added complexity
> > without
> > > a
> > > > > > strong
> > > > > > > > > > benefit
> > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > counter-weigh
> > > > > > > > > > > > > > > > > > > > > > that.
> > > > > > > > > > > > > > > > > > > > > > > >>> Your
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > proposal is a slight
> > > > > variation
> > > > > > so
> > > > > > > > > I'll
> > > > > > > > > > > > > mention
> > > > > > > > > > > > > > > some
> > > > > > > > > > > > > > > > > > > issues
> > > > > > > > > > > > > > > > > > > > > > here.
> > > > > > > > > > > > > > > > > > > > > > > >>> > > >
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > 1. It sounds like
> > > > > > > > > > > > > max.message.delivery.wait.ms
> > > > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > > > > > overlap
> > > > > > > > > > > > > > > > > > > > > > with
> > > > > > > > > > > > > > > > > > > > > > > >>> "time
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > segments" of both
> > > > linger.ms
> > > > > > and
> > > > > > > > > > retries
> > > > > > > > > > > *
> > > > > > > > > > > > (
> > > > > > > > > > > > > > > > > > > > > request.timeout.ms
> > > > > > > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > retry.backoff.ms).
> In
> > > that
> > > > > > case,
> > > > > > > > > which
> > > > > > > > > > > > > config
> > > > > > > > > > > > > > > set
> > > > > > > > > > > > > > > > > > takes
> > > > > > > > > > > > > > > > > > > > > > > >>> precedence? It
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > would not make sense
> to
> > > > > > configure
> > > > > > > > > > configs
> > > > > > > > > > > > > from
> > > > > > > > > > > > > > > both
> > > > > > > > > > > > > > > > > > sets.
> > > > > > > > > > > > > > > > > > > > > > > >>> Especially,
> > > > > > > > > > > > > > > > > > > > > > > >>> > we
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > discussed
> exhaustively
> > > > > > internally
> > > > > > > > > that
> > > > > > > > > > > > > retries
> > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > > > > > > >>> > > >
> > > > max.message.delivery.wait.ms
> > > > > > > > can't /
> > > > > > > > > > > > > shouldn't
> > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > > configured
> > > > > > > > > > > > > > > > > > > > > > > >>> together.
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > Retries become moot
> as
> > > you
> > > > > > > already
> > > > > > > > > > > > mention. I
> > > > > > > > > > > > > > > think
> > > > > > > > > > > > > > > > > > > that's
> > > > > > > > > > > > > > > > > > > > > > going
> > > > > > > > > > > > > > > > > > > > > > > >>> to be
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > surprising to anyone
> > > > wanting
> > > > > to
> > > > > > > use
> > > > > > > > > > > > > > > > > > > > > > max.message.delivery.wait.ms
> > > > > > > > > > > > > > > > > > > > > > > .
> > > > > > > > > > > > > > > > > > > > > > > >>> We
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > probably need
> > > > > > > > > > > max.message.delivery.wait.ms
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > linger.ms
> > > > > > > > > > > > > > > > > > > or
> > > > > > > > > > > > > > > > > > > > > > > >>> something
> > > > > > > > > > > > > > > > > > > > > > > >>> > like
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > that.
> > > > > > > > > > > > > > > > > > > > > > > >>> > > >
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > 2. If clock starts
> > when a
> > > > > batch
> > > > > > > is
> > > > > > > > > > > created
> > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > expires
> > > > > > > > > > > > > > > > > > > when
> > > > > > > > > > > > > > > > > > > > > > > >>> > > >
> > > > max.message.delivery.wait.ms
> > > > > > is
> > > > > > > > over
> > > > > > > > > > in
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > accumulator,
> > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > last
> > > > > > > > > > > > > > > > > > > > > > > >>> few
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > messages in the
> > expiring
> > > > > batch
> > > > > > > may
> > > > > > > > > not
> > > > > > > > > > > have
> > > > > > > > > > > > > > lived
> > > > > > > > > > > > > > > > > long
> > > > > > > > > > > > > > > > > > > > > enough.
> > > > > > > > > > > > > > > > > > > > > > As
> > > > > > > > > > > > > > > > > > > > > > > >>> the
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > config seems to
> > suggest
> > > > > > > > per-message
> > > > > > > > > > > > timeout,
> > > > > > > > > > > > > > > it's
> > > > > > > > > > > > > > > > > > > > incorrect
> > > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > > >>> expire
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > messages prematurely.
> > On
> > > > the
> > > > > > > other
> > > > > > > > > hand
> > > > > > > > > > > if
> > > > > > > > > > > > > > clock
> > > > > > > > > > > > > > > > > starts
> > > > > > > > > > > > > > > > > > > > after
> > > > > > > > > > > > > > > > > > > > > > > >>> batch is
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > closed (which also
> > > implies
> > > > > that
> > > > > > > > > > > linger.ms
> > > > > > > > > > > > is
> > > > > > > > > > > > > > not
> > > > > > > > > > > > > > > > > > covered
> > > > > > > > > > > > > > > > > > > > by
> > > > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > >>> > > >
> > > > max.message.delivery.wait.ms
> > > > > > > > > config),
> > > > > > > > > > no
> > > > > > > > > > > > > > message
> > > > > > > > > > > > > > > > > would
> > > > > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > > > > > >>> expired
> > > > > > > > > > > > > > > > > > > > > > > >>> > too
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > soon. Yeah,
> expiration
> > > may
> > > > be
> > > > > > > > > a little
> > > > > > > > > > bit
> > > > > > > > > > > > too
> > > > > > > > > > > > > > late
> > > > > > > > > > > > > > > > but
> > > > > > > > > > > > > > > > > > > hey,
> > > > > > > > > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > > > > > > >>> ain't
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > real-time service.
> > > > > > > > > > > > > > > > > > > > > > > >>> > > >
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > 3. I agree that steps
> > #3,
> > > > #4,
> > > > > > > (and
> > > > > > > > > #5)
> > > > > > > > > > > are
> > > > > > > > > > > > > > > complex
> > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > implement.
> > > > > > > > > > > > > > > > > > > > > > > >>> On the
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > other hand,
> > > > batch.expiry.ms
> > > > > is
> > > > > > > > next
> > > > > > > > > to
> > > > > > > > > > > > > trivial
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > implement.
> > > > > > > > > > > > > > > > > > > > > > We
> > > > > > > > > > > > > > > > > > > > > > > >>> just
> > > > > > > > > > > > > > > > > > > > > > > >>> > > pass
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > the config all the
> way
> > > down
> > > > > to
> > > > > > > > > > > > > > > > > > ProducerBatch.maybeExpire
> > > > > > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > > > > > >>> done
> > > > > > > > > > > > > > > > > > > > > > > >>> > with
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > it.
> > > > > > > > > > > > > > > > > > > > > > > >>> > > >
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > 4. Do you think the
> > > effect
> > > > of
> > > > > > > > > > > > > > > > > > > max.message.delivery.wait.ms
> > > > > > > > > > > > > > > > > > > > > can
> > > > > > > > > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > simulated
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > with
> > future.get(timeout)
> > > > > > method?
> > > > > > > > > > Copying
> > > > > > > > > > > > > > excerpt
> > > > > > > > > > > > > > > > from
> > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > KIP-91:
> > > > > > > > > > > > > > > > > > > > > > > >>> An
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > end-to-end timeout
> may
> > be
> > > > > > > partially
> > > > > > > > > > > > emulated
> > > > > > > > > > > > > > > using
> > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > >>> > > future.get(timeout).
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > The timeout must be
> > > greater
> > > > > > than
> > > > > > > (
> > > > > > > > > > > > > > > batch.expiry.ms
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > > > > nRetries
> > > > > > > > > > > > > > > > > > > > > > * (
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > request.timeout.ms +
> > > > > > > > > retry.backoff.ms
> > > > > > > > > > )).
> > > > > > > > > > > > > Note
> > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > > when
> > > > > > > > > > > > > > > > > > > > > future
> > > > > > > > > > > > > > > > > > > > > > > >>> times
> > > > > > > > > > > > > > > > > > > > > > > >>> > > out,
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > Sender may continue
> to
> > > send
> > > > > the
> > > > > > > > > records
> > > > > > > > > > > in
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > background.
> > > > > > > > > > > > > > > > > > > > To
> > > > > > > > > > > > > > > > > > > > > > > avoid
> > > > > > > > > > > > > > > > > > > > > > > >>> > that,
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > implementing a
> > > cancellable
> > > > > > future
> > > > > > > > is
> > > > > > > > > a
> > > > > > > > > > > > > > > possibility.
> > > > > > > > > > > > > > > > > > > > > > > >>> > > >
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > For simplicity, we
> > could
> > > > just
> > > > > > > > > > implement a
> > > > > > > > > > > > > > trivial
> > > > > > > > > > > > > > > > > > method
> > > > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > > > > > >>> producer
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > ProducerConfigs.
> > > > > > > > > > > maxMessageDeliveryWaitMs()
> > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > return
> > > > > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > > > > > number
> > > > > > > > > > > > > > > > > > > > > > > >>> based
> > > > > > > > > > > > > > > > > > > > > > > >>> > on
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > this formula? Users
> of
> > > > > > future.get
> > > > > > > > can
> > > > > > > > > > use
> > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > timeout
> > > > > > > > > > > > > > > > > > > > value.
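[Inline note] The bound quoted from the KIP above, batch.expiry.ms + nRetries * (request.timeout.ms + retry.backoff.ms), is simple to compute. The helper below is a hypothetical sketch of the kind of method Sumant describes; the class and method names are made up for illustration, batch.expiry.ms is the config proposed in this thread, and the sample values are arbitrary.

```java
// Hypothetical helper, not part of the Kafka client API.
final class FutureGetTimeout {

    /**
     * Returns batch.expiry.ms + nRetries * (request.timeout.ms + retry.backoff.ms).
     * Per the KIP excerpt, a future.get(timeout) must use a value greater than this.
     * Throws ArithmeticException on overflow instead of wrapping silently.
     */
    static long boundMs(long batchExpiryMs, int nRetries,
                        long requestTimeoutMs, long retryBackoffMs) {
        long perAttemptMs = Math.addExact(requestTimeoutMs, retryBackoffMs);
        long retryWindowMs = Math.multiplyExact((long) nRetries, perAttemptMs);
        return Math.addExact(batchExpiryMs, retryWindowMs);
    }

    public static void main(String[] args) {
        // Arbitrary sample values chosen only for illustration.
        long bound = boundMs(30_000L, 3, 30_000L, 100L);
        System.out.println("future.get timeout must exceed " + bound + " ms");
    }
}
```

Note that a very large retries value (for example MAX_INT) makes this bound overflow or become uselessly large, which is one reason a single delivery timeout is easier to reason about.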
> > > > > > > > > > > > > > > > > > > > > > > >>> > > >
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > Thoughts?
> > > > > > > > > > > > > > > > > > > > > > > >>> > > >
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > Regards,
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > Sumant
> > > > > > > > > > > > > > > > > > > > > > > >>> > > >
> > > > > > > > > > > > > > > > > > > > > > > >>> > > >
> > > > > > > > > > > > > > > > > > > > > > > >>> > > >
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > On 11 August 2017 at
> > > 07:50,
> > > > > > > Sumant
> > > > > > > > > > Tambe
> > > > > > > > > > > <
> > > > > > > > > > > > > > > > > > > > suta...@gmail.com>
> > > > > > > > > > > > > > > > > > > > > > > >>> wrote:
> > > > > > > > > > > > > > > > > > > > > > > >>> > > >
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > > Thanks for the KIP.
> > > Nice
> > > > > > > > > > documentation
> > > > > > > > > > > on
> > > > > > > > > > > > > all
> > > > > > > > > > > > > > > > > current
> > > > > > > > > > > > > > > > > > > > > issues
> > > > > > > > > > > > > > > > > > > > > > > >>> with the
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> timeout.
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > > For the KIP
> writeup,
> > > all
> > > > > > credit
> > > > > > > > > goes
> > > > > > > > > > to
> > > > > > > > > > > > > Joel
> > > > > > > > > > > > > > > > Koshy.
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > > I'll follow up on
> > your
> > > > > > > comments a
> > > > > > > > > > > little
> > > > > > > > > > > > > > later.
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >>
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> You also brought
> up
> > a
> > > > good
> > > > > > use
> > > > > > > > > case
> > > > > > > > > > > for
> > > > > > > > > > > > > > timing
> > > > > > > > > > > > > > > > > out a
> > > > > > > > > > > > > > > > > > > > > > message.
> > > > > > > > > > > > > > > > > > > > > > > >>> For
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> applications that
> > > > collect
> > > > > > and
> > > > > > > > send
> > > > > > > > > > > > sensor
> > > > > > > > > > > > > > data
> > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > Kafka,
> > > > > > > > > > > > > > > > > > > > > if
> > > > > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > >>> data
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > can't
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> be sent to Kafka
> for
> > > > some
> > > > > > > > reason,
> > > > > > > > > > the
> > > > > > > > > > > > > > > > application
> > > > > > > > > > > > > > > > > > may
> > > > > > > > > > > > > > > > > > > > > prefer
> > > > > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > > >>> > buffer
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > the
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> more recent data
> in
> > > the
> > > > > > > > > accumulator.
> > > > > > > > > > > > > > Without a
> > > > > > > > > > > > > > > > > > > timeout,
> > > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > >>> > > accumulator
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> will be filled
> with
> > > old
> > > > > > > records
> > > > > > > > > and
> > > > > > > > > > > new
> > > > > > > > > > > > > > > records
> > > > > > > > > > > > > > > > > > can't
> > > > > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > > > > > added.
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >>
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> Your proposal
> makes
> > > > sense
> > > > > > for
> > > > > > > a
> > > > > > > > > > > > developer
> > > > > > > > > > > > > > who
> > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > > familiar
> > > > > > > > > > > > > > > > > > > > > > with
> > > > > > > > > > > > > > > > > > > > > > > >>> how
> > > > > > > > > > > > > > > > > > > > > > > >>> > the
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> producer works. I
> am
> > > not
> > > > > > sure
> > > > > > > if
> > > > > > > > > > this
> > > > > > > > > > > is
> > > > > > > > > > > > > > very
> > > > > > > > > > > > > > > > > > > intuitive
> > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > >>> users
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > since
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> it may not be very
> > > easy
> > > > > for
> > > > > > > them
> > > > > > > > > to
> > > > > > > > > > > > figure
> > > > > > > > > > > > > > out
> > > > > > > > > > > > > > > > how
> > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > > configure
> > > > > > > > > > > > > > > > > > > > > > > >>> the
> > > > > > > > > > > > > > > > > > > > > > > >>> > > new
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> knob to bound the
> > > amount
> > > > > of
> > > > > > > the
> > > > > > > > > time
> > > > > > > > > > > > when
> > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > message
> > > > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > > > > > >>> completed.
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >>
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> From users'
> > > perspective,
> > > > > > > > Apurva's
> > > > > > > > > > > > > suggestion
> > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >>
> > > > > > max.message.delivery.wait.ms
> > > > > > > > > (which
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> bounds the time
> > when a
> > > > > > message
> > > > > > > > is
> > > > > > > > > in
> > > > > > > > > > > the
> > > > > > > > > > > > > > > > > accumulator
> > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > time
> > > > > > > > > > > > > > > > > > > > > > > >>> > when
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > the
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> callback is
> called)
> > > > seems
> > > > > > more
> > > > > > > > > > > > > intuitive.
> > > > > > > > > > > > > > You
> > > > > > > > > > > > > > > > > listed
> > > > > > > > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > rejected
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> section since it
> > > > requires
> > > > > > > > > additional
> > > > > > > > > > > > logic
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > rebatch
> > > > > > > > > > > > > > > > > > > > > when a
> > > > > > > > > > > > > > > > > > > > > > > >>> produce
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> request expires.
> > > > However,
> > > > > > this
> > > > > > > > may
> > > > > > > > > > not
> > > > > > > > > > > > be
> > > > > > > > > > > > > > too
> > > > > > > > > > > > > > > > bad.
> > > > > > > > > > > > > > > > > > The
> > > > > > > > > > > > > > > > > > > > > > > >>> following are
> > > > > > > > > > > > > > > > > > > > > > > >>> > > the
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> things that we
> have
> > to
> > > > do.
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >>
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> 1. The clock
> starts
> > > > when a
> > > > > > > batch
> > > > > > > > > is
> > > > > > > > > > > > > created.
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> 2. If the batch
> > can't
> > > be
> > > > > > > drained
> > > > > > > > > > > within
> > > > > > > > > > > > > > > > > > > > > > > >>> >
> > max.message.delivery.wait.ms
> > > ,
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > all
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> messages in the
> > batch
> > > > will
> > > > > > > fail
> > > > > > > > > and
> > > > > > > > > > > the
> > > > > > > > > > > > > > > callback
> > > > > > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > > > > > called.
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> 3. When sending a
> > > > produce
> > > > > > > > request,
> > > > > > > > > > we
> > > > > > > > > > > > > > > calculate
> > > > > > > > > > > > > > > > an
> > > > > > > > > > > > > > > > > > > > > > expireTime
> > > > > > > > > > > > > > > > > > > > > > > >>> for
> > > > > > > > > > > > > > > > > > > > > > > >>> > the
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> request that
> equals
> > > > the
> > > > > > > > > remaining
> > > > > > > > > > > > > > > expiration
> > > > > > > > > > > > > > > > > time
> > > > > > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > >>> oldest
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > batch
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> in the request.
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> 4. We set the
> > minimum
> > > of
> > > > > the
> > > > > > > > > > > expireTime
> > > > > > > > > > > > of
> > > > > > > > > > > > > > all
> > > > > > > > > > > > > > > > > > > inflight
> > > > > > > > > > > > > > > > > > > > > > > >>> requests as
> > > > > > > > > > > > > > > > > > > > > > > >>> > > the
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> timeout in the
> > > selector
> > > > > poll
> > > > > > > > call
> > > > > > > > > > (so
> > > > > > > > > > > > that
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > selector
> > > > > > > > > > > > > > > > > > > > > can
> > > > > > > > > > > > > > > > > > > > > > > >>> wake up
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > before
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> the expiration
> > time).
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> 5. If the produce
> > > > response
> > > > > > > can't
> > > > > > > > > be
> > > > > > > > > > > > > received
> > > > > > > > > > > > > > > > > within
> > > > > > > > > > > > > > > > > > > > > > > expireTime,
> > > > > > > > > > > > > > > > > > > > > > > >>> we
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > expire
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> all batches in the
> > > > produce
> > > > > > > > request
> > > > > > > > > > > whose
> > > > > > > > > > > > > > > > > expiration
> > > > > > > > > > > > > > > > > > > time
> > > > > > > > > > > > > > > > > > > > > has
> > > > > > > > > > > > > > > > > > > > > > > >>> been
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > reached.
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> For the rest of
> the
> > > > > batches,
> > > > > > > we
> > > > > > > > > > resend
> > > > > > > > > > > > > them
> > > > > > > > > > > > > > > in a
> > > > > > > > > > > > > > > > > new
> > > > > > > > > > > > > > > > > > > > > produce
> > > > > > > > > > > > > > > > > > > > > > > >>> > request.
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> 6. If the produce
> > > > > response
> > > > > > > has
> > > > > > > > a
> > > > > > > > > > > > > retriable
> > > > > > > > > > > > > > > > error,
> > > > > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > > > > just
> > > > > > > > > > > > > > > > > > > > > > > >>> backoff a
> > > > > > > > > > > > > > > > > > > > > > > >>> > > bit
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> and then retry the
> > > > produce
> > > > > > > > request
> > > > > > > > > > as
> > > > > > > > > > > > > today.
> > > > > > > > > > > > > > > The
> > > > > > > > > > > > > > > > > > > number
> > > > > > > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > > > > > > > >>> retries
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > doesn't
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> really matter now.
> > We
> > > > just
> > > > > > > keep
> > > > > > > > > > > retrying
> > > > > > > > > > > > > > until
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > expiration
> > > > > > > > > > > > > > > > > > > > > > > >>> time
> > > > > > > > > > > > > > > > > > > > > > > >>> > is
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> reached. It's
> > possible
> > > > > that
> > > > > > a
> > > > > > > > > > produce
> > > > > > > > > > > > > > request
> > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > never
> > > > > > > > > > > > > > > > > > > > > > retried
> > > > > > > > > > > > > > > > > > > > > > > >>> due
> > > > > > > > > > > > > > > > > > > > > > > >>> > to
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> expiration.
> However,
> > > > this
> > > > > > > seems
> > > > > > > > > the
> > > > > > > > > > > > right
> > > > > > > > > > > > > > > thing
> > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > do
> > > > > > > > > > > > > > > > > > > > > since
> > > > > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > >>> > users
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > want
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> to timeout the
> > message
> > > > at
> > > > > > this
> > > > > > > > > time.
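[Inline note] Steps 1 and 3 to 5 of the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Sender or ProducerBatch implementation: all class and method names are hypothetical, and deadlines are kept as absolute timestamps rather than as the remaining time Jun describes (the two are equivalent once the current time is known).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Kafka code.
final class ExpirySketch {

    /** Step 1: the clock starts when the batch is created. */
    static final class Batch {
        final long createdMs;

        Batch(long createdMs) {
            this.createdMs = createdMs;
        }

        long expiresAtMs(long deliveryWaitMs) {
            return createdMs + deliveryWaitMs;
        }
    }

    /** Step 3: a request's expiry is that of its oldest batch. */
    static long requestExpireTimeMs(List<Batch> batches, long deliveryWaitMs) {
        long earliest = Long.MAX_VALUE;
        for (Batch b : batches) {
            earliest = Math.min(earliest, b.expiresAtMs(deliveryWaitMs));
        }
        return earliest;
    }

    /** Step 4: poll no longer than the earliest expiry among in-flight requests. */
    static long pollTimeoutMs(List<List<Batch>> inflight, long deliveryWaitMs, long nowMs) {
        long earliest = Long.MAX_VALUE;
        for (List<Batch> request : inflight) {
            earliest = Math.min(earliest, requestExpireTimeMs(request, deliveryWaitMs));
        }
        return Math.max(0L, earliest - nowMs);
    }

    /** Step 5: batches whose deadline has passed; the rest would be resent. */
    static List<Batch> expired(List<Batch> batches, long deliveryWaitMs, long nowMs) {
        List<Batch> out = new ArrayList<>();
        for (Batch b : batches) {
            if (b.expiresAtMs(deliveryWaitMs) <= nowMs) {
                out.add(b);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Batch> request = new ArrayList<>();
        request.add(new Batch(0L));
        request.add(new Batch(400L));
        System.out.println("request expires at " + requestExpireTimeMs(request, 1000L) + " ms");
    }
}
```

The extra complexity Jun mentions for steps 3 and 4 shows up here as the two minimum computations; the rebatching in step 5 is only hinted at by the split between expired and surviving batches.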
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >>
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> Implementation-wise, there will be a bit more complexity in step 3 and 4,
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> but probably not too bad. The benefit is that this is more intuitive to
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> the end user.
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >>
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> Does that sound reasonable to you?
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >>
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> Thanks,
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >>
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> Jun
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >>
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >>
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> On Wed, Aug 9, 2017 at 10:03 PM, Sumant Tambe <suta...@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >>
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > On Wed, Aug 9, 2017 at 1:28 PM Apurva Mehta <apu...@confluent.io> wrote:
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> >
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > > There seems to be no relationship with cluster metadata availability or
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > > staleness. Expiry is just based on the time since the batch has been
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > > ready. Please correct me if I am wrong.
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > >
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > >
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > I was not very specific about where we do expiration. I glossed over some
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > details because (again) we have other mechanisms to detect non-progress.
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > The condition
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > (!muted.contains(tp) && (isMetadataStale || cluster.leaderFor(tp) == null))
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > is used in RecordAccumulator.expiredBatches:
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java#L443
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > >
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > >
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > Effectively, we expire in all the following cases:
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > 1) The producer is partitioned from the brokers, i.e. the metadata age
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > grows beyond 3x its max value. It's safe to say that we're not talking
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > to the brokers at all. Report.
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > 2) fresh metadata && leader for a partition is not known && a batch is
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > sitting there for longer than request.timeout.ms. This is one case we
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > would like to improve and use batch.expiry.ms because
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > request.timeout.ms is too small.
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > 3) fresh metadata && leader for a partition is known && batch is sitting
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > there for longer than batch.expiry.ms. This is a new case that is
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > different from #2. This is the catch-up mode case. Things are moving
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > too slowly. Pipeline SLAs are broken. Report and shut down kmm.
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > >
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > The second and the third cases are useful to a real-time app for a
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > completely different reason. Report, forget about the batch, and just
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > > move on (without shutting down).
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > >
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > >
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > If I understand correctly, you are talking about a fork of Apache Kafka
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > which has these additional conditions? Because that check doesn't exist
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > on trunk today.
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> >
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > Right. It is our internal release in LinkedIn.
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> >
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > Or are you proposing to change the behavior of expiry to account for
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > > stale metadata and partitioned producers as part of this KIP?
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> >
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> >
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > No. It's our temporary solution in the absence of KIP-91. Note that we
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > don't like increasing request.timeout.ms. Without our extra conditions
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > our batches expire too soon, which is a problem in kmm catch-up mode.
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> >
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > If we get batch.expiry.ms, we will configure it to 20 mins. maybeExpire
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > will use the config instead of request.timeout.ms. The extra conditions
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > will be unnecessary. All three cases shall be covered via the
> > > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> > batch.expiry.ms timeout.
> > > > > > > > > > > > > > > > > > > > > > > >>> > > > >> >

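The three expiry cases listed in the quoted message can be sketched as a small decision function. This is only an illustration of the reasoning in the thread, not the code in Kafka's RecordAccumulator or in the LinkedIn internal release. The class name ExpirySketch, the Reason enum, and the classify and isMetadataStale helpers are all hypothetical. The sketch also assumes that a muted partition is never expired and that cases 1 and 2 both require the batch to have waited longer than request.timeout.ms, which is one reading of the quoted condition. The timeout values are examples: 20 minutes for batch.expiry.ms comes from the thread, the other two are chosen for illustration.

```java
public class ExpirySketch {

    /** Why a batch was (or was not) expired; names are hypothetical. */
    enum Reason { NOT_EXPIRED, PARTITIONED_FROM_BROKERS, LEADER_UNKNOWN, TOO_SLOW }

    /** Case 1 trigger from the thread: metadata age beyond 3x its max value. */
    static boolean isMetadataStale(long metadataAgeMs, long metadataMaxAgeMs) {
        return metadataAgeMs > 3 * metadataMaxAgeMs;
    }

    /**
     * Classifies one batch. Assumption: muted partitions are skipped entirely,
     * and cases 1 and 2 are both gated on request.timeout.ms.
     */
    static Reason classify(boolean muted, boolean metadataStale, boolean leaderKnown,
                           long waitedMs, long requestTimeoutMs, long batchExpiryMs) {
        if (muted) {
            return Reason.NOT_EXPIRED;
        }
        if (metadataStale) {
            // Case 1: the producer is effectively cut off from the brokers.
            return waitedMs > requestTimeoutMs ? Reason.PARTITIONED_FROM_BROKERS : Reason.NOT_EXPIRED;
        }
        if (!leaderKnown) {
            // Case 2: fresh metadata, but no leader for the partition.
            return waitedMs > requestTimeoutMs ? Reason.LEADER_UNKNOWN : Reason.NOT_EXPIRED;
        }
        // Case 3: fresh metadata and a known leader, but the batch is moving too slowly.
        return waitedMs > batchExpiryMs ? Reason.TOO_SLOW : Reason.NOT_EXPIRED;
    }

    public static void main(String[] args) {
        long requestTimeoutMs = 30_000L;          // illustrative request.timeout.ms
        long batchExpiryMs = 20L * 60L * 1000L;   // 20 minutes, as suggested in the thread
        long metadataMaxAgeMs = 300_000L;         // illustrative metadata max age

        boolean stale = isMetadataStale(1_000_000L, metadataMaxAgeMs);
        System.out.println(classify(false, stale, true, 45_000L, requestTimeoutMs, batchExpiryMs));
        System.out.println(classify(false, false, false, 45_000L, requestTimeoutMs, batchExpiryMs));
        System.out.println(classify(false, false, true, 45_000L, requestTimeoutMs, batchExpiryMs));
        System.out.println(classify(false, false, true, 1_500_000L, requestTimeoutMs, batchExpiryMs));
    }
}
```

If a single batch.expiry.ms timeout were adopted as described at the end of the quoted message, the first two branches would presumably compare against that same value, and the extra metadata and leader conditions would no longer be needed.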