Yeah -- it looks like we already have code to handle bumping the epoch, and when the epoch is Short.MAX_VALUE, we get a new producer ID. Since this is already the behavior, do we want to change it further?
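[For readers following the thread: the overflow behavior described above -- bump the epoch per transaction, and hand out a fresh producer ID with epoch 0 once the epoch reaches Short.MAX_VALUE -- can be sketched roughly as below. The names and the allocator are illustrative stand-ins, not Kafka's actual code.]

```python
# Rough sketch (not Kafka's actual code) of the behavior described above:
# the epoch is bumped per transaction, and once it reaches Short.MAX_VALUE
# a brand-new producer ID is handed out with epoch 0.
SHORT_MAX = 32767  # Short.MAX_VALUE; the producer epoch is a 16-bit field

class ProducerIdManager:
    """Illustrative stand-in for the broker-side producer ID allocator."""
    def __init__(self):
        self._next_pid = 0

    def new_producer_id(self):
        pid = self._next_pid
        self._next_pid += 1
        return pid

def next_producer_state(pid, epoch, manager):
    """Return the (producer_id, epoch) pair to use for the next transaction."""
    if epoch >= SHORT_MAX:
        # Epoch space exhausted: fence the old ID and start over at epoch 0.
        return manager.new_producer_id(), 0
    return pid, epoch + 1
```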
Justine

On Wed, Jan 18, 2023 at 1:12 PM Justine Olshan <jols...@confluent.io> wrote:

> Hey all, just wanted to quickly update and say I've modified the KIP to explicitly mention that AddOffsetCommitsToTxnRequest will be replaced by a coordinator-side (inter-broker) AddPartitionsToTxn implicit request. This mirrors the user partitions and will implicitly add offset partitions to transactions when we commit offsets on them. We will deprecate AddOffsetCommitsToTxnRequest for new clients.
>
> Also to address Artem's comments -- I'm a bit unsure if the changes here will change the previous behavior for fencing producers. In the case you mention in the first paragraph, are you saying we bump the epoch before we try to abort the transaction? I think I need to understand the scenarios you mention a bit better.
>
> As for the second part -- I think it makes sense to have some sort of "sentinel" epoch to signal the epoch is about to overflow (I think we sort of have this value in place in some ways) so we can codify it in the KIP. I'll look into that and try to update soon.
>
> Thanks,
> Justine
>
> On Fri, Jan 13, 2023 at 5:01 PM Artem Livshits <alivsh...@confluent.io.invalid> wrote:
>
>> It's good to know that KIP-588 addressed some of the issues. Looking at the code, it still looks like there are some cases that would result in a fatal error, e.g. PRODUCER_FENCED is issued by the transaction coordinator if the epoch doesn't match, and the client treats it as a fatal error (code in TransactionManager request handling). If we consider, for example, committing a transaction that returns a timeout but actually succeeds, trying to abort it or re-commit may result in a PRODUCER_FENCED error (because of the epoch bump).
>>
>> For failed commits, specifically, we need to know the actual outcome, because if we return an error the application may think that the transaction is aborted and redo the work, leading to duplicates.
>> Re: overflowing epoch. We could either do it on the TC and return both producer id and epoch (e.g. change the protocol), or signal the client that it needs to get a new producer id. Checking for max epoch could be a reasonable signal; the value to check should probably be present in the KIP as this is effectively a part of the contract. Also, the TC should probably return an error if the client didn't change producer id after hitting max epoch.
>>
>> -Artem
>>
>> On Thu, Jan 12, 2023 at 10:31 AM Justine Olshan <jols...@confluent.io.invalid> wrote:
>>
>> > Thanks for the discussion Artem.
>> >
>> > With respect to the handling of fenced producers, we have some behavior already in place. As of KIP-588: https://cwiki.apache.org/confluence/display/KAFKA/KIP-588%3A+Allow+producers+to+recover+gracefully+from+transaction+timeouts, we handle timeouts more gracefully. The producer can recover.
>> >
>> > Produce requests can also recover from epoch fencing by aborting the transaction and starting over.
>> >
>> > What other cases were you considering that would cause us to have a fenced epoch but we'd want to recover?
>> >
>> > The first point about handling epoch overflows is fair. I think there is some logic we'd need to consider. (ie, if we are one away from the max epoch, we need to reset the producer ID.) I'm still wondering if there is a way to direct this from the response, or if everything should be done on the client side. Let me know if you have any thoughts here.
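[Illustration of the contract Artem sketches above: the client treats the max epoch as a signal to fetch a new producer ID, and the TC returns an error if the client kept using an exhausted one. All names below are hypothetical; the exact error and the codified max-epoch value are exactly what the KIP would need to pin down.]

```python
# Hypothetical sketch of the "max epoch as signal" contract discussed above.
# The client re-initializes at the sentinel epoch; the coordinator refuses
# to bump an exhausted epoch. Names and the error type are made up.
MAX_EPOCH = 32767  # the value the KIP would need to codify explicitly

class ProducerFencedError(Exception):
    """Stand-in for the error the TC would return on an exhausted epoch."""

def client_needs_new_producer_id(epoch):
    # Client side: at the sentinel epoch, call InitProducerId again
    # instead of starting another transaction with the same ID.
    return epoch >= MAX_EPOCH

def coordinator_bump_epoch(epoch):
    # Coordinator side: refuse to bump past the sentinel epoch.
    if epoch >= MAX_EPOCH:
        raise ProducerFencedError("epoch exhausted; get a new producer id")
    return epoch + 1
```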
>> >
>> > Thanks,
>> > Justine
>> >
>> > On Tue, Jan 10, 2023 at 4:06 PM Artem Livshits <alivsh...@confluent.io.invalid> wrote:
>> >
>> > > There are some workflows in the client that are implied by protocol changes, e.g.:
>> > >
>> > > - for new clients, epoch changes with every transaction and can overflow; in old clients this condition was handled transparently, because epoch was bumped in InitProducerId and it would return a new producer id if epoch overflows; the new clients would need to implement some workflow to refresh producer id
>> > > - how to handle fenced producers: for new clients epoch changes with every transaction, so in presence of failures during commits / aborts, the producer could get easily fenced; old clients would pretty much get fenced when a new incarnation of the producer was initialized with InitProducerId, so it's ok to treat as a fatal error; the new clients would need to implement some workflow to handle that error, otherwise they could get fenced by themselves
>> > > - in particular (as a subset of the previous issue), what would the client do if it got a timeout during commit? commit could've succeeded or failed
>> > >
>> > > Not sure if this has to be defined in the KIP as implementing those probably wouldn't require protocol changes, but we have multiple implementations of Kafka clients, so probably would be good to have some client implementation guidance. Could also be done as a separate doc.
>> > >
>> > > -Artem
>> > >
>> > > On Mon, Jan 9, 2023 at 3:38 PM Justine Olshan <jols...@confluent.io.invalid> wrote:
>> > >
>> > > > Hey all, I've updated the KIP to incorporate Jason's suggestions.
>> > > > >> > > > >> > > > >> > > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-890%3A+Transactions+Server-Side+Defense >> > > > >> > > > >> > > > 1. Use AddPartitionsToTxn + verify flag to check on old clients >> > > > 2. Updated AddPartitionsToTxn API to support transaction batching >> > > > 3. Mention IBP bump >> > > > 4. Mention auth change on new AddPartitionsToTxn version. >> > > > >> > > > I'm planning on opening a vote soon. >> > > > Thanks, >> > > > Justine >> > > > >> > > > On Fri, Jan 6, 2023 at 3:32 PM Justine Olshan <jols...@confluent.io >> > >> > > > wrote: >> > > > >> > > > > Thanks Jason. Those changes make sense to me. I will update the >> KIP. >> > > > > >> > > > > >> > > > > >> > > > > On Fri, Jan 6, 2023 at 3:31 PM Jason Gustafson >> > > > <ja...@confluent.io.invalid> >> > > > > wrote: >> > > > > >> > > > >> Hey Justine, >> > > > >> >> > > > >> > I was wondering about compatibility here. When we send requests >> > > > >> between brokers, we want to ensure that the receiving broker >> > > understands >> > > > >> the request (specifically the new fields). Typically this is done >> > via >> > > > >> IBP/metadata version. >> > > > >> I'm trying to think if there is a way around it but I'm not sure >> > there >> > > > is. >> > > > >> >> > > > >> Yes. I think we would gate usage of this behind an IBP bump. Does >> > that >> > > > >> seem >> > > > >> reasonable? >> > > > >> >> > > > >> > As for the improvements -- can you clarify how the multiple >> > > > >> transactional >> > > > >> IDs would help here? Were you thinking of a case where we >> wait/batch >> > > > >> multiple produce requests together? My understanding for now was >> 1 >> > > > >> transactional ID and one validation per 1 produce request. >> > > > >> >> > > > >> Each call to `AddPartitionsToTxn` is essentially a write to the >> > > > >> transaction >> > > > >> log and must block on replication. 
The more we can fit into a >> single >> > > > >> request, the more writes we can do in parallel. The alternative >> is >> > to >> > > > make >> > > > >> use of more connections, but usually we prefer batching since the >> > > > network >> > > > >> stack is not really optimized for high connection/request loads. >> > > > >> >> > > > >> > Finally with respect to the authorizations, I think it makes >> sense >> > > to >> > > > >> skip >> > > > >> topic authorizations, but I'm a bit confused by the "leader ID" >> > field. >> > > > >> Wouldn't we just want to flag the request as from a broker (does >> it >> > > > matter >> > > > >> which one?). >> > > > >> >> > > > >> We could also make it version-based. For the next version, we >> could >> > > > >> require >> > > > >> CLUSTER auth. So clients would not be able to use the API >> anymore, >> > > which >> > > > >> is >> > > > >> probably what we want. >> > > > >> >> > > > >> -Jason >> > > > >> >> > > > >> On Fri, Jan 6, 2023 at 10:43 AM Justine Olshan >> > > > >> <jols...@confluent.io.invalid> >> > > > >> wrote: >> > > > >> >> > > > >> > As a follow up, I was just thinking about the batching a bit >> more. >> > > > >> > I suppose if we have one request in flight and we queue up the >> > other >> > > > >> > produce requests in some sort of purgatory, we could send >> > > information >> > > > >> out >> > > > >> > for all of them rather than one by one. So that would be a >> benefit >> > > of >> > > > >> > batching partitions to add per transaction. >> > > > >> > >> > > > >> > I'll need to think a bit more on the design of this part of the >> > KIP, >> > > > and >> > > > >> > will update the KIP in the next few days. 
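[Illustration of the batching idea Jason and Justine discuss above: instead of one inter-broker `AddPartitionsToTxn` call per produce request, pending verifications queued on the broker are coalesced into a single request carrying several transactional IDs. The structures below are made up, not the actual wire format.]

```python
# Sketch of batching AddPartitionsToTxn across transactional IDs: pending
# (transactional_id, partition) pairs queued on the broker are coalesced
# into one request body keyed by transactional id. Illustrative only.
from collections import defaultdict

def batch_add_partitions(pending):
    """pending: iterable of (transactional_id, topic_partition) pairs.
    Returns one batched request: {transactional_id: [partitions...]}."""
    request = defaultdict(list)
    for txn_id, partition in pending:
        if partition not in request[txn_id]:  # skip duplicates
            request[txn_id].append(partition)
    return dict(request)

pending = [
    ("txn-a", "topic-0"),
    ("txn-b", "topic-1"),
    ("txn-a", "topic-2"),
    ("txn-a", "topic-0"),  # duplicate, sent only once
]
batched = batch_add_partitions(pending)
```

One batched request means one transaction-log write path can serve several in-flight transactions in parallel, which is the benefit described above without opening more connections.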
>> > > > >> > >> > > > >> > Thanks, >> > > > >> > Justine >> > > > >> > >> > > > >> > On Fri, Jan 6, 2023 at 10:22 AM Justine Olshan < >> > > jols...@confluent.io> >> > > > >> > wrote: >> > > > >> > >> > > > >> > > Hey Jason -- thanks for the input -- I was just digging a bit >> > > deeper >> > > > >> into >> > > > >> > > the design + implementation of the validation calls here and >> > what >> > > > you >> > > > >> say >> > > > >> > > makes sense. >> > > > >> > > >> > > > >> > > I was wondering about compatibility here. When we send >> requests >> > > > >> > > between brokers, we want to ensure that the receiving broker >> > > > >> understands >> > > > >> > > the request (specifically the new fields). Typically this is >> > done >> > > > via >> > > > >> > > IBP/metadata version. >> > > > >> > > I'm trying to think if there is a way around it but I'm not >> sure >> > > > there >> > > > >> > is. >> > > > >> > > >> > > > >> > > As for the improvements -- can you clarify how the multiple >> > > > >> transactional >> > > > >> > > IDs would help here? Were you thinking of a case where we >> > > wait/batch >> > > > >> > > multiple produce requests together? My understanding for now >> > was 1 >> > > > >> > > transactional ID and one validation per 1 produce request. >> > > > >> > > >> > > > >> > > Finally with respect to the authorizations, I think it makes >> > sense >> > > > to >> > > > >> > skip >> > > > >> > > topic authorizations, but I'm a bit confused by the "leader >> ID" >> > > > field. >> > > > >> > > Wouldn't we just want to flag the request as from a broker >> (does >> > > it >> > > > >> > matter >> > > > >> > > which one?). >> > > > >> > > >> > > > >> > > I think I want to adopt these suggestions, just had a few >> > > questions >> > > > on >> > > > >> > the >> > > > >> > > details. 
>> > > > >> > > >> > > > >> > > Thanks, >> > > > >> > > Justine >> > > > >> > > >> > > > >> > > On Thu, Jan 5, 2023 at 5:05 PM Jason Gustafson >> > > > >> > <ja...@confluent.io.invalid> >> > > > >> > > wrote: >> > > > >> > > >> > > > >> > >> Hi Justine, >> > > > >> > >> >> > > > >> > >> Thanks for the proposal. >> > > > >> > >> >> > > > >> > >> I was thinking about the implementation a little bit. In the >> > > > current >> > > > >> > >> proposal, the behavior depends on whether we have an old or >> new >> > > > >> client. >> > > > >> > >> For >> > > > >> > >> old clients, we send `DescribeTransactions` and verify the >> > result >> > > > and >> > > > >> > for >> > > > >> > >> new clients, we send `AddPartitionsToTxn`. We might be able >> to >> > > > >> simplify >> > > > >> > >> the >> > > > >> > >> implementation if we can use the same request type. For >> > example, >> > > > >> what if >> > > > >> > >> we >> > > > >> > >> bump the protocol version for `AddPartitionsToTxn` and add a >> > > > >> > >> `validateOnly` >> > > > >> > >> flag? For older versions, we can set `validateOnly=true` so >> > that >> > > > the >> > > > >> > >> request only returns successfully if the partition had >> already >> > > been >> > > > >> > added. >> > > > >> > >> For new versions, we can set `validateOnly=false` and the >> > > partition >> > > > >> will >> > > > >> > >> be >> > > > >> > >> added to the transaction. The other slightly annoying thing >> > that >> > > > this >> > > > >> > >> would >> > > > >> > >> get around is the need to collect the transaction state for >> all >> > > > >> > partitions >> > > > >> > >> even when we only care about a subset. >> > > > >> > >> >> > > > >> > >> Some additional improvements to consider: >> > > > >> > >> >> > > > >> > >> - We can give `AddPartitionsToTxn` better batch support for >> > > > >> inter-broker >> > > > >> > >> usage. 
Currently we only allow one `TransactionalId` to be >> > > > specified, >> > > > >> > but >> > > > >> > >> the broker may get some benefit being able to batch across >> > > multiple >> > > > >> > >> transactions. >> > > > >> > >> - Another small improvement is skipping topic authorization >> > > checks >> > > > >> for >> > > > >> > >> `AddPartitionsToTxn` when the request is from a broker. >> Perhaps >> > > we >> > > > >> can >> > > > >> > add >> > > > >> > >> a field for the `LeaderId` or something like that and >> require >> > > > CLUSTER >> > > > >> > >> permission when set. >> > > > >> > >> >> > > > >> > >> Best, >> > > > >> > >> Jason >> > > > >> > >> >> > > > >> > >> >> > > > >> > >> >> > > > >> > >> On Mon, Dec 19, 2022 at 3:56 PM Jun Rao >> > <j...@confluent.io.invalid >> > > > >> > > > >> > wrote: >> > > > >> > >> >> > > > >> > >> > Hi, Justine, >> > > > >> > >> > >> > > > >> > >> > Thanks for the explanation. It makes sense to me now. >> > > > >> > >> > >> > > > >> > >> > Jun >> > > > >> > >> > >> > > > >> > >> > On Mon, Dec 19, 2022 at 1:42 PM Justine Olshan >> > > > >> > >> > <jols...@confluent.io.invalid> >> > > > >> > >> > wrote: >> > > > >> > >> > >> > > > >> > >> > > Hi Jun, >> > > > >> > >> > > >> > > > >> > >> > > My understanding of the mechanism is that when we get to >> > the >> > > > last >> > > > >> > >> epoch, >> > > > >> > >> > we >> > > > >> > >> > > increment to the fencing/last epoch and if any further >> > > requests >> > > > >> come >> > > > >> > >> in >> > > > >> > >> > for >> > > > >> > >> > > this producer ID they are fenced. Then the producer >> gets a >> > > new >> > > > ID >> > > > >> > and >> > > > >> > >> > > restarts with epoch/sequence 0. The fenced epoch sticks >> > > around >> > > > >> for >> > > > >> > the >> > > > >> > >> > > duration of producer.id.expiration.ms and blocks any >> late >> > > > >> messages >> > > > >> > >> > there. 
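[Illustration of the `validateOnly` semantics Jason proposes earlier in the thread: on behalf of old clients the broker sets `validateOnly=true`, so the request succeeds only if the partition had already been added; for new clients `validateOnly=false` implicitly adds it. The in-memory coordinator below is a hypothetical stand-in.]

```python
# Sketch of the proposed validateOnly flag on AddPartitionsToTxn, with a
# made-up in-memory stand-in for the coordinator's per-transaction state.
class TxnCoordinator:
    def __init__(self):
        self._partitions = {}  # transactional_id -> set of partitions

    def add_partitions_to_txn(self, txn_id, partition, validate_only):
        added = self._partitions.setdefault(txn_id, set())
        if validate_only:
            # Old-client path: verify, but do not mutate transaction state.
            return partition in added
        # New-client path: implicitly add the partition to the transaction.
        added.add(partition)
        return True
```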
>> > > > >> > >> > > The new ID will get to take advantage of the improved >> > > semantics >> > > > >> > around >> > > > >> > >> > > non-zero start sequences. So I think we are covered. >> > > > >> > >> > > >> > > > >> > >> > > The only potential issue is overloading the cache, but >> > > > hopefully >> > > > >> the >> > > > >> > >> > > improvements (lowered producer.id.expiration.ms) will >> help >> > > > with >> > > > >> > that. >> > > > >> > >> > Let >> > > > >> > >> > > me know if you still have concerns. >> > > > >> > >> > > >> > > > >> > >> > > Thanks, >> > > > >> > >> > > Justine >> > > > >> > >> > > >> > > > >> > >> > > On Mon, Dec 19, 2022 at 10:24 AM Jun Rao >> > > > >> <j...@confluent.io.invalid> >> > > > >> > >> > wrote: >> > > > >> > >> > > >> > > > >> > >> > > > Hi, Justine, >> > > > >> > >> > > > >> > > > >> > >> > > > Thanks for the explanation. >> > > > >> > >> > > > >> > > > >> > >> > > > 70. The proposed fencing logic doesn't apply when pid >> > > > changes, >> > > > >> is >> > > > >> > >> that >> > > > >> > >> > > > right? If so, I am not sure how complete we are >> > addressing >> > > > this >> > > > >> > >> issue >> > > > >> > >> > if >> > > > >> > >> > > > the pid changes more frequently. >> > > > >> > >> > > > >> > > > >> > >> > > > Thanks, >> > > > >> > >> > > > >> > > > >> > >> > > > Jun >> > > > >> > >> > > > >> > > > >> > >> > > > >> > > > >> > >> > > > >> > > > >> > >> > > > On Fri, Dec 16, 2022 at 9:16 AM Justine Olshan >> > > > >> > >> > > > <jols...@confluent.io.invalid> >> > > > >> > >> > > > wrote: >> > > > >> > >> > > > >> > > > >> > >> > > > > Hi Jun, >> > > > >> > >> > > > > >> > > > >> > >> > > > > Thanks for replying! >> > > > >> > >> > > > > >> > > > >> > >> > > > > 70.We already do the overflow mechanism, so my >> change >> > > would >> > > > >> just >> > > > >> > >> make >> > > > >> > >> > > it >> > > > >> > >> > > > > happen more often. 
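[Illustration of the fencing mechanism Justine describes above: once a producer ID reaches its final epoch, the fenced entry is retained for producer.id.expiration.ms purely to reject late messages, while a new producer ID starts over. The cache below is a simplified, hypothetical sketch.]

```python
# Sketch of the fenced-epoch retention described above: a fenced producer ID
# entry sticks around until it expires (producer.id.expiration.ms) and blocks
# any late messages for that ID in the meantime. Illustrative only.
import time

class ProducerStateCache:
    def __init__(self, expiration_ms):
        self._fenced = {}  # producer_id -> expiry timestamp (seconds)
        self._expiration_s = expiration_ms / 1000.0

    def fence(self, producer_id, now=None):
        now = time.monotonic() if now is None else now
        self._fenced[producer_id] = now + self._expiration_s

    def accept(self, producer_id, now=None):
        """A late message for a fenced, unexpired producer ID is rejected."""
        now = time.monotonic() if now is None else now
        expiry = self._fenced.get(producer_id)
        if expiry is not None and now < expiry:
            return False
        return True
```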
>> > > > >> > >> > > > > I was also not suggesting a new field in the log, >> but >> > in >> > > > the >> > > > >> > >> > response, >> > > > >> > >> > > > > which would be gated by the client version. Sorry if >> > > > >> something >> > > > >> > >> there >> > > > >> > >> > is >> > > > >> > >> > > > > unclear. I think we are starting to diverge. >> > > > >> > >> > > > > The goal of this KIP is to not change to the marker >> > > format >> > > > at >> > > > >> > all. >> > > > >> > >> > > > > >> > > > >> > >> > > > > 71. Yes, I guess I was going under the assumption >> that >> > > the >> > > > >> log >> > > > >> > >> would >> > > > >> > >> > > just >> > > > >> > >> > > > > look at its last epoch and treat it as the current >> > > epoch. I >> > > > >> > >> suppose >> > > > >> > >> > we >> > > > >> > >> > > > can >> > > > >> > >> > > > > have some special logic that if the last epoch was >> on a >> > > > >> marker >> > > > >> > we >> > > > >> > >> > > > actually >> > > > >> > >> > > > > expect the next epoch or something like that. We >> just >> > > need >> > > > to >> > > > >> > >> > > distinguish >> > > > >> > >> > > > > based on whether we had a commit/abort marker. >> > > > >> > >> > > > > >> > > > >> > >> > > > > 72. >> > > > >> > >> > > > > > if the producer epoch hasn't been bumped on the >> > > > >> > >> > > > > broker, it seems that the stucked message will fail >> the >> > > > >> sequence >> > > > >> > >> > > > validation >> > > > >> > >> > > > > and will be ignored. If the producer epoch has been >> > > bumped, >> > > > >> we >> > > > >> > >> ignore >> > > > >> > >> > > the >> > > > >> > >> > > > > sequence check and the stuck message could be >> appended >> > to >> > > > the >> > > > >> > log. >> > > > >> > >> > So, >> > > > >> > >> > > is >> > > > >> > >> > > > > the latter case that we want to guard? 
>> > > > >> > >> > > > > >> > > > >> > >> > > > > I'm not sure I follow that "the message will fail >> the >> > > > >> sequence >> > > > >> > >> > > > validation". >> > > > >> > >> > > > > In some of these cases, we had an abort marker (due >> to >> > an >> > > > >> error) >> > > > >> > >> and >> > > > >> > >> > > then >> > > > >> > >> > > > > the late message comes in with the correct sequence >> > > number. >> > > > >> This >> > > > >> > >> is a >> > > > >> > >> > > > case >> > > > >> > >> > > > > covered by the KIP. >> > > > >> > >> > > > > The latter case is actually not something we've >> > > considered >> > > > >> > here. I >> > > > >> > >> > > think >> > > > >> > >> > > > > generally when we bump the epoch, we are accepting >> that >> > > the >> > > > >> > >> sequence >> > > > >> > >> > > does >> > > > >> > >> > > > > not need to be checked anymore. My understanding is >> > also >> > > > >> that we >> > > > >> > >> > don't >> > > > >> > >> > > > > typically bump epoch mid transaction (based on a >> quick >> > > look >> > > > >> at >> > > > >> > the >> > > > >> > >> > > code) >> > > > >> > >> > > > > but let me know if that is the case. >> > > > >> > >> > > > > >> > > > >> > >> > > > > Thanks, >> > > > >> > >> > > > > Justine >> > > > >> > >> > > > > >> > > > >> > >> > > > > On Thu, Dec 15, 2022 at 12:23 PM Jun Rao >> > > > >> > <j...@confluent.io.invalid >> > > > >> > >> > >> > > > >> > >> > > > wrote: >> > > > >> > >> > > > > >> > > > >> > >> > > > > > Hi, Justine, >> > > > >> > >> > > > > > >> > > > >> > >> > > > > > Thanks for the reply. >> > > > >> > >> > > > > > >> > > > >> > >> > > > > > 70. Assigning a new pid on int overflow seems a >> bit >> > > > hacky. >> > > > >> If >> > > > >> > we >> > > > >> > >> > > need a >> > > > >> > >> > > > > txn >> > > > >> > >> > > > > > level id, it will be better to model this >> explicitly. 
>> > > > >> Adding a >> > > > >> > >> new >> > > > >> > >> > > > field >> > > > >> > >> > > > > > would require a bit more work since it requires a >> new >> > > txn >> > > > >> > marker >> > > > >> > >> > > format >> > > > >> > >> > > > > in >> > > > >> > >> > > > > > the log. So, we probably need to guard it with an >> IBP >> > > or >> > > > >> > >> metadata >> > > > >> > >> > > > version >> > > > >> > >> > > > > > and document the impact on downgrade once the new >> > > format >> > > > is >> > > > >> > >> written >> > > > >> > >> > > to >> > > > >> > >> > > > > the >> > > > >> > >> > > > > > log. >> > > > >> > >> > > > > > >> > > > >> > >> > > > > > 71. Hmm, once the marker is written, the partition >> > will >> > > > >> expect >> > > > >> > >> the >> > > > >> > >> > > next >> > > > >> > >> > > > > > append to be on the next epoch. Does that cover >> the >> > > case >> > > > >> you >> > > > >> > >> > > mentioned? >> > > > >> > >> > > > > > >> > > > >> > >> > > > > > 72. Also, just to be clear on the stucked message >> > issue >> > > > >> > >> described >> > > > >> > >> > in >> > > > >> > >> > > > the >> > > > >> > >> > > > > > motivation. With EoS, we also validate the >> sequence >> > id >> > > > for >> > > > >> > >> > > idempotency. >> > > > >> > >> > > > > So, >> > > > >> > >> > > > > > with the current logic, if the producer epoch >> hasn't >> > > been >> > > > >> > >> bumped on >> > > > >> > >> > > the >> > > > >> > >> > > > > > broker, it seems that the stucked message will >> fail >> > the >> > > > >> > sequence >> > > > >> > >> > > > > validation >> > > > >> > >> > > > > > and will be ignored. If the producer epoch has >> been >> > > > >> bumped, we >> > > > >> > >> > ignore >> > > > >> > >> > > > the >> > > > >> > >> > > > > > sequence check and the stuck message could be >> > appended >> > > to >> > > > >> the >> > > > >> > >> log. >> > > > >> > >> > > So, >> > > > >> > >> > > > is >> > > > >> > >> > > > > > the latter case that we want to guard? 
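[Illustration of Jun's point 72 above: at an unchanged epoch a stuck message fails sequence validation, but once the epoch has been bumped the sequence check is skipped and the late message could be appended. The function below is a deliberately simplified sketch that ignores sequence wraparound and other real-broker details.]

```python
# Simplified sketch of the idempotence check under discussion: same epoch
# means strict sequence validation; a bumped epoch skips it; an older epoch
# is fenced outright. Not the actual broker logic.
def should_append(batch_epoch, batch_seq, last_epoch, last_seq):
    if batch_epoch < last_epoch:
        return False  # producer fenced: older epoch is rejected outright
    if batch_epoch > last_epoch:
        return True   # bumped epoch: sequence is not checked against last_seq
    return batch_seq == last_seq + 1  # same epoch: strict sequence validation

# A stuck duplicate at the same epoch is ignored by the sequence check...
assert should_append(batch_epoch=3, batch_seq=5, last_epoch=3, last_seq=5) is False
# ...but after an epoch bump the same late message would be appended.
assert should_append(batch_epoch=4, batch_seq=5, last_epoch=3, last_seq=5) is True
```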
>> > > > >> > >> > > > > > >> > > > >> > >> > > > > > Thanks, >> > > > >> > >> > > > > > >> > > > >> > >> > > > > > Jun >> > > > >> > >> > > > > > >> > > > >> > >> > > > > > On Wed, Dec 14, 2022 at 10:44 AM Justine Olshan >> > > > >> > >> > > > > > <jols...@confluent.io.invalid> wrote: >> > > > >> > >> > > > > > >> > > > >> > >> > > > > > > Matthias — thanks again for taking time to look >> a >> > > this. >> > > > >> You >> > > > >> > >> said: >> > > > >> > >> > > > > > > >> > > > >> > >> > > > > > > > My proposal was only focusing to avoid >> dangling >> > > > >> > >> > > > > > > >> > > > >> > >> > > > > > > transactions if records are added without >> > registered >> > > > >> > >> partition. >> > > > >> > >> > -- >> > > > >> > >> > > > > Maybe >> > > > >> > >> > > > > > > >> > > > >> > >> > > > > > > you can add a few more details to the KIP about >> > this >> > > > >> > scenario >> > > > >> > >> for >> > > > >> > >> > > > > better >> > > > >> > >> > > > > > > >> > > > >> > >> > > > > > > documentation purpose? >> > > > >> > >> > > > > > > >> > > > >> > >> > > > > > > >> > > > >> > >> > > > > > > I'm not sure I understand what you mean here. >> The >> > > > >> motivation >> > > > >> > >> > > section >> > > > >> > >> > > > > > > describes two scenarios about how the record >> can be >> > > > added >> > > > >> > >> > without a >> > > > >> > >> > > > > > > registered partition: >> > > > >> > >> > > > > > > >> > > > >> > >> > > > > > > >> > > > >> > >> > > > > > > > This can happen when a message gets stuck or >> > > delayed >> > > > >> due >> > > > >> > to >> > > > >> > >> > > > > networking >> > > > >> > >> > > > > > > issues or a network partition, the transaction >> > > aborts, >> > > > >> and >> > > > >> > >> then >> > > > >> > >> > the >> > > > >> > >> > > > > > delayed >> > > > >> > >> > > > > > > message finally comes in. 
>> > > > >> > >> > > > > > > >> > > > >> > >> > > > > > > >> > > > >> > >> > > > > > > > Another way hanging transactions can occur is >> > that >> > > a >> > > > >> > client >> > > > >> > >> is >> > > > >> > >> > > > buggy >> > > > >> > >> > > > > > and >> > > > >> > >> > > > > > > may somehow try to write to a partition before >> it >> > > adds >> > > > >> the >> > > > >> > >> > > partition >> > > > >> > >> > > > to >> > > > >> > >> > > > > > the >> > > > >> > >> > > > > > > transaction. >> > > > >> > >> > > > > > > >> > > > >> > >> > > > > > > >> > > > >> > >> > > > > > > >> > > > >> > >> > > > > > > For the first example of this would it be >> helpful >> > to >> > > > say >> > > > >> > that >> > > > >> > >> > this >> > > > >> > >> > > > > > message >> > > > >> > >> > > > > > > comes in after the abort, but before the >> partition >> > is >> > > > >> added >> > > > >> > to >> > > > >> > >> > the >> > > > >> > >> > > > next >> > > > >> > >> > > > > > > transaction so it becomes "hanging." Perhaps the >> > next >> > > > >> > sentence >> > > > >> > >> > > > > describing >> > > > >> > >> > > > > > > the message becoming part of the next >> transaction >> > (a >> > > > >> > different >> > > > >> > >> > > case) >> > > > >> > >> > > > > was >> > > > >> > >> > > > > > > not properly differentiated. >> > > > >> > >> > > > > > > >> > > > >> > >> > > > > > > >> > > > >> > >> > > > > > > >> > > > >> > >> > > > > > > Jun — thanks for reading the KIP. >> > > > >> > >> > > > > > > >> > > > >> > >> > > > > > > 70. The int typing was a concern. Currently we >> > have a >> > > > >> > >> mechanism >> > > > >> > >> > in >> > > > >> > >> > > > > place >> > > > >> > >> > > > > > to >> > > > >> > >> > > > > > > fence the final epoch when the epoch is about to >> > > > overflow >> > > > >> > and >> > > > >> > >> > > assign >> > > > >> > >> > > > a >> > > > >> > >> > > > > > new >> > > > >> > >> > > > > > > producer ID with epoch 0. 
Of course, this is a >> bit >> > > > tricky >> > > > >> > >> when it >> > > > >> > >> > > > comes >> > > > >> > >> > > > > > to >> > > > >> > >> > > > > > > the response back to the client. >> > > > >> > >> > > > > > > Making this a long could be another option, but >> I >> > > > wonder >> > > > >> are >> > > > >> > >> > there >> > > > >> > >> > > > any >> > > > >> > >> > > > > > > implications on changing this field if the >> epoch is >> > > > >> > persisted >> > > > >> > >> to >> > > > >> > >> > > > disk? >> > > > >> > >> > > > > > I'd >> > > > >> > >> > > > > > > need to check the usages. >> > > > >> > >> > > > > > > >> > > > >> > >> > > > > > > 71.This was something Matthias asked about as >> > well. I >> > > > was >> > > > >> > >> > > > considering a >> > > > >> > >> > > > > > > possible edge case where a produce request from >> a >> > new >> > > > >> > >> transaction >> > > > >> > >> > > > > somehow >> > > > >> > >> > > > > > > gets sent right after the marker is written, but >> > > before >> > > > >> the >> > > > >> > >> > > producer >> > > > >> > >> > > > is >> > > > >> > >> > > > > > > alerted of the newly bumped epoch. In this >> case, we >> > > may >> > > > >> > >> include >> > > > >> > >> > > this >> > > > >> > >> > > > > > record >> > > > >> > >> > > > > > > when we don't want to. I suppose we could try >> to do >> > > > >> > something >> > > > >> > >> > > client >> > > > >> > >> > > > > side >> > > > >> > >> > > > > > > to bump the epoch after sending an endTxn as >> well >> > in >> > > > this >> > > > >> > >> > scenario >> > > > >> > >> > > — >> > > > >> > >> > > > > but >> > > > >> > >> > > > > > I >> > > > >> > >> > > > > > > wonder how it would work when the server is >> > aborting >> > > > >> based >> > > > >> > on >> > > > >> > >> a >> > > > >> > >> > > > > > server-side >> > > > >> > >> > > > > > > error. 
I could also be missing something and >> this >> > > > >> scenario >> > > > >> > is >> > > > >> > >> > > > actually >> > > > >> > >> > > > > > not >> > > > >> > >> > > > > > > possible. >> > > > >> > >> > > > > > > >> > > > >> > >> > > > > > > Thanks again to everyone reading and commenting. >> > Let >> > > me >> > > > >> know >> > > > >> > >> > about >> > > > >> > >> > > > any >> > > > >> > >> > > > > > > further questions or comments. >> > > > >> > >> > > > > > > >> > > > >> > >> > > > > > > Justine >> > > > >> > >> > > > > > > >> > > > >> > >> > > > > > > On Wed, Dec 14, 2022 at 9:41 AM Jun Rao >> > > > >> > >> <j...@confluent.io.invalid >> > > > >> > >> > > >> > > > >> > >> > > > > > wrote: >> > > > >> > >> > > > > > > >> > > > >> > >> > > > > > > > Hi, Justine, >> > > > >> > >> > > > > > > > >> > > > >> > >> > > > > > > > Thanks for the KIP. A couple of comments. >> > > > >> > >> > > > > > > > >> > > > >> > >> > > > > > > > 70. Currently, the producer epoch is an int. >> I am >> > > not >> > > > >> sure >> > > > >> > >> if >> > > > >> > >> > > it's >> > > > >> > >> > > > > > enough >> > > > >> > >> > > > > > > > to accommodate all transactions in the >> lifetime >> > of >> > > a >> > > > >> > >> producer. >> > > > >> > >> > > > Should >> > > > >> > >> > > > > > we >> > > > >> > >> > > > > > > > change that to a long or add a new long field >> > like >> > > > >> txnId? >> > > > >> > >> > > > > > > > >> > > > >> > >> > > > > > > > 71. "it will write the prepare commit message >> > with >> > > a >> > > > >> > bumped >> > > > >> > >> > epoch >> > > > >> > >> > > > and >> > > > >> > >> > > > > > > send >> > > > >> > >> > > > > > > > WriteTxnMarkerRequests with the bumped epoch." >> > Hmm, >> > > > the >> > > > >> > >> epoch >> > > > >> > >> > is >> > > > >> > >> > > > > > > associated >> > > > >> > >> > > > > > > > with the current txn right? 
So, it seems weird to write a commit message with a bumped epoch. Should we
only bump the epoch in EndTxnResponse and rename the field to something like
nextProducerEpoch?

Thanks,

Jun

On Mon, Dec 12, 2022 at 8:54 PM Matthias J. Sax <mj...@apache.org> wrote:

Thanks for the background.

20/30: SGTM. My proposal was only focused on avoiding dangling transactions
if records are added without a registered partition. -- Maybe you can add a
few more details to the KIP about this scenario for better documentation
purposes?

40: I think you hit a fair point about race conditions or client bugs
(incorrectly not bumping the epoch). The complexity/confusion I see in using
the bumped epoch is mainly for internal debugging, i.e., inspecting log
segment dumps -- it seems harder for us humans to reason about the system.
But if we get better guarantees, it would be worth using the bumped epoch.

60: As I mentioned already, I don't know the broker internals well enough to
provide more input. So if nobody else chimes in, we should just move forward
with your proposal.

-Matthias

On 12/6/22 4:22 PM, Justine Olshan wrote:

Hi all,

After Artem's questions about error behavior, I've re-evaluated the unknown
producer ID exception and had some discussions offline.

I think it generally makes sense to simplify error handling in cases like
this, and the UNKNOWN_PRODUCER_ID error has a pretty long and complicated
history. Because of this, I propose adding a new error code ABORTABLE_ERROR
that, when encountered by new clients (gated by the produce request
version), will simply abort the transaction. This allows the server to have
some say in whether the client aborts, and makes handling much simpler. In
the future, we can also use this error in other situations where we want to
abort transactions. We can even use it on other APIs.

I've added this to the KIP. Let me know if there are any questions or
issues.
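To make the proposal concrete, here is a minimal sketch (not actual client code; the state names and handler shape are assumptions for illustration) of how a new client could map the proposed ABORTABLE_ERROR onto its transaction state machine, so the server-directed abort needs no client-side guessing:

```java
// Hypothetical sketch of new-client handling for the proposed ABORTABLE_ERROR.
// TxnState and onProduceError are illustrative names, not Kafka client APIs.
public class AbortableErrorSketch {
    enum TxnState { IN_TRANSACTION, ABORTABLE_ERROR }

    static TxnState onProduceError(String errorCode, TxnState current) {
        // Gated by produce request version: only new clients ever see this code.
        if ("ABORTABLE_ERROR".equals(errorCode)) {
            return TxnState.ABORTABLE_ERROR; // the application must now abort
        }
        return current; // all other errors keep their existing handling
    }

    public static void main(String[] args) {
        System.out.println(onProduceError("ABORTABLE_ERROR", TxnState.IN_TRANSACTION));
    }
}
```

The point of the sketch is that the abort decision moves to the server: the client only has to recognize one code instead of classifying many legacy error histories.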
Justine

On Fri, Dec 2, 2022 at 10:22 AM Justine Olshan <jols...@confluent.io> wrote:

Hey Matthias,

20/30 — Maybe I also didn't express myself clearly. For older clients we
don't have a way to distinguish between a previous and the current
transaction since we don't have the epoch bump. This means that a late
message from the previous transaction may be added to the new one. With
older clients we can't guarantee this won't happen if we already sent the
addPartitionsToTxn call (which is why we make changes for the newer client),
but we can at least gate some of it by ensuring that the partition has been
added to the transaction. The rationale here is that there are likely FEWER
late arrivals as time goes on, so hopefully most late arrivals will come in
BEFORE the addPartitionsToTxn call. Those that arrive before will be
properly gated with the describeTransactions approach.

If we take the approach you suggested, ANY late arrival from a previous
transaction will be added. And we don't want that. I also don't see any
benefit in sending addPartitionsToTxn over the describeTxns call. They will
both be one extra RPC to the Txn coordinator.

To be clear — newer clients will use addPartitionsToTxn instead of
DescribeTxns.

40)
My concern is that if we have some delay in the client to bump the epoch, it
could continue to send epoch 73 and those records would not be fenced.
Perhaps this is not an issue if we don't allow the next produce to go
through before the EndTxn request returns. I'm also thinking about cases of
failure. I will need to think on this a bit.

I wasn't sure if it was that confusing. But if we think it is, we can
investigate other ways.

60)
I'm not sure these are the same purgatories, since one is a produce
purgatory (I was planning on using a callback rather than purgatory) and the
other is simply a request to append to the log. Not sure we have any
structure here for ordering, but my understanding is that the broker could
handle the write request before it hears back from the Txn Coordinator.

Let me know if I misunderstood something or something was unclear.

Justine

On Thu, Dec 1, 2022 at 12:15 PM Matthias J. Sax <mj...@apache.org> wrote:

Thanks for the details Justine!

> 20)
>
> The client side change for 2 is removing the addPartitions to transaction
> call. We don't need to make this from the producer to the txn
> coordinator, only server side.

I think I did not express myself clearly. I understand that we can (and
should) change the producer to not send the `addPartitions` request any
longer. But I don't think it's a requirement to change the broker?

What I am trying to say is: as a safeguard and improvement for older
producers, the partition leader can just send the `addPartitions` request to
the TX-coordinator in any case -- if the old producer correctly did send the
`addPartitions` request to the TX-coordinator already, the TX-coordinator
can just "ignore" it as idempotent. However, if the old producer has a bug
and forgot to send the `addPartitions` request, we would now ensure that the
partition is indeed added to the TX and thus fix a potential producer bug
(even if we don't get the fencing via the epoch bump). -- It seems to be a
good improvement? Or is there a reason not to do this?
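The idempotent "ignore" behavior described above can be sketched as follows (illustrative only; these are not the coordinator's real data structures or method names): re-adding a partition that is already part of the transaction is a no-op, so a leader-sent duplicate of a producer-sent request is harmless.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch of a TX-coordinator treating addPartitions as idempotent.
public class IdempotentAddSketch {
    private final Set<String> partitionsInTxn = new HashSet<>();

    // Returns true if newly added, false if already present (the "ignore" case).
    boolean addPartitionToTxn(String topicPartition) {
        return partitionsInTxn.add(topicPartition);
    }

    public static void main(String[] args) {
        IdempotentAddSketch coordinator = new IdempotentAddSketch();
        System.out.println(coordinator.addPartitionToTxn("topic-0")); // true: newly added
        System.out.println(coordinator.addPartitionToTxn("topic-0")); // false: duplicate ignored
    }
}
```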
> 30)
>
> Transaction is ongoing = partition was added to transaction via
> addPartitionsToTxn. We check this with the DescribeTransactions call. Let
> me know if this wasn't sufficiently explained here:

If we do what I propose in (20), we don't really need to make this
`DescribeTransactions` call, as the partition leader adds the partition for
older clients and we get this check for free.

> 40)
>
> The idea here is that if any messages somehow come in before we get the
> new epoch to the producer, they will be fenced. However, if we don't
> think this is necessary, it can be discussed.

I agree that we should have epoch fencing. My question is different: Assume
we are at epoch 73 and we have an ongoing transaction that is committed. It
seems natural to write the "prepare commit" marker and the
`WriteTxnMarkerRequest` both with epoch 73, too, as they belong to the
current transaction. Of course, we now also bump the epoch and expect the
next requests to have epoch 74, and would reject a request with epoch 73, as
the corresponding TX for epoch 73 was already committed.
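The fencing rule in this scenario reduces to a simple epoch comparison, sketched here for illustration (assumed names, not broker internals): once the broker's producer state holds epoch 74, any request still carrying epoch 73 is rejected.

```java
// Illustrative epoch-fencing check, not actual broker code: a request with an
// epoch older than the broker's current epoch for the producer is fenced.
public class EpochFencingSketch {
    static String validate(short currentEpoch, short requestEpoch) {
        if (requestEpoch < currentEpoch) {
            return "PRODUCER_FENCED"; // e.g. a late epoch-73 request after the bump to 74
        }
        return "OK";
    }

    public static void main(String[] args) {
        System.out.println(validate((short) 74, (short) 73)); // PRODUCER_FENCED
        System.out.println(validate((short) 74, (short) 74)); // OK
    }
}
```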
It seems you propose to write the "prepare commit" marker and the
`WriteTxnMarkerRequest` with epoch 74 though, which would work, but it seems
confusing. Is there a reason why we would use the bumped epoch 74 instead of
the current epoch 73?

> 60)
>
> When we are checking if the transaction is ongoing, we need to make a
> round trip from the leader partition to the transaction coordinator. In
> the time we are waiting for this message to come back, in theory we could
> have sent a commit/abort call that would make the original result of the
> check out of date. That is why we can check the leader state before we
> write to the log.

Thanks. Got it.

However, is this really an issue? We put the produce request in purgatory,
so how could we process the `WriteTxnMarkerRequest` first? Don't we need to
put the `WriteTxnMarkerRequest` into purgatory, too, for this case, and
process both requests in order? (Again, my broker knowledge is limited and
maybe we don't maintain request order for this case, which seems to be an
issue IMHO, and I am wondering if changing request handling to preserve
order for this case might be the cleaner solution?)

-Matthias

On 11/30/22 3:28 PM, Artem Livshits wrote:

Hi Justine,

I think the interesting part is not in this logic (because it tries to
figure out when UNKNOWN_PRODUCER_ID is retriable, and if it's retriable,
it's definitely not fatal), but in what happens when this logic doesn't
return 'true' and falls through. In the old clients it seems to be fatal;
if we keep that behavior in the new clients, I'd expect it would be fatal
as well.
-Artem

On Tue, Nov 29, 2022 at 11:57 AM Justine Olshan
<jols...@confluent.io.invalid> wrote:

Hi Artem and Jeff,

Thanks for taking a look and sorry for the slow response.

You both mentioned the change to handle UNKNOWN_PRODUCER_ID errors. To be
clear — this error code will only be sent again when the client's request
version is high enough to ensure we handle it correctly. The current (Java)
client handles this with the following (somewhat long) code snippet:

    // An UNKNOWN_PRODUCER_ID means that we have lost the producer state on the broker. Depending on the log start
    // offset, we may want to retry these, as described for each case below. If none of those apply, then for the
    // idempotent producer, we will locally bump the epoch and reset the sequence numbers of in-flight batches from
    // sequence 0, then retry the failed batch, which should now succeed. For the transactional producer, allow the
    // batch to fail. When processing the failed batch, we will transition to an abortable error and set a flag
    // indicating that we need to bump the epoch (if supported by the broker).
    if (error == Errors.UNKNOWN_PRODUCER_ID) {
        if (response.logStartOffset == -1) {
            // We don't know the log start offset with this response. We should just retry the request until we
            // get it. The UNKNOWN_PRODUCER_ID error code was added along with the new ProduceResponse which
            // includes the logStartOffset. So the '-1' sentinel is not for backward compatibility. Instead, it
            // is possible for a broker to not know the logStartOffset when it is returning the response because
            // the partition may have moved away from the broker from the time the error was initially raised to
            // the time the response was being constructed. In these cases, we should just retry the request: we
            // are guaranteed to eventually get a logStartOffset once things settle down.
            return true;
        }

        if (batch.sequenceHasBeenReset()) {
            // When the first inflight batch fails due to the truncation case, then the sequences of all the
            // other in-flight batches would have been restarted from the beginning. However, when those
            // responses come back from the broker, they would also come with an UNKNOWN_PRODUCER_ID error.
            // In this case, we should not reset the sequence numbers to the beginning.
            return true;
        } else if (lastAckedOffset(batch.topicPartition).orElse(NO_LAST_ACKED_SEQUENCE_NUMBER) < response.logStartOffset) {
            // The head of the log has been removed, probably due to the retention time elapsing. In this case,
            // we expect to lose the producer state. For the transactional producer, reset the sequences of all
            // inflight batches to be from the beginning and retry them, so that the transaction does not need
            // to be aborted. For the idempotent producer, bump the epoch to avoid reusing (sequence, epoch)
            // pairs.
            if (isTransactional()) {
                txnPartitionMap.startSequencesAtBeginning(batch.topicPartition, this.producerIdAndEpoch);
            } else {
                requestEpochBumpForPartition(batch.topicPartition);
            }
            return true;
        }

        if (!isTransactional()) {
            // For the idempotent producer, always retry UNKNOWN_PRODUCER_ID errors. If the batch has the
            // current producer ID and epoch, request a bump of the epoch. Otherwise just retry the produce.
            requestEpochBumpForPartition(batch.topicPartition);
            return true;
        }
    }

I was considering keeping this behavior — but am open to simplifying it.

We are leaving changes to older clients off the table here since it caused
many issues for clients in the past. Previously this was a fatal error and
we didn't have the mechanisms in place to detect when this was a legitimate
case vs. some bug or gap in the protocol. Ensuring each transaction has its
own epoch should close this gap.
>>> And to address Jeff's second point:
>>>
>>> *does the typical produce request path append records to local log along*
>>> *with the currentTxnFirstOffset information? I would like to understand*
>>> *when the field is written to disk.*
>>>
>>> Yes, the first produce request populates this field and writes the offset
>>> as part of the record batch and also to the producer state snapshot. When
>>> we reload the records on restart and/or reassignment, we repopulate this
>>> field with the snapshot from disk along with the rest of the producer
>>> state.
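The currentTxnFirstOffset lifecycle described above (populated by the first produce request, persisted in the producer state snapshot, repopulated on reload) could be modeled roughly like this. All names here are illustrative, not the broker's actual classes:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

// Rough model (illustrative names) of the lifecycle above: the first produce
// request sets currentTxnFirstOffset in the in-memory producer state, the
// snapshot persists it, and reload restores it with the rest of the state.
public class ProducerStateSketch {
    private final Map<Long, Long> currentTxnFirstOffset = new HashMap<>(); // producerId -> first offset

    public void onProduceAppend(long producerId, long baseOffset) {
        // Only the first append of the transaction populates the field.
        currentTxnFirstOffset.putIfAbsent(producerId, baseOffset);
    }

    // Stand-in for writing the producer state snapshot to disk.
    public Map<Long, Long> snapshot() {
        return new HashMap<>(currentTxnFirstOffset);
    }

    // Stand-in for reloading the snapshot on restart/reassignment.
    public static ProducerStateSketch reload(Map<Long, Long> snapshot) {
        ProducerStateSketch state = new ProducerStateSketch();
        state.currentTxnFirstOffset.putAll(snapshot);
        return state;
    }

    public OptionalLong firstOffset(long producerId) {
        Long offset = currentTxnFirstOffset.get(producerId);
        return offset == null ? OptionalLong.empty() : OptionalLong.of(offset);
    }
}
```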
>>>
>>> Let me know if there are further comments and/or questions.
>>>
>>> Thanks,
>>> Justine
>>>
>>> On Tue, Nov 22, 2022 at 9:00 PM Jeff Kim <jeff....@confluent.io.invalid> wrote:
>>>
>>>> Hi Justine,
>>>>
>>>> Thanks for the KIP! I have two questions:
>>>>
>>>> 1) For new clients, we can once again return an error UNKNOWN_PRODUCER_ID
>>>> for sequences that are non-zero when there is no producer state present
>>>> on the server. This will indicate we missed the 0 sequence and we don't
>>>> yet want to write to the log.
>>>>
>>>> I would like to understand the current behavior to handle older clients,
>>>> and if there are any changes we are making. Maybe I'm missing something,
>>>> but we would want to identify whether we missed the 0 sequence for older
>>>> clients, no?
>>>>
>>>> 2) Upon returning from the transaction coordinator, we can set the
>>>> transaction as ongoing on the leader by populating currentTxnFirstOffset
>>>> through the typical produce request handling.
>>>>
>>>> does the typical produce request path append records to local log along
>>>> with the currentTxnFirstOffset information? I would like to understand
>>>> when the field is written to disk.
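A minimal sketch of the check Jeff describes in (1) -- reject a non-zero first sequence when no producer state is present -- might look like this; the names are hypothetical, not the broker's actual validation code:

```java
// Hypothetical sketch of the check in (1): with no producer state on the
// server, a non-zero first sequence means we missed sequence 0, so the
// broker returns UNKNOWN_PRODUCER_ID instead of writing to the log.
public class SequenceCheckSketch {
    public enum Error { NONE, UNKNOWN_PRODUCER_ID }

    public static Error validateFirstBatch(boolean producerStatePresent, int firstSequence) {
        if (!producerStatePresent && firstSequence != 0) {
            return Error.UNKNOWN_PRODUCER_ID; // don't yet want to write to the log
        }
        return Error.NONE; // sequence 0, or state exists for normal validation
    }
}
```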
>>>>
>>>> Thanks,
>>>> Jeff
>>>>
>>>> On Tue, Nov 22, 2022 at 4:44 PM Artem Livshits
>>>> <alivsh...@confluent.io.invalid> wrote:
>>>>
>>>>> Hi Justine,
>>>>>
>>>>> Thank you for the KIP. I have one question.
>>>>>
>>>>> 5) For new clients, we can once again return an error UNKNOWN_PRODUCER_ID
>>>>>
>>>>> I believe we had problems in the past with returning UNKNOWN_PRODUCER_ID
>>>>> because it was considered fatal and required client restart. It would be
>>>>> good to spell out the new client behavior when it receives the error.
>>>>>
>>>>> -Artem
>>>>>
>>>>> On Tue, Nov 22, 2022 at 10:00 AM Justine Olshan
>>>>> <jols...@confluent.io.invalid> wrote:
>>>>>
>>>>>> Thanks for taking a look Matthias. I've tried to answer your questions
>>>>>> below:
>>>>>>
>>>>>> 10)
>>>>>>
>>>>>> Right -- so the hanging transaction only occurs when we have a late
>>>>>> message come in and the partition is never added to a transaction
>>>>>> again. If we never add the partition to a transaction, we will never
>>>>>> write a marker and never advance the LSO.
>>>>>>
>>>>>> If we do end up adding the partition to the transaction (I suppose this
>>>>>> can happen before or after the late message comes in) then we will
>>>>>> include the late message in the next (incorrect) transaction.
>>>>>>
>>>>>> So perhaps it is clearer to make the distinction between messages that
>>>>>> eventually get added to the transaction (but the wrong one) and
>>>>>> messages that never get added and become hanging.
>>>>>>
>>>>>> 20)
>>>>>>
>>>>>> The client-side change for 2 is removing the addPartitionsToTxn call.
>>>>>> We don't need to make this call from the producer to the txn
>>>>>> coordinator, only server side.
>>>>>>
>>>>>> In my opinion, the issue with the addPartitionsToTxn call for older
>>>>>> clients is that we don't have the epoch bump, so we don't know if the
>>>>>> message belongs to the previous transaction or this one. We need to
>>>>>> check if the partition has been added to this transaction.
>>>>>> Of course, this means we won't completely cover the case where we have
>>>>>> a really late message and we have added the partition to the new
>>>>>> transaction, but that's unfortunately something we will need the new
>>>>>> clients to cover.
>>>>>>
>>>>>> 30)
>>>>>>
>>>>>> Transaction is ongoing = partition was added to the transaction via
>>>>>> addPartitionsToTxn. We check this with the DescribeTransactions call.
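The "transaction is ongoing" check described under 30) could be sketched like this, with a stand-in for the coordinator's DescribeTransactions response; all names here are illustrative:

```java
import java.util.Set;

// Illustrative model of the check: the partition leader consults the
// transaction coordinator (DescribeTransactions in the KIP) and only treats
// the transaction as ongoing if this partition was added to it.
public class OngoingTxnCheckSketch {
    // Stand-in for the coordinator's DescribeTransactions response.
    public record TxnDescription(String state, Set<String> partitions) {}

    public static boolean transactionOngoingFor(TxnDescription desc, String topicPartition) {
        return "Ongoing".equals(desc.state()) && desc.partitions().contains(topicPartition);
    }
}
```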
>>>>>> Let me know if this wasn't sufficiently explained here:
>>>>>>
>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-890%3A+Transactions+Server-Side+Defense#KIP890:TransactionsServerSideDefense-EnsureOngoingTransactionforOlderClients(3)
>>>>>>
>>>>>> 40)
>>>>>>
>>>>>> The idea here is that if any messages somehow come in before we get the
>>>>>> new epoch to the producer, they will be fenced.
>>>>>> However, if we don't think this is necessary, it can be discussed.
>>>>>>
>>>>>> 50)
>>>>>>
>>>>>> It should be synchronous because if we have an event (i.e., an error)
>>>>>> that causes us to need to abort the transaction, we need to know which
>>>>>> partitions to send transaction markers to. We know the partitions
>>>>>> because we added them to the coordinator via the addPartitionsToTxn
>>>>>> call.
>>>>>> We have had asynchronous calls in the past (i.e., writing the commit
>>>>>> markers when the transaction is completed), but often this just causes
>>>>>> confusion, as we need to wait for some operations to complete. In the
>>>>>> commit-marker case, clients often see CONCURRENT_TRANSACTIONS error
>>>>>> messages, and that can be confusing. For that reason, it may be simpler
>>>>>> to just have synchronous calls -- especially if we need to block on
>>>>>> some operation's completion anyway before we can start the next
>>>>>> transaction. And yes, I meant coordinator. I will fix that.
>>>>>> 60)
>>>>>>
>>>>>> When we are checking if the transaction is ongoing, we need to make a
>>>>>> round trip from the leader partition to the transaction coordinator. In
>>>>>> the time we are waiting for this message to come back, in theory we
>>>>>> could have sent a commit/abort call that would make the original result
>>>>>> of the check out of date. That is why we can check the leader state
>>>>>> before we write to the log.
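The race described under 60) -- the coordinator's answer going stale while a commit/abort lands -- can be modeled with a simple state-version recheck at the leader. The version counter here is an illustrative mechanism for the sketch, not the actual broker code:

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative model: the leader captures a state version before the
// coordinator round trip; if a commit/abort marker bumps the version while
// the check is in flight, the coordinator's answer is treated as stale and
// the append is rejected.
public class LeaderRecheckSketch {
    private final AtomicLong txnStateVersion = new AtomicLong();

    public long beginCheck() {
        return txnStateVersion.get(); // captured before asking the coordinator
    }

    public void onTxnMarker() {
        txnStateVersion.incrementAndGet(); // commit/abort observed at the leader
    }

    public boolean appendAllowed(long versionAtCheck, boolean coordinatorSaysOngoing) {
        // Re-check leader state right before writing to the log.
        return coordinatorSaysOngoing && txnStateVersion.get() == versionAtCheck;
    }
}
```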
>>>>>> I'm happy to update the KIP if some of these things were not clear.
>>>>>>
>>>>>> Thanks,
>>>>>> Justine
>>>>>>
>>>>>> On Mon, Nov 21, 2022 at 7:11 PM Matthias J. Sax <mj...@apache.org> wrote:
>>>>>>
>>>>>>> Thanks for the KIP.
>>>>>>>
>>>>>>> Couple of clarification questions (I am not a broker expert, so maybe
>>>>>>> some questions are obvious for others, but not for me with my lack of
>>>>>>> broker knowledge).
>>>>>>>
>>>>>>> (10)
>>>>>>>
>>>>>>>> The delayed message case can also violate EOS if the delayed message
>>>>>>>> comes in after the next addPartitionsToTxn request comes in.
>>>>>>>> Effectively we may see a message from a previous (aborted)
>>>>>>>> transaction become part of the next transaction.
>>>>>>>
>>>>>>> What happens if the message comes in before the next
>>>>>>> addPartitionsToTxn request? It seems the broker hosting the data
>>>>>>> partitions won't know anything about it and append it to the
>>>>>>> partition, too? What is the difference between both cases?
>>>>>>>
>>>>>>> Also, it seems a TX would only hang if there is no following TX that
>>>>>>> is either committed or aborted? Thus, for the case above, the TX might
>>>>>>> actually not hang (of course, we might get an EOS violation if the
>>>>>>> first TX was aborted and the second committed, or the other way
>>>>>>> around).
>>>>>>>
>>>>>>> (20)
>>>>>>>
>>>>>>>> Of course, 1 and 2 require client-side changes, so for older clients,
>>>>>>>> those approaches won't apply.
>>>>>>>
>>>>>>> For (1) I understand why a client change is necessary, but I am not
>>>>>>> sure why we need a client change for (2). Can you elaborate?
>>>>>>> -- Later you explain that we should send a DescribeTransactionRequest,
>>>>>>> but I am not sure why. Can't we just do an implicit AddPartitionToTx,
>>>>>>> too? If the old producer correctly registered the partition already,
>>>>>>> the TX-coordinator can just ignore it, as it's an idempotent
>>>>>>> operation?
>>>>>>>
>>>>>>> (30)
>>>>>>>
>>>>>>>> To cover older clients, we will ensure a transaction is ongoing
>>>>>>>> before we write to a transaction
>>>>>>>
>>>>>>> Not sure what you mean by this? Can you elaborate?
>>>>>>>
>>>>>>> (40)
>>>>>>>
>>>>>>>> [the TX-coordinator] will write the prepare commit message with a
>>>>>>>> bumped epoch and send WriteTxnMarkerRequests with the bumped epoch.
>>>>>>>
>>>>>>> Why do we use the bumped epoch for both? It seems more intuitive to
>>>>>>> use the current epoch, and only return the bumped epoch to the
>>>>>>> producer?
>>>>>>>
>>>>>>> (50) "Implicit AddPartitionToTransaction"
>>>>>>>
>>>>>>> Why does the implicitly sent request need to be synchronous?
>>>>>>> The KIP also says
>>>>>>>
>>>>>>>> in case we need to abort and need to know which partitions
>>>>>>>
>>>>>>> What do you mean by this?
>>>>>>>
>>>>>>>> we don't want to write to it before we store in the transaction
>>>>>>>> manager
>>>>>>>
>>>>>>> Do you mean TX-coordinator instead of "manager"?
>>>>>>>
>>>>>>> (60)
>>>>>>>
>>>>>>> For older clients and ensuring that the TX is ongoing, you describe a
>>>>>>> race condition. I am not sure if I can follow here. Can you elaborate?
>>>>>>>
>>>>>>> -Matthias
>>>>>>>
>>>>>>> On 11/18/22 1:21 PM, Justine Olshan wrote:
>>>>>>>> Hey all!
>>>>>>>>
>>>>>>>> I'd like to start a discussion on my proposal to add some server-side
>>>>>>>> checks on transactions to avoid hanging transactions. I know this has
>>>>>>>> been an issue for some time, so I really hope this KIP will be
>>>>>>>> helpful for many users of EOS.
>>>>>>>>
>>>>>>>> The KIP includes changes that will be compatible with old clients and
>>>>>>>> changes to improve performance and correctness on new clients.
>>>>>>>>
>>>>>>>> Please take a look and leave any comments you may have!
>>>>>>>>
>>>>>>>> KIP:
>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-890%3A+Transactions+Server-Side+Defense
>>>>>>>> JIRA: https://issues.apache.org/jira/browse/KAFKA-14402
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> Justine