Hi, Justine,

Which RPC/record protocols will TV guard? Going forward, will those
RPC/record protocols only be guarded by TV and not by other features like
MV?

Thanks,

Jun

On Mon, Feb 5, 2024 at 2:41 PM Justine Olshan <jols...@confluent.io.invalid>
wrote:

> Hi Jun,
>
> Sorry, I think I misunderstood your question or answered incorrectly. The TV
> version should ideally be fully independent from MV.
> At least for the changes I proposed, TV should not affect MV and MV should
> not affect TV.
>
> I think if we downgrade TV, only that feature should downgrade; likewise
> with MV. The finalizedFeatures should just reflect the feature
> downgrade we made.
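>
> For illustration, a rough sketch of the independence I mean (hypothetical
> names, not the actual implementation):
>
>   // Finalized feature levels are tracked per feature, so downgrading
>   // one feature leaves the others untouched.
>   Map<String, Short> finalizedFeatures = new HashMap<>();
>   finalizedFeatures.put("metadata.version", (short) 19);
>   finalizedFeatures.put("transaction.version", (short) 2);
>
>   // Downgrading TV only rewrites the TV entry; MV is unaffected, and
>   // ApiVersionsResponse reports exactly what is in this map.
>   finalizedFeatures.put("transaction.version", (short) 1);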
>
> I also plan to write a new KIP for managing the disk format and upgrade
> tool as we will need new flags to support these features. That should help
> clarify some things.
>
> Justine
>
> On Mon, Feb 5, 2024 at 11:03 AM Jun Rao <j...@confluent.io.invalid> wrote:
>
> > Hi, Justine,
> >
> > Thanks for the reply.
> >
> > So, if we downgrade TV, we could implicitly downgrade another feature
> (say
> > MV) that has a dependency (e.g. RPC). What would we return for
> > FinalizedFeatures for MV in ApiVersionsResponse in that case?
> >
> > Thanks,
> >
> > Jun
> >
> > On Fri, Feb 2, 2024 at 1:06 PM Justine Olshan
> <jols...@confluent.io.invalid
> > >
> > wrote:
> >
> > > Hey Jun,
> > >
> > > Yes, the idea is that if we downgrade TV (transaction version) we will
> > stop
> > > using the add partitions to txn optimization and stop writing the
> > flexible
> > > feature version of the log.
> > > In the compatibility section I included some explanations on how this
> is
> > > done.
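> > >
> > > As a rough sketch of the gating (hypothetical names and version
> > > thresholds; the real checks live in the transaction coordinator):
> > >
> > >   // TV gates the two behaviors independently of MV.
> > >   boolean writeFlexibleTxnLogRecords = transactionVersion >= 1;
> > >   boolean skipClientAddPartitionsToTxn = transactionVersion >= 2;
> > >   // On a TV downgrade both flags flip back, so we stop writing the
> > >   // flexible record version and fall back to the explicit
> > >   // AddPartitionsToTxn path.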
> > >
> > > Thanks,
> > > Justine
> > >
> > > On Fri, Feb 2, 2024 at 11:12 AM Jun Rao <j...@confluent.io.invalid>
> > wrote:
> > >
> > > > Hi, Justine,
> > > >
> > > > Thanks for the update.
> > > >
> > > > If we ever downgrade the transaction feature, any feature depending
> on
> > > > changes on top of those RPC/record
> > > > (AddPartitionsToTxnRequest/TransactionLogValue) changes made in
> KIP-890
> > > > will be automatically downgraded too?
> > > >
> > > > Jun
> > > >
> > > > On Tue, Jan 30, 2024 at 3:32 PM Justine Olshan
> > > > <jols...@confluent.io.invalid>
> > > > wrote:
> > > >
> > > > > Hey Jun,
> > > > >
> > > > > I wanted to get back to you about your questions about MV/IBP.
> > > > >
> > > > > Looking at the options, I think it makes the most sense to create a
> > > > > separate feature for transactions and use that to version gate the
> > > > features
> > > > > we need to version gate (flexible transactional state records and
> > using
> > > > the
> > > > > new protocol).
> > > > > I've updated the KIP to include this change. Hopefully that's
> > > everything
> > > > we
> > > > > need for this KIP :)
> > > > >
> > > > > Justine
> > > > >
> > > > >
> > > > > On Mon, Jan 22, 2024 at 3:17 PM Justine Olshan <
> jols...@confluent.io
> > >
> > > > > wrote:
> > > > >
> > > > > > Thanks Jun,
> > > > > >
> > > > > > I will update the KIP with the prev field for prepare as well.
> > > > > >
> > > > > > PREPARE
> > > > > > producerId: x
> > > > > > previous/lastProducerId (tagged field): x
> > > > > > nextProducerId (tagged field): empty or z if y will overflow
> > > > > > producerEpoch: y + 1
> > > > > >
> > > > > > COMPLETE
> > > > > > producerId: x or z if y overflowed
> > > > > > previous/lastProducerId (tagged field): x
> > > > > > nextProducerId (tagged field): empty
> > > > > > producerEpoch: y + 1 or 0 if we overflowed
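> > > > > >
> > > > > > To make the retry handling concrete, a small sketch (hypothetical
> > > > > > helper; the real schema is TransactionLogValue with tagged fields):
> > > > > >
> > > > > >   // Tagged fields are optional, so presence is the signal: only
> > > > > >   // the new client writes previous/lastProducerId, which is how a
> > > > > >   // failed-over coordinator can tell the client generations apart.
> > > > > >   static boolean writtenByNewClient(TransactionLogValue marker) {
> > > > > >       return marker.previousProducerId() != NO_PRODUCER_ID;
> > > > > >   }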
> > > > > >
> > > > > > Thanks again,
> > > > > > Justine
> > > > > >
> > > > > > On Mon, Jan 22, 2024 at 3:15 PM Jun Rao <j...@confluent.io.invalid
> >
> > > > > wrote:
> > > > > >
> > > > > >> Hi, Justine,
> > > > > >>
> > > > > >> 101.3 Thanks for the explanation.
> > > > > >> (1) My point was that the coordinator could fail right after
> > writing
> > > > the
> > > > > >> prepare marker. When the new txn coordinator generates the
> > complete
> > > > > marker
> > > > > >> after the failover, it needs some field from the prepare marker
> to
> > > > > >> determine whether it's written by the new client.
> > > > > >>
> > > > > >> (2) The change in behavior sounds good to me. We only want to return
> > > > > >> success if the prepare state is written by the new client. So, in the
> > > > > >> non-overflow case, it seems that we also need something in the prepare
> > > > > >> marker to tell us whether it's written by the new client.
> > > > > >>
> > > > > >> 112. Thanks for the explanation. That sounds good to me.
> > > > > >>
> > > > > >> Jun
> > > > > >>
> > > > > >> On Mon, Jan 22, 2024 at 11:32 AM Justine Olshan
> > > > > >> <jols...@confluent.io.invalid> wrote:
> > > > > >>
> > > > > >> > 101.3 I realized that I actually have two questions.
> > > > > >> > > (1) In the non-overflow case, we need to write the previous producer
> > > > > >> > > Id tagged field in the end marker so that we know if the marker is
> > > > > >> > > from the new client. Since the end marker is derived from the prepare
> > > > > >> > > marker, should we write the previous producer Id in the prepare
> > > > > >> > > marker field too? Otherwise, we will lose this information when
> > > > > >> > > deriving the end marker.
> > > > > >> >
> > > > > >> > The "previous" producer ID is in the normal producer ID field.
> > So
> > > > yes,
> > > > > >> we
> > > > > >> > need it in prepare and that was always the plan.
> > > > > >> >
> > > > > >> > Maybe it is a bit unclear so I will enumerate the fields and
> add
> > > > them
> > > > > to
> > > > > >> > the KIP if that helps.
> > > > > >> > Say we have producer ID x and epoch y. When we overflow epoch
> y
> > we
> > > > get
> > > > > >> > producer ID Z.
> > > > > >> >
> > > > > >> > PREPARE
> > > > > >> > producerId: x
> > > > > >> > previous/lastProducerId (tagged field): empty
> > > > > >> > nextProducerId (tagged field): empty or z if y will overflow
> > > > > >> > producerEpoch: y + 1
> > > > > >> >
> > > > > >> > COMPLETE
> > > > > >> > producerId: x or z if y overflowed
> > > > > >> > previous/lastProducerId (tagged field): x
> > > > > >> > nextProducerId (tagged field): empty
> > > > > >> > producerEpoch: y + 1 or 0 if we overflowed
> > > > > >> >
> > > > > >> > (2) In the prepare phase, if we retry and see epoch - 1 + ID
> in
> > > last
> > > > > >> seen
> > > > > >> > fields and are issuing the same command (ie commit not abort),
> > we
> > > > > return
> > > > > >> > success. The logic before KIP-890 seems to return
> > > > > >> CONCURRENT_TRANSACTIONS
> > > > > >> > in this case. Are we intentionally making this change?
> > > > > >> >
> > > > > >> > Hmm -- we would fence the producer if the epoch is bumped and
> we
> > > > get a
> > > > > >> > lower epoch. Yes -- we are intentionally adding this to
> prevent
> > > > > fencing.
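> > > > > >> >
> > > > > >> > Roughly, the retry check looks like this (hypothetical names, just
> > > > > >> > to illustrate the intended behavior):
> > > > > >> >
> > > > > >> >   // A retried commit from the new client arrives with epoch - 1
> > > > > >> >   // and a matching producer id and operation, so we can answer
> > > > > >> >   // success (with the new epoch) instead of fencing.
> > > > > >> >   if (req.producerEpoch() == current.epoch() - 1
> > > > > >> >           && req.producerId() == current.lastProducerId()
> > > > > >> >           && req.result() == current.result()) {
> > > > > >> >       return endTxnSuccess(current.epoch());
> > > > > >> >   }
> > > > > >> >   // A genuinely lower epoch is still fenced as before.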
> > > > > >> >
> > > > > >> >
> > > > > >> > 112. We already merged the code that adds the VerifyOnly field
> > in
> > > > > >> > AddPartitionsToTxnRequest, which is an inter broker request.
> It
> > > > seems
> > > > > >> that
> > > > > >> > we didn't bump up the IBP for that. Do you know why?
> > > > > >> >
> > > > > >> > We no longer need IBP for all interbroker requests as
> > ApiVersions
> > > > > should
> > > > > >> > correctly gate versioning.
> > > > > >> > We also handle unsupported version errors correctly if we
> > receive
> > > > them
> > > > > >> in
> > > > > >> > edge cases like upgrades/downgrades.
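> > > > > >> >
> > > > > >> > As a sketch of the ApiVersions gating (the minimum-version constant
> > > > > >> > here is hypothetical):
> > > > > >> >
> > > > > >> >   // The sender picks the highest AddPartitionsToTxn version both
> > > > > >> >   // sides support, so no IBP bump is needed and an older broker
> > > > > >> >   // simply never sees the VerifyOnly field.
> > > > > >> >   short version = nodeApiVersions.latestUsableVersion(
> > > > > >> >       ApiKeys.ADD_PARTITIONS_TO_TXN);
> > > > > >> >   boolean canVerify = version >= VERIFY_ONLY_MIN_VERSION;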
> > > > > >> >
> > > > > >> > Justine
> > > > > >> >
> > > > > >> > On Mon, Jan 22, 2024 at 11:00 AM Jun Rao
> > <j...@confluent.io.invalid
> > > >
> > > > > >> wrote:
> > > > > >> >
> > > > > >> > > Hi, Justine,
> > > > > >> > >
> > > > > >> > > Thanks for the reply.
> > > > > >> > >
> > > > > >> > > 101.3 I realized that I actually have two questions.
> > > > > >> > > (1) In the non-overflow case, we need to write the previous producer
> > > > > >> > > Id tagged field in the end marker so that we know if the marker is
> > > > > >> > > from the new client. Since the end marker is derived from the prepare
> > > > > >> > > marker, should we write the previous producer Id in the prepare
> > > > > >> > > marker field too? Otherwise, we will lose this information when
> > > > > >> > > deriving the end marker.
> > > > > >> > > (2) In the prepare phase, if we retry and see epoch - 1 + ID
> > in
> > > > last
> > > > > >> seen
> > > > > >> > > fields and are issuing the same command (ie commit not
> abort),
> > > we
> > > > > >> return
> > > > > >> > > success. The logic before KIP-890 seems to return
> > > > > >> CONCURRENT_TRANSACTIONS
> > > > > >> > > in this case. Are we intentionally making this change?
> > > > > >> > >
> > > > > >> > > 112. We already merged the code that adds the VerifyOnly
> field
> > > in
> > > > > >> > > AddPartitionsToTxnRequest, which is an inter broker request.
> > It
> > > > > seems
> > > > > >> > that
> > > > > >> > > we didn't bump up the IBP for that. Do you know why?
> > > > > >> > >
> > > > > >> > > Jun
> > > > > >> > >
> > > > > >> > > On Fri, Jan 19, 2024 at 4:50 PM Justine Olshan
> > > > > >> > > <jols...@confluent.io.invalid>
> > > > > >> > > wrote:
> > > > > >> > >
> > > > > >> > > > Hi Jun,
> > > > > >> > > >
> > > > > >> > > > 101.3 I can change "last seen" to "current producer id and
> > > > epoch"
> > > > > if
> > > > > >> > that
> > > > > >> > > > was the part that was confusing
> > > > > >> > > > 110 I can mention this
> > > > > >> > > > 111 I can do that
> > > > > >> > > > 112 We still need it. But I am still finalizing the
> design.
> > I
> > > > will
> > > > > >> > update
> > > > > >> > > > the KIP once I get the information finalized. Sorry for
> the
> > > > > delays.
> > > > > >> > > >
> > > > > >> > > > Justine
> > > > > >> > > >
> > > > > >> > > > On Fri, Jan 19, 2024 at 10:50 AM Jun Rao
> > > > <j...@confluent.io.invalid
> > > > > >
> > > > > >> > > wrote:
> > > > > >> > > >
> > > > > >> > > > > Hi, Justine,
> > > > > >> > > > >
> > > > > >> > > > > Thanks for the reply.
> > > > > >> > > > >
> > > > > >> > > > > 101.3 In the non-overflow case, the previous ID is the same as
> > > > > >> > > > > the producer ID for the complete marker too, but we set the
> > > > > >> > > > > previous ID in the complete marker. Earlier you mentioned that
> > > > > >> > > > > this is to know that
> > the
> > > > > >> marker is
> > > > > >> > > > > written by the new client so that we could return
> success
> > on
> > > > > >> retried
> > > > > >> > > > > endMarker requests. I was trying to understand why this
> is
> > > not
> > > > > >> needed
> > > > > >> > > for
> > > > > >> > > > > the prepare marker since retry can happen in the prepare
> > > state
> > > > > >> too.
> > > > > >> > Is
> > > > > >> > > > the
> > > > > >> > > > > reason that in the prepare state, we return
> > > > > >> CONCURRENT_TRANSACTIONS
> > > > > >> > > > instead
> > > > > >> > > > > of success on retried endMarker requests? If so, should
> we
> > > > change
> > > > > >> "If
> > > > > >> > we
> > > > > >> > > > > retry and see epoch - 1 + ID in last seen fields and are
> > > > issuing
> > > > > >> the
> > > > > >> > > same
> > > > > >> > > > > command (ie commit not abort) we can return (with the
> new
> > > > > epoch)"
> > > > > >> > > > > accordingly?
> > > > > >> > > > >
> > > > > >> > > > > 110. Yes, without this KIP, a delayed endMarker request
> > > carries
> > > > > the
> > > > > >> > same
> > > > > >> > > > > epoch and won't be fenced. This can commit/abort a
> future
> > > > > >> transaction
> > > > > >> > > > > unexpectedly. I am not sure if we have seen this in
> > practice
> > > > > >> though.
> > > > > >> > > > >
> > > > > >> > > > > 111. Sounds good. It would be useful to make it clear
> that
> > > we
> > > > > can
> > > > > >> now
> > > > > >> > > > > populate the lastSeen field from the log reliably.
> > > > > >> > > > >
> > > > > >> > > > > 112. Yes, I was referring to AddPartitionsToTxnRequest
> > since
> > > > > it's
> > > > > >> > > called
> > > > > >> > > > > across brokers and we are changing its schema. Are you
> > > saying
> > > > we
> > > > > >> > don't
> > > > > >> > > > need
> > > > > >> > > > > it any more? I thought that we already implemented the
> > > server
> > > > > side
> > > > > >> > > > > verification logic based on AddPartitionsToTxnRequest
> > across
> > > > > >> brokers.
> > > > > >> > > > >
> > > > > >> > > > > Jun
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > > On Thu, Jan 18, 2024 at 5:05 PM Justine Olshan
> > > > > >> > > > > <jols...@confluent.io.invalid>
> > > > > >> > > > > wrote:
> > > > > >> > > > >
> > > > > >> > > > > > Hey Jun,
> > > > > >> > > > > >
> > > > > >> > > > > > 101.3 We don't set the previous ID in the Prepare
> field
> > > > since
> > > > > we
> > > > > >> > > don't
> > > > > >> > > > > need
> > > > > >> > > > > > it. It is the same producer ID as the main producer ID
> > > > field.
> > > > > >> > > > > >
> > > > > >> > > > > > 110 Hmm -- maybe I need to reread your message about delayed
> > > > > >> > > > > > markers. Do you mean the case where we receive a delayed endTxn
> > > > > >> > > > > > marker after the transaction is already complete? Then we would
> > > > > >> > > > > > commit the next transaction early without the fixes in part 2?
> > > > > >> > > > > >
> > > > > >> > > > > > 111 Yes -- this terminology was used in a previous KIP but
> > > > > >> > > > > > never implemented in the log -- only in memory
> > > > > >> > > > > >
> > > > > >> > > > > > 112 Hmm -- which interbroker protocol are you referring to? I
> > > > > >> > > > > > am working on the design for the work to remove the extra add
> > > > > >> > > > > > partitions call, and right now the design bumps MV. I have yet
> > > > > >> > > > > > to update that section as I finalize the design, so please stay
> > > > > >> > > > > > tuned. Was there anything else you thought needed an MV bump?
> > > > > >> > > > > >
> > > > > >> > > > > > Justine
> > > > > >> > > > > >
> > > > > >> > > > > > On Thu, Jan 18, 2024 at 3:07 PM Jun Rao
> > > > > >> <j...@confluent.io.invalid>
> > > > > >> > > > > wrote:
> > > > > >> > > > > >
> > > > > >> > > > > > > Hi, Justine,
> > > > > >> > > > > > >
> > > > > >> > > > > > > I don't see this creating any issue. It just makes it a bit
> > > > > >> > > > > > > hard to explain what this non-tagged producer id field means.
> > > > > >> > > > > > > We are essentially trying to combine two actions (completing a
> > > > > >> > > > > > > txn and initializing a new producer Id) in a single record.
> > > > > >> > > > > > > But, this may be fine too.
> > > > > >> > > > > > >
> > > > > >> > > > > > > A few other follow up comments.
> > > > > >> > > > > > >
> > > > > >> > > > > > > 101.3 I guess the reason that we only set the previous
> > > > > >> > > > > > > producer id tagged field in the complete marker, but not in
> > > > > >> > > > > > > the prepare marker, is that in the prepare state, we always
> > > > > >> > > > > > > return CONCURRENT_TRANSACTIONS on retried endMarker requests?
> > > > > >> > > > > > >
> > > > > >> > > > > > > 110. "I believe your second point is mentioned in
> the
> > > > KIP. I
> > > > > >> can
> > > > > >> > > add
> > > > > >> > > > > more
> > > > > >> > > > > > > text on
> > > > > >> > > > > > > this if it is helpful.
> > > > > >> > > > > > > > The delayed message case can also violate EOS if
> the
> > > > > delayed
> > > > > >> > > > message
> > > > > >> > > > > > > comes in after the next addPartitionsToTxn request
> > comes
> > > > in.
> > > > > >> > > > > Effectively
> > > > > >> > > > > > we
> > > > > >> > > > > > > may see a message from a previous (aborted)
> > transaction
> > > > > become
> > > > > >> > part
> > > > > >> > > > of
> > > > > >> > > > > > the
> > > > > >> > > > > > > next transaction."
> > > > > >> > > > > > >
> > > > > >> > > > > > > The above is the case when a delayed message is
> > appended
> > > > to
> > > > > >> the
> > > > > >> > > data
> > > > > >> > > > > > > partition. What I mentioned is a slightly different
> > case
> > > > > when
> > > > > >> a
> > > > > >> > > > delayed
> > > > > >> > > > > > > marker is appended to the transaction log partition.
> > > > > >> > > > > > >
> > > > > >> > > > > > > 111. The KIP says "Once we move past the Prepare and
> > > > > Complete
> > > > > >> > > states,
> > > > > >> > > > > we
> > > > > >> > > > > > > don’t need to worry about lastSeen fields and clear
> > > them,
> > > > > just
> > > > > >> > > handle
> > > > > >> > > > > > state
> > > > > >> > > > > > > transitions as normal.". Is the lastSeen field the
> > same
> > > as
> > > > > the
> > > > > >> > > > previous
> > > > > >> > > > > > > Producer Id tagged field in TransactionLogValue?
> > > > > >> > > > > > >
> > > > > >> > > > > > > 112. Since the KIP changes the inter-broker
> protocol,
> > > > should
> > > > > >> we
> > > > > >> > > bump
> > > > > >> > > > up
> > > > > >> > > > > > the
> > > > > >> > > > > > > MV/IBP version? Is this feature only for the KRaft
> > mode?
> > > > > >> > > > > > >
> > > > > >> > > > > > > Thanks,
> > > > > >> > > > > > >
> > > > > >> > > > > > > Jun
> > > > > >> > > > > > >
> > > > > >> > > > > > >
> > > > > >> > > > > > > On Wed, Jan 17, 2024 at 11:13 AM Justine Olshan
> > > > > >> > > > > > > <jols...@confluent.io.invalid> wrote:
> > > > > >> > > > > > >
> > > > > >> > > > > > > > Hey Jun,
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > I'm glad we are getting to convergence on the
> > design.
> > > :)
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > While I understand it seems a little "weird", I'm not sure
> > > > > >> > > > > > > > what the benefit of writing an extra record to the log is.
> > > > > >> > > > > > > > Is the concern that a tool to describe transactions won't
> > > > > >> > > > > > > > work (ie, the complete state is needed to calculate the time
> > > > > >> > > > > > > > since the transaction completed)?
> > > > > >> > > > > > > > If we have a reason like this, it is enough to
> > > convince
> > > > me
> > > > > >> we
> > > > > >> > > need
> > > > > >> > > > > such
> > > > > >> > > > > > > an
> > > > > >> > > > > > > > extra record. It seems like it would be replacing
> > the
> > > > > record
> > > > > >> > > > written
> > > > > >> > > > > on
> > > > > >> > > > > > > > InitProducerId. Is this correct?
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > Thanks,
> > > > > >> > > > > > > > Justine
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > On Tue, Jan 16, 2024 at 5:14 PM Jun Rao
> > > > > >> > <j...@confluent.io.invalid
> > > > > >> > > >
> > > > > >> > > > > > > wrote:
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > > Hi, Justine,
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > Thanks for the explanation. I understand the
> > > intention
> > > > > >> now.
> > > > > >> > In
> > > > > >> > > > the
> > > > > >> > > > > > > > overflow
> > > > > >> > > > > > > > > case, we set the non-tagged field to the old pid
> > > (and
> > > > > the
> > > > > >> max
> > > > > >> > > > > epoch)
> > > > > >> > > > > > in
> > > > > >> > > > > > > > the
> > > > > >> > > > > > > > > prepare marker so that we could correctly write
> > the
> > > > > >> marker to
> > > > > >> > > the
> > > > > >> > > > > > data
> > > > > >> > > > > > > > > partition if the broker downgrades. When writing
> > the
> > > > > >> complete
> > > > > >> > > > > marker,
> > > > > >> > > > > > > we
> > > > > >> > > > > > > > > know the marker has already been written to the
> > data
> > > > > >> > partition.
> > > > > >> > > > We
> > > > > >> > > > > > set
> > > > > >> > > > > > > > the
> > > > > >> > > > > > > > > non-tagged field to the new pid to avoid
> > > > > >> > > > InvalidPidMappingException
> > > > > >> > > > > > in
> > > > > >> > > > > > > > the
> > > > > >> > > > > > > > > client if the broker downgrades.
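> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > Spelled out as a sketch (hypothetical field names):
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > >   // Prepare keeps the old pid so a downgraded broker still
> > > > > >> > > > > > > > >   // writes a correct marker to the data partition.
> > > > > >> > > > > > > > >   prepare.producerId = oldPid;
> > > > > >> > > > > > > > >   prepare.producerEpoch = maxEpoch;
> > > > > >> > > > > > > > >   prepare.nextProducerId = newPid;    // tagged field
> > > > > >> > > > > > > > >   // Complete carries the new pid so a downgraded
> > > > > >> > > > > > > > >   // coordinator does not hand the client an
> > > > > >> > > > > > > > >   // InvalidPidMappingException.
> > > > > >> > > > > > > > >   complete.producerId = newPid;
> > > > > >> > > > > > > > >   complete.producerEpoch = 0;
> > > > > >> > > > > > > > >   complete.prevProducerId = oldPid;   // tagged field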
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > The above seems to work. It's just a bit
> > > inconsistent
> > > > > for
> > > > > >> a
> > > > > >> > > > prepare
> > > > > >> > > > > > > > marker
> > > > > >> > > > > > > > > and a complete marker to use different pids in
> > this
> > > > > >> special
> > > > > >> > > case.
> > > > > >> > > > > If
> > > > > >> > > > > > we
> > > > > >> > > > > > > > > downgrade with the complete marker, it seems
> that
> > we
> > > > > will
> > > > > >> > never
> > > > > >> > > > be
> > > > > >> > > > > > able
> > > > > >> > > > > > > > to
> > > > > >> > > > > > > > > write the complete marker with the old pid. Not
> > sure
> > > > if
> > > > > it
> > > > > >> > > causes
> > > > > >> > > > > any
> > > > > >> > > > > > > > > issue, but it seems a bit weird. Instead of
> > writing
> > > > the
> > > > > >> > > complete
> > > > > >> > > > > > marker
> > > > > >> > > > > > > > > with the new pid, could we write two records: a
> > > > complete
> > > > > >> > marker
> > > > > >> > > > > with
> > > > > >> > > > > > > the
> > > > > >> > > > > > > > > old pid followed by a TransactionLogValue with
> the
> > > new
> > > > > pid
> > > > > >> > and
> > > > > >> > > an
> > > > > >> > > > > > empty
> > > > > >> > > > > > > > > state? We could make the two records in the same
> > > batch
> > > > > so
> > > > > >> > that
> > > > > >> > > > they
> > > > > >> > > > > > > will
> > > > > >> > > > > > > > be
> > > > > >> > > > > > > > > added to the log atomically.
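> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > Something like this, as a sketch (hypothetical helpers):
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > >   // One batch, two records: the coordinator log gets both
> > > > > >> > > > > > > > >   // or neither, so there is no window with a half-updated
> > > > > >> > > > > > > > >   // state.
> > > > > >> > > > > > > > >   appendToTxnLog(batchOf(
> > > > > >> > > > > > > > >       completeMarkerRecord(oldPid, maxEpoch),
> > > > > >> > > > > > > > >       emptyStateRecord(newPid, (short) 0)));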
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > Thanks,
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > Jun
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > On Fri, Jan 12, 2024 at 5:40 PM Justine Olshan
> > > > > >> > > > > > > > > <jols...@confluent.io.invalid>
> > > > > >> > > > > > > > > wrote:
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > (1) the prepare marker is written, but the
> > endTxn
> > > > > >> response
> > > > > >> > is
> > > > > >> > > > not
> > > > > >> > > > > > > > > received
> > > > > >> > > > > > > > > > by the client when the server downgrades
> > > > > >> > > > > > > > > > (2)  the prepare marker is written, the endTxn
> > > > > response
> > > > > >> is
> > > > > >> > > > > received
> > > > > >> > > > > > > by
> > > > > >> > > > > > > > > the
> > > > > >> > > > > > > > > > client when the server downgrades.
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > I think I am still a little confused. In both
> of
> > > > these
> > > > > >> > cases,
> > > > > >> > > > the
> > > > > >> > > > > > > > > > transaction log has the old producer ID. We
> > don't
> > > > > write
> > > > > >> the
> > > > > >> > > new
> > > > > >> > > > > > > > producer
> > > > > >> > > > > > > > > ID
> > > > > >> > > > > > > > > in the prepare marker's non-tagged fields.
> > > > > >> > > > > > > > > > If the server downgrades now, it would read only the
> > > > > >> > > > > > > > > > records' non-tagged fields, and the complete marker will
> > > > > >> > > > > > > > > > also have the old producer ID.
> > > > > >> > > > > > > > > > (If we had used the new producer ID, we would
> > not
> > > > have
> > > > > >> > > > > > transactional
> > > > > >> > > > > > > > > > correctness since the producer id doesn't
> match
> > > the
> > > > > >> > > transaction
> > > > > >> > > > > and
> > > > > >> > > > > > > the
> > > > > >> > > > > > > > > > state would not be correct on the data
> > partition.)
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > In the overflow case, I'd expect the following to happen
> > > > > >> > > > > > > > > > on the client side:
> > > > > >> > > > > > > > > > Case 1 -- we retry EndTxn -- it is the same producer ID
> > > > > >> > > > > > > > > > and epoch - 1; this would fence the producer
> > > > > >> > > > > > > > > > Case 2 -- we don't retry EndTxn and use the
> new
> > > > > >> producer id
> > > > > >> > > > which
> > > > > >> > > > > > > would
> > > > > >> > > > > > > > > > result in InvalidPidMappingException
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > Maybe we can have special handling for when a
> > > server
> > > > > >> > > > downgrades.
> > > > > >> > > > > > When
> > > > > >> > > > > > > > it
> > > > > >> > > > > > > > > > reconnects we could get an API version request
> > > > showing
> > > > > >> > > KIP-890
> > > > > >> > > > > > part 2
> > > > > >> > > > > > > > is
> > > > > >> > > > > > > > > > not supported. In that case, we can call
> > > > > initProducerId
> > > > > >> to
> > > > > >> > > > abort
> > > > > >> > > > > > the
> > > > > >> > > > > > > > > > transaction. (In the overflow case, this
> > correctly
> > > > > gives
> > > > > >> > us a
> > > > > >> > > > new
> > > > > >> > > > > > > > > producer
> > > > > >> > > > > > > > > > ID)
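> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > Sketched out, that special handling could look like this
> > > > > >> > > > > > > > > > (hypothetical names; not in the KIP yet):
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > >   // After a reconnect, ApiVersions tells us the broker
> > > > > >> > > > > > > > > >   // no longer supports KIP-890 part 2, so abort via
> > > > > >> > > > > > > > > >   // InitProducerId, which also hands back a clean
> > > > > >> > > > > > > > > >   // producer id in the overflow case.
> > > > > >> > > > > > > > > >   if (!brokerSupportsKip890Part2(apiVersionsResponse)) {
> > > > > >> > > > > > > > > >       sendInitProducerId(transactionalId); // aborts txn
> > > > > >> > > > > > > > > >   }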
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > I guess the corresponding case would be where the
> > > > > >> > > > > > > > > > *complete marker* is
> > > > > >> > > > > > > > > > written but the endTxn is not received by the
> > > client
> > > > > and
> > > > > >> > the
> > > > > >> > > > > server
> > > > > >> > > > > > > > > > downgrades? This would result in the
> transaction
> > > > > >> > coordinator
> > > > > >> > > > > having
> > > > > >> > > > > > > the
> > > > > >> > > > > > > > > new
> > > > > >> > > > > > > > > > ID and not the old one.  If the client
> retries,
> > it
> > > > > will
> > > > > >> > > receive
> > > > > >> > > > > an
> > > > > >> > > > > > > > > > InvalidPidMappingException. The InitProducerId
> > > > > scenario
> > > > > >> > above
> > > > > >> > > > > would
> > > > > >> > > > > > > > help
> > > > > >> > > > > > > > > > here too.
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > To be clear, my compatibility story is meant
> to
> > > > > support
> > > > > >> > > > > downgrades
> > > > > >> > > > > > > > server
> > > > > >> > > > > > > > > > side in keeping the transactional correctness.
> > > > Keeping
> > > > > >> the
> > > > > >> > > > client
> > > > > >> > > > > > > from
> > > > > >> > > > > > > > > > fencing itself is not the priority.
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > Hope this helps. I can also add text in the
> KIP
> > > > about
> > > > > >> > > > > > InitProducerId
> > > > > >> > > > > > > if
> > > > > >> > > > > > > > > we
> > > > > >> > > > > > > > > > think that fixes some edge cases.
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > Justine
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > On Fri, Jan 12, 2024 at 4:10 PM Jun Rao
> > > > > >> > > > <j...@confluent.io.invalid
> > > > > >> > > > > >
> > > > > >> > > > > > > > > wrote:
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > > Hi, Justine,
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > Thanks for the reply.
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > I agree that we don't need to optimize for
> > > fencing
> > > > > >> during
> > > > > >> > > > > > > downgrades.
> > > > > >> > > > > > > > > > > Regarding consistency, there are two
> possible
> > > > cases:
> > > > > >> (1)
> > > > > >> > > the
> > > > > >> > > > > > > prepare
> > > > > >> > > > > > > > > > marker
> > > > > >> > > > > > > > > > > is written, but the endTxn response is not
> > > > received
> > > > > by
> > > > > >> > the
> > > > > >> > > > > client
> > > > > >> > > > > > > > when
> > > > > >> > > > > > > > > > the
> > > > > >> > > > > > > > > > > server downgrades; (2)  the prepare marker
> is
> > > > > written,
> > > > > >> > the
> > > > > >> > > > > endTxn
> > > > > >> > > > > > > > > > response
> > > > > >> > > > > > > > > > > is received by the client when the server
> > > > > downgrades.
> > > > > >> In
> > > > > >> > > (1),
> > > > > >> > > > > the
> > > > > >> > > > > > > > > client
> > > > > >> > > > > > > > > > > will have the old producer Id and in (2), the
> > > > client
> > > > > >> will
> > > > > >> > > have
> > > > > >> > > > > the
> > > > > >> > > > > > > new
> > > > > >> > > > > > > > > > > producer Id. If we downgrade right after the prepare
> > > > > >> > > > > > > > > > > marker, we can't be
> > > > > >> > > > > > > > > > > consistent with both (1) and (2) since we can only put
> > > > > >> > > > > > > > > > > one value in the
> > > > > >> > > > > > > > > > > existing producer Id field. It's also not
> clear
> > > > which
> > > > > >> case
> > > > > >> > > is
> > > > > >> > > > > more
> > > > > >> > > > > > > > > likely.
> > > > > >> > > > > > > > > > > So we could probably be consistent with
> either
> > > > case.
> > > > > >> By
> > > > > >> > > > putting
> > > > > >> > > > > > the
> > > > > >> > > > > > > > new
> > > > > >> > > > > > > > > > > producer Id in the prepare marker, we are
> > > > consistent
> > > > > >> with
> > > > > >> > > > case
> > > > > >> > > > > > (2)
> > > > > >> > > > > > > > and
> > > > > >> > > > > > > > > it
> > > > > >> > > > > > > > > > > also has the slight benefit that the producer Id field
> > > > > >> > > > > > > > > > > in the prepare and
> > > > > >> > > > > > > > > > > complete marker is consistent in the overflow case.
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > Jun
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > On Fri, Jan 12, 2024 at 3:11 PM Justine
> Olshan
> > > > > >> > > > > > > > > > > <jols...@confluent.io.invalid>
> > > > > >> > > > > > > > > > > wrote:
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > > Hi Jun,
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > In the case you describe, we would need to
> > > have
> > > > a
> > > > > >> > delayed
> > > > > >> > > > > > > request,
> > > > > >> > > > > > > > > > send a
> > > > > >> > > > > > > > > > > > successful EndTxn, and a successful
> > > > > >> AddPartitionsToTxn
> > > > > >> > > and
> > > > > >> > > > > then
> > > > > >> > > > > > > > have
> > > > > >> > > > > > > > > > the
> > > > > >> > > > > > > > > > > > delayed EndTxn request go through for a
> > given
> > > > > >> producer.
> > > > > >> > > > > > > > > > > > I'm trying to figure out if it is possible
> > for
> > > > the
> > > > > >> > client
> > > > > >> > > > to
> > > > > >> > > > > > > > > transition
> > > > > >> > > > > > > > > > > if
> > > > > >> > > > > > > > > > > > a previous request is delayed somewhere.
> But
> > > > yes,
> > > > > in
> > > > > >> > this
> > > > > >> > > > > case
> > > > > >> > > > > > I
> > > > > >> > > > > > > > > think
> > > > > >> > > > > > > > > > we
> > > > > >> > > > > > > > > > > > would fence the client.
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > Not for the overflow case. In the overflow
> > > case,
> > > > > the
> > > > > >> > > > producer
> > > > > >> > > > > > ID
> > > > > >> > > > > > > > and
> > > > > >> > > > > > > > > > the
> > > > > >> > > > > > > > > > > > epoch are different on the marker and on
> the
> > > new
> > > > > >> > > > transaction.
> > > > > >> > > > > > So
> > > > > >> > > > > > > we
> > > > > >> > > > > > > > > > want
> > > > > >> > > > > > > > > > > > the marker to use the max epoch, but the new
> > > > > >> > > > > > > > > > > > transaction should start with the new ID and epoch 0
> > > > > >> > > > > > > > > > > > in the transactional state.
> > > > > >> > > > > > > > > > > >
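> > > > > >> > > > > > > > > > > > As a sketch of that split (hypothetical names):
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > > >   // The final markers are fenced at the max epoch,
> > > > > >> > > > > > > > > > > >   // while the transactional state moves on to the
> > > > > >> > > > > > > > > > > >   // new id for the next transaction.
> > > > > >> > > > > > > > > > > >   marker.producerId = oldPid;
> > > > > >> > > > > > > > > > > >   marker.producerEpoch = maxEpoch;
> > > > > >> > > > > > > > > > > >   txnState.producerId = newPid;
> > > > > >> > > > > > > > > > > >   txnState.producerEpoch = 0;
> > > > > >> > > > > > > > > > > >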
> > > > > >> > > > > > > > > > > > In the server downgrade case, we want to
> see
> > > the
> > > > > >> > producer
> > > > > >> > > > ID
> > > > > >> > > > > as
> > > > > >> > > > > > > > that
> > > > > >> > > > > > > > > is
> > > > > >> > > > > > > > > > > > what the client will have. If we complete
> > the
> > > > > >> commit,
> > > > > >> > and
> > > > > >> > > > the
> > > > > >> > > > > > > > > > transaction
> > > > > >> > > > > > > > > > > > state is reloaded, we need the new
> producer
> > ID
> > > > in
> > > > > >> the
> > > > > >> > > state
> > > > > >> > > > > so
> > > > > >> > > > > > > > there
> > > > > >> > > > > > > > > > > isn't
> > > > > >> > > > > > > > > > > > an invalid producer ID mapping.
> > > > > >> > > > > > > > > > > > The server downgrade cases are considering
> > > > > >> > transactional
> > > > > >> > > > > > > > correctness
> > > > > >> > > > > > > > > > and
> > > > > >> > > > > > > > > > > > not regressing from previous behavior --
> and
> > > are
> > > > > not
> > > > > >> > > > > concerned
> > > > > >> > > > > > > > about
> > > > > >> > > > > > > > > > > > supporting the safety from fencing retries
> > (as
> > > > we
> > > > > >> have
> > > > > >> > > > > > downgraded
> > > > > >> > > > > > > > so
> > > > > >> > > > > > > > > we
> > > > > >> > > > > > > > > > > > don't need to support). Perhaps this is a
> > > trade
> > > > > off,
> > > > > >> > but
> > > > > >> > > I
> > > > > >> > > > > > think
> > > > > >> > > > > > > it
> > > > > >> > > > > > > > > is
> > > > > >> > > > > > > > > > > the
> > > > > >> > > > > > > > > > > > right one.
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > (If the client downgrades, it will have
> > > > restarted
> > > > > >> and
> > > > > >> > it
> > > > > >> > > is
> > > > > >> > > > > ok
> > > > > >> > > > > > > for
> > > > > >> > > > > > > > it
> > > > > >> > > > > > > > > > to
> > > > > >> > > > > > > > > > > > have a new producer ID too).
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > Justine
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > On Fri, Jan 12, 2024 at 11:42 AM Jun Rao
> > > > > >> > > > > > > <j...@confluent.io.invalid
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > > wrote:
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > Hi, Justine,
> > > > > >> > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > Thanks for the reply.
> > > > > >> > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > 101.4 "If the marker is written by the
> new
> > > > > >> client, we
> > > > > >> > > can
> > > > > >> > > > > as
> > > > > >> > > > > > I
> > > > > >> > > > > > > > > > > mentioned
> > > > > >> > > > > > > > > > > > in
> > > > > >> > > > > > > > > > > > > the last email guarantee that any EndTxn
> > > > > requests
> > > > > >> > with
> > > > > >> > > > the
> > > > > >> > > > > > same
> > > > > >> > > > > > > > > epoch
> > > > > >> > > > > > > > > > > are
> > > > > >> > > > > > > > > > > > > from the same producer and the same
> > > > transaction.
> > > > > >> Then
> > > > > >> > > we
> > > > > >> > > > > > don't
> > > > > >> > > > > > > > have
> > > > > >> > > > > > > > > > to
> > > > > >> > > > > > > > > > > > > return a fenced error but can handle
> > > > gracefully
> > > > > as
> > > > > >> > > > > described
> > > > > >> > > > > > in
> > > > > >> > > > > > > > the
> > > > > >> > > > > > > > > > > KIP."
> > > > > >> > > > > > > > > > > > > When a delayed EndTxn request is
> > processed,
> > > > the
> > > > > >> txn
> > > > > >> > > state
> > > > > >> > > > > > could
> > > > > >> > > > > > > > be
> > > > > >> > > > > > > > > > > > ongoing
> > > > > >> > > > > > > > > > > > > for the next txn. I guess in this case
> we
> > > > still
> > > > > >> > return
> > > > > >> > > > the
> > > > > >> > > > > > > fenced
> > > > > >> > > > > > > > > > error
> > > > > >> > > > > > > > > > > > for
> > > > > >> > > > > > > > > > > > > the delayed request?
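> > > > > >> > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > Roughly (hypothetical names), the check I would
> > > > > >> > > > > > > > > > > > > expect:
> > > > > >> > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > >   // The success-on-retry path only applies while we
> > > > > >> > > > > > > > > > > > >   // are still completing the same transaction; once
> > > > > >> > > > > > > > > > > > >   // the next txn is Ongoing, a delayed EndTxn with a
> > > > > >> > > > > > > > > > > > >   // lower epoch is fenced.
> > > > > >> > > > > > > > > > > > >   if (state == Ongoing
> > > > > >> > > > > > > > > > > > >           && req.producerEpoch() < current.epoch()) {
> > > > > >> > > > > > > > > > > > >       return Errors.PRODUCER_FENCED;
> > > > > >> > > > > > > > > > > > >   }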
> > > > > >> > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > 102. Sorry, my question was inaccurate.
> > What
> > > > you
> > > > > >> > > > described
> > > > > >> > > > > is
> > > > > >> > > > > > > > > > accurate.
> > > > > >> > > > > > > > > > > > > "The downgrade compatibility I mention
> is
> > > that
> > > > > we
> > > > > >> > keep
> > > > > >> > > > the
> > > > > >> > > > > > same
> > > > > >> > > > > > > > > > > producer
> > > > > >> > > > > > > > > > > > ID
> > > > > >> > > > > > > > > > > > > and epoch in the main (non-tagged)
> fields
> > as
> > > > we
> > > > > >> did
> > > > > >> > > > before
> > > > > >> > > > > > the
> > > > > >> > > > > > > > code
> > > > > >> > > > > > > > > > on
> > > > > >> > > > > > > > > > > > the
> > > > > >> > > > > > > > > > > > > server side." If we want to do this, it
> > > seems
> > > > > >> that we
> > > > > >> > > > > should
> > > > > >> > > > > > > use
> > > > > >> > > > > > > > > the
> > > > > >> > > > > > > > > > > > > current producer Id and max epoch in the
> > > > existing
> > > > > >> > > > producerId
> > > > > >> > > > > > and
> > > > > >> > > > > > > > > > > > > producerEpoch fields for both the
> prepare
> > > and
> > > > > the
> > > > > >> > > > complete
> > > > > >> > > > > > > > marker,
> > > > > >> > > > > > > > > > > right?
> > > > > >> > > > > > > > > > > > > The downgrade can happen after the
> > complete
> > > > > >> marker is
> > > > > >> > > > > > written.
> > > > > >> > > > > > > > With
> > > > > >> > > > > > > > > > > what
> > > > > >> > > > > > > > > > > > > you described, the downgraded
> coordinator
> > > will
> > > > > see
> > > > > >> > the
> > > > > >> > > > new
> > > > > >> > > > > > > > producer
> > > > > >> > > > > > > > > Id
> > > > > >> > > > > > > > > > > > > instead of the old one.
> > > > > >> > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > Jun
> > > > > >> > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > On Fri, Jan 12, 2024 at 10:44 AM Justine
> > > > Olshan
> > > > > >> > > > > > > > > > > > > <jols...@confluent.io.invalid> wrote:
> > > > > >> > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > Hi Jun,
> > > > > >> > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > I can update the description.
> > > > > >> > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > I believe your second point is
> mentioned
> > > in
> > > > > the
> > > > > >> > KIP.
> > > > > >> > > I
> > > > > >> > > > > can
> > > > > >> > > > > > > add
> > > > > >> > > > > > > > > more
> > > > > >> > > > > > > > > > > > text
> > > > > >> > > > > > > > > > > > > on
> > > > > >> > > > > > > > > > > > > > this if it is helpful.
> > > > > >> > > > > > > > > > > > > > > The delayed message case can also
> > > violate
> > > > > EOS
> > > > > >> if
> > > > > >> > > the
> > > > > >> > > > > > > delayed
> > > > > >> > > > > > > > > > > message
> > > > > >> > > > > > > > > > > > > > comes in after the next
> > addPartitionsToTxn
> > > > > >> request
> > > > > >> > > > comes
> > > > > >> > > > > > in.
> > > > > >> > > > > > > > > > > > Effectively
> > > > > >> > > > > > > > > > > > > we
> > > > > >> > > > > > > > > > > > > > may see a message from a previous
> > > (aborted)
> > > > > >> > > transaction
> > > > > >> > > > > > > become
> > > > > >> > > > > > > > > part
> > > > > >> > > > > > > > > > > of
> > > > > >> > > > > > > > > > > > > the
> > > > > >> > > > > > > > > > > > > > next transaction.
> > > > > >> > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > If the marker is written by the new
> > > client,
> > > > we
> > > > > >> can
> > > > > >> > > as I
> > > > > >> > > > > > > > mentioned
> > > > > >> > > > > > > > > > in
> > > > > >> > > > > > > > > > > > the
> > > > > >> > > > > > > > > > > > > > last email guarantee that any EndTxn
> > > > requests
> > > > > >> with
> > > > > >> > > the
> > > > > >> > > > > same
> > > > > >> > > > > > > > epoch
> > > > > >> > > > > > > > > > are
> > > > > >> > > > > > > > > > > > > from
> > > > > >> > > > > > > > > > > > > > the same producer and the same
> > > transaction.
> > > > > >> Then we
> > > > > >> > > > don't
> > > > > >> > > > > > > have
> > > > > >> > > > > > > > to
> > > > > >> > > > > > > > > > > > return
> > > > > >> > > > > > > > > > > > > a
> > > > > >> > > > > > > > > > > > > > fenced error but can handle gracefully
> > as
> > > > > >> described
> > > > > >> > > in
> > > > > >> > > > > the
> > > > > >> > > > > > > KIP.
> > > > > >> > > > > > > > > > > > > > I don't think a boolean is useful
> since
> > it
> > > > is
> > > > > >> > > directly
> > > > > >> > > > > > > encoded
> > > > > >> > > > > > > > by
> > > > > >> > > > > > > > > > the
> > > > > >> > > > > > > > > > > > > > existence or lack of the tagged field
> > > being
> > > > > >> > written.
> > > > > >> > > > > > > > > > > > > > In the prepare marker we will have the
> > > same
> > > > > >> > producer
> > > > > >> > > ID
> > > > > >> > > > > in
> > > > > >> > > > > > > the
> > > > > >> > > > > > > > > > > > non-tagged
> > > > > >> > > > > > > > > > > > > > field. In the Complete state we may
> not.
> > > > > >> > > > > > > > > > > > > > I'm not sure why the ongoing state
> > matters
> > > > for
> > > > > >> this
> > > > > >> > > > KIP.
> > > > > >> > > > > It
> > > > > >> > > > > > > > does
> > > > > >> > > > > > > > > > > matter
> > > > > >> > > > > > > > > > > > > for
> > > > > >> > > > > > > > > > > > > > KIP-939.
> > > > > >> > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > I'm not sure what you are referring to
> > > about
> > > > > >> > writing
> > > > > >> > > > the
> > > > > >> > > > > > > > previous
> > > > > >> > > > > > > > > > > > > producer
> > > > > >> > > > > > > > > > > > > > ID in the prepare marker. This is not
> in
> > > the
> > > > > >> KIP.
> > > > > >> > > > > > > > > > > > > > In the overflow case, we write the
> > > > > >> nextProducerId
> > > > > >> > in
> > > > > >> > > > the
> > > > > >> > > > > > > > prepare
> > > > > >> > > > > > > > > > > state.
> > > > > >> > > > > > > > > > > > > > This is so we know what we assigned
> when
> > > we
> > > > > >> reload
> > > > > >> > > the
> > > > > >> > > > > > > > > transaction
> > > > > >> > > > > > > > > > > log.
> > > > > >> > > > > > > > > > > > > > Once we complete, we transition this
> ID
> > to
> > > > the
> > > > > >> main
> > > > > >> > > > > > > (non-tagged
> > > > > >> > > > > > > > > > > field)
> > > > > >> > > > > > > > > > > > > and
> > > > > >> > > > > > > > > > > > > > have the previous producer ID field
> > filled
> > > > in.
> > > > > >> This
> > > > > >> > > is
> > > > > >> > > > so
> > > > > >> > > > > > we
> > > > > >> > > > > > > > can
> > > > > >> > > > > > > > > > > > identify
> > > > > >> > > > > > > > > in a retry case that the operation
> completed
> > > > > >> > successfully
> > > > > >> > > > and
> > > > > >> > > > > we
> > > > > >> > > > > > > > don't
> > > > > >> > > > > > > > > > > fence
> > > > > >> > > > > > > > > > > > > our
> > > > > >> > > > > > > > > > > > > > producer. The downgrade compatibility
> I
> > > > > mention
> > > > > >> is
> > > > > >> > > that
> > > > > >> > > > > we
> > > > > >> > > > > > > keep
> > > > > >> > > > > > > > > the
> > > > > >> > > > > > > > > > > > same
> > > > > >> > > > > > > > > > > > > > producer ID and epoch in the main
> > > > (non-tagged)
> > > > > >> > fields
> > > > > >> > > > as
> > > > > >> > > > > we
> > > > > >> > > > > > > did
> > > > > >> > > > > > > > > > > before
> > > > > >> > > > > > > > > > > > > the
> > > > > >> > > > > > > > > > > > > > code on the server side. If the server
> > > > > >> downgrades,
> > > > > >> > we
> > > > > >> > > > are
> > > > > >> > > > > > > still
> > > > > >> > > > > > > > > > > > > compatible.
> > > > > >> > > > > > > > > > > > > > This addresses both the prepare and
> > > complete
> > > > > >> state
> > > > > >> > > > > > > downgrades.
> > > > > >> > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > Justine
> > > > > >> > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > On Fri, Jan 12, 2024 at 10:21 AM Jun
> Rao
> > > > > >> > > > > > > > > <j...@confluent.io.invalid
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > wrote:
> > > > > >> > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > Hi, Justine,
> > > > > >> > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > Thanks for the reply. Sorry for the
> > > > delay. I
> > > > > >> > have a
> > > > > >> > > > few
> > > > > >> > > > > > > more
> > > > > >> > > > > > > > > > > > comments.
> > > > > >> > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > 110. I think the motivation section
> > > could
> > > > be
> > > > > >> > > > improved.
> > > > > >> > > > > > One
> > > > > >> > > > > > > of
> > > > > >> > > > > > > > > the
> > > > > >> > > > > > > > > > > > > > > motivations listed by the KIP is
> "This
> > > can
> > > > > >> happen
> > > > > >> > > > when
> > > > > >> > > > > a
> > > > > >> > > > > > > > > message
> > > > > >> > > > > > > > > > > gets
> > > > > >> > > > > > > > > > > > > > stuck
> > > > > >> > > > > > > > > > > > > > > or delayed due to networking issues
> > or a
> > > > > >> network
> > > > > >> > > > > > partition,
> > > > > >> > > > > > > > the
> > > > > >> > > > > > > > > > > > > > transaction
> > > > > >> > > > > > > > > > > > > > > aborts, and then the delayed message
> > > > finally
> > > > > >> > comes
> > > > > >> > > > > in.".
> > > > > >> > > > > > > This
> > > > > >> > > > > > > > > > seems
> > > > > >> > > > > > > > > > > > not
> > > > > >> > > > > > > > > > > > > > > very accurate. Without KIP-890,
> > > currently,
> > > > > if
> > > > > >> the
> > > > > >> > > > > > > coordinator
> > > > > >> > > > > > > > > > times
> > > > > >> > > > > > > > > > > > out
> > > > > >> > > > > > > > > > > > > > and
> > > > > >> > > > > > > > > > > > > > > aborts an ongoing transaction, it
> > > already
> > > > > >> bumps
> > > > > >> > up
> > > > > >> > > > the
> > > > > >> > > > > > > epoch
> > > > > >> > > > > > > > in
> > > > > >> > > > > > > > > > the
> > > > > >> > > > > > > > > > > > > > marker,
> > > > > >> > > > > > > > > > > > > > > which prevents the delayed produce
> > > message
> > > > > >> from
> > > > > >> > > being
> > > > > >> > > > > > added
> > > > > >> > > > > > > > to
> > > > > >> > > > > > > > > > the
> > > > > >> > > > > > > > > > > > user
> > > > > >> > > > > > > > > > > > > > > partition. What can cause a hanging
> > > > > >> transaction
> > > > > >> > is
> > > > > >> > > > that
> > > > > >> > > > > > the
> > > > > >> > > > > > > > > > > producer
> > > > > >> > > > > > > > > > > > > > > completes (either aborts or
> commits) a
> > > > > >> > transaction
> > > > > >> > > > > before
> > > > > >> > > > > > > > > > > receiving a
> > > > > >> > > > > > > > > > > > > > > successful ack on messages published
> > in
> > > > the
> > > > > >> same
> > > > > >> > > txn.
> > > > > >> > > > > In
> > > > > >> > > > > > > this
> > > > > >> > > > > > > > > > case,
> > > > > >> > > > > > > > > > > > > it's
> > > > > >> > > > > > > > > > > > > > > possible for the delayed message to
> be
> > > > > >> appended
> > > > > >> > to
> > > > > >> > > > the
> > > > > >> > > > > > > > > partition
> > > > > >> > > > > > > > > > > > after
> > > > > >> > > > > > > > > > > > > > the
> > > > > >> > > > > > > > > > > > > > > marker, causing a transaction to
> hang.
> > > > > >> > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > A similar issue (not mentioned in
> the
> > > > > >> motivation)
> > > > > >> > > > could
> > > > > >> > > > > > > > happen
> > > > > >> > > > > > > > > on
> > > > > >> > > > > > > > > > > the
> > > > > >> > > > > > > > > > > > > > > marker in the coordinator's log. For
> > > > > example,
> > > > > >> > it's
> > > > > >> > > > > > possible
> > > > > >> > > > > > > > for
> > > > > >> > > > > > > > > > an
> > > > > >> > > > > > > > > > > > > > > EndTxnRequest to be delayed on the
> > > > > >> coordinator.
> > > > > >> > By
> > > > > >> > > > the
> > > > > >> > > > > > time
> > > > > >> > > > > > > > the
> > > > > >> > > > > > > > > > > > delayed
> > > > > >> > > > > > > > > > > > > > > EndTxnRequest is processed, it's
> > > possible
> > > > > that
> > > > > >> > the
> > > > > >> > > > > > previous
> > > > > >> > > > > > > > txn
> > > > > >> > > > > > > > > > has
> > > > > >> > > > > > > > > > > > > > already
> > > > > >> > > > > > > > > > > > > > > completed and a new txn has started.
> > > > > >> Currently,
> > > > > >> > > since
> > > > > >> > > > > the
> > > > > >> > > > > > > > epoch
> > > > > >> > > > > > > > > > is
> > > > > >> > > > > > > > > > > > not
> > > > > >> > > > > > > > > > > > > > > bumped on every txn, the delayed
> > > > > EndTxnRequest
> > > > > >> > will
> > > > > >> > > > add
> > > > > >> > > > > > an
> > > > > >> > > > > > > > > > > unexpected
> > > > > >> > > > > > > > > > > > > > > prepare marker (and eventually a
> > > complete
> > > > > >> marker)
> > > > > >> > > to
> > > > > >> > > > > the
> > > > > >> > > > > > > > > ongoing
> > > > > >> > > > > > > > > > > txn.
> > > > > >> > > > > > > > > > > > > > This
> > > > > >> > > > > > > > > > > > > > > won't cause the transaction to hang,
> > but
> > > > it
> > > > > >> will
> > > > > >> > > > break
> > > > > >> > > > > > the
> > > > > >> > > > > > > > EoS
> > > > > >> > > > > > > > > > > > > semantic.
> > > > > >> > > > > > > > > > > > > > > The proposal in this KIP will
> address
> > > this
> > > > > >> issue
> > > > > >> > > too.
> > > > > >> > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > 101. "However, I was writing it so
> > that
> > > we
> > > > > can
> > > > > >> > > > > > distinguish
> > > > > >> > > > > > > > > > between
> > > > > >> > > > > > > > > > > > > > > old clients where we don't have the ability to
> > > > > >> > > > > > > > > > > > > > > do this operation
> > > > > >> > > > > > > > > and
> > > > > >> > > > > > > > > > > new
> > > > > >> > > > > > > > > > > > > > > clients that can. (Old clients don't
> > > bump
> > > > > the
> > > > > >> > epoch
> > > > > >> > > > on
> > > > > >> > > > > > > > commit,
> > > > > >> > > > > > > > > so
> > > > > >> > > > > > > > > > > we
> > > > > >> > > > > > > > > > > > > > can't
> > > > > >> > > > > > > > > > > > > > > say for sure the write belongs to
> the
> > > > given
> > > > > >> > > > > > transaction)."
> > > > > >> > > > > > > > > > > > > > > 101.1 I am wondering why we need to
> > > > > >> distinguish
> > > > > >> > > > whether
> > > > > >> > > > > > the
> > > > > >> > > > > > > > > > marker
> > > > > >> > > > > > > > > > > is
> > > > > >> > > > > > > > > > > > > > > written by the old and the new
> client.
> > > > Could
> > > > > >> you
> > > > > >> > > > > describe
> > > > > >> > > > > > > > what
> > > > > >> > > > > > > > > we
> > > > > >> > > > > > > > > > > do
> > > > > >> > > > > > > > > > > > > > > differently if we know the marker is
> > > > written
> > > > > >> by
> > > > > >> > the
> > > > > >> > > > new
> > > > > >> > > > > > > > client?
> > > > > >> > > > > > > > > > > > > > > 101.2 If we do need a way to distinguish
> > > > > >> > > > > > > > > > > > > > > whether the marker is written by the old or the
> > > > > >> > > > > > > > > > > > > > > new client, would it be simpler to just
> > > > > >> > > > > > > > > > > > > > > introduce a boolean field instead of indirectly
> > > > > >> > > > > > > > > > > > > > > through the previous producer ID field?
> > > > > >> > > > > > > > > > > > > > > 101.3 It's not clear to me why we
> only
> > > add
> > > > > the
> > > > > >> > > > previous
> > > > > >> > > > > > > > produce
> > > > > >> > > > > > > > > > ID
> > > > > >> > > > > > > > > > > > > field
> > > > > >> > > > > > > > > > > > > > in
> > > > > >> > > > > > > > > > > > > > > the complete marker, but not in the
> > > > prepare
> > > > > >> > marker.
> > > > > >> > > > If
> > > > > >> > > > > we
> > > > > >> > > > > > > > want
> > > > > >> > > > > > > > > to
> > > > > >> > > > > > > > > > > > know
> > > > > >> > > > > > > > > > > > > > > whether a marker is written by the
> new
> > > > > client
> > > > > >> or
> > > > > >> > > not,
> > > > > >> > > > > it
> > > > > >> > > > > > > > seems
> > > > > >> > > > > > > > > > that
> > > > > >> > > > > > > > > > > > we
> > > > > >> > > > > > > > > > > > > > want
> > > > > >> > > > > > > > > > > > > > > to do this consistently for all
> > markers.
> > > > > >> > > > > > > > > > > > > > > 101.4 What about the
> > TransactionLogValue
> > > > > >> record
> > > > > >> > > > > > > representing
> > > > > >> > > > > > > > > the
> > > > > >> > > > > > > > > > > > > ongoing
> > > > > >> > > > > > > > > > > > > > > state? Should we also distinguish
> > > whether
> > > > > it's
> > > > > >> > > > written
> > > > > >> > > > > by
> > > > > >> > > > > > > the
> > > > > >> > > > > > > > > old
> > > > > >> > > > > > > > > > > or
> > > > > >> > > > > > > > > > > > > the
> > > > > >> > > > > > > > > > > > > > > new client?
> > > > > >> > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > 102. In the overflow case, it's
> still
> > > not
> > > > > >> clear
> > > > > >> > to
> > > > > >> > > me
> > > > > >> > > > > why
> > > > > >> > > > > > > we
> > > > > >> > > > > > > > > > write
> > > > > >> > > > > > > > > > > > the
> > > > > >> > > > > > > > > > > > > > > previous producer Id in the prepare marker
> > > > > >> > > > > > > > > > > > > > > while writing the next producer Id
> > > > > >> > > > > > > > > > > > > > > in the complete marker. You
> mentioned
> > > that
> > > > > >> it's
> > > > > >> > for
> > > > > >> > > > > > > > > downgrading.
> > > > > >> > > > > > > > > > > > > However,
> > > > > >> > > > > > > > > > > > > > > we could downgrade with either the
> > > prepare
> > > > > >> marker
> > > > > >> > > or
> > > > > >> > > > > the
> > > > > >> > > > > > > > > complete
> > > > > >> > > > > > > > > > > > > marker.
> > > > > >> > > > > > > > > > > > > > > In either case, the downgraded
> > > coordinator
> > > > > >> should
> > > > > >> > > see
> > > > > >> > > > > the
> > > > > >> > > > > > > > same
> > > > > >> > > > > > > > > > > > producer
> > > > > >> > > > > > > > > > > > > id
> > > > > >> > > > > > > > > > > > > > > (probably the previous producer Id),
> > > right?
> > > > > >> > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > Jun
> > > > > >> > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > On Wed, Dec 20, 2023 at 6:00 PM
> > Justine
> > > > > Olshan
> > > > > >> > > > > > > > > > > > > > > <jols...@confluent.io.invalid>
> > > > > >> > > > > > > > > > > > > > > wrote:
> > > > > >> > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > Hey Jun,
> > > > > >> > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > Thanks for taking a look at the
> KIP
> > > > again.
> > > > > >> > > > > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > > > > > 100. For the epoch overflow case,
> > only
> > > > the
> > > > > >> > marker
> > > > > >> > > > > will
> > > > > >> > > > > > > have
> > > > > >> > > > > > > > > max
> > > > > >> > > > > > > > > > > > > epoch.
> > > > > >> > > > > > > > > > > > > > > This
> > > > > >> > > > > > > > > > > > > > > > keeps the behavior of the rest of
> > the
> > > > > >> markers
> > > > > >> > > where
> > > > > >> > > > > the
> > > > > >> > > > > > > > last
> > > > > >> > > > > > > > > > > marker
> > > > > >> > > > > > > > > > > > > is
> > > > > >> > > > > > > > > > > > > > > the
> > > > > >> > > > > > > > > > > > > > > > epoch of the transaction records +
> > 1.
> >
> > 101. You are correct that we don't need to write the producer ID, since
> > it is the same. However, I was writing it so that we can distinguish
> > between old clients, where we don't have the ability to do this
> > operation, and new clients that can. (Old clients don't bump the epoch
> > on commit, so we can't say for sure that the write belongs to the given
> > transaction.) If we receive an EndTxn request from a new client, we
> > will fill this field. We can guarantee that any EndTxn requests with
> > the same epoch are from the same producer and the same transaction.
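> >
> > A rough sketch of that decision (the helper name and the capability
> > check are hypothetical, not the real request handling code):
> >
> >     // Only new clients bump the epoch on commit, so only their EndTxn
> >     // requests let us tie the write to a specific transaction; for old
> >     // clients the tagged field stays empty.
> >     Long prevProducerIdToWrite(boolean clientSupportsEpochBump, long producerId) {
> >         return clientSupportsEpochBump ? producerId : null;
> >     }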
> >
> > 102. In the prepare phase, we have the same producer ID and epoch we
> > always had. It is the producer ID and epoch that are on the marker. In
> > the commit phase, we stay the same unless it is the overflow case. In
> > that case, we set the producer ID to the new one we generated and the
> > epoch to 0 after complete. This is for downgrade compatibility. The
> > tagged fields are just safety guards for retries and failovers.
> >
> > In the prepare phase, for the epoch overflow case only, we store the
> > next producer ID. This is for the case where we reload the transaction
> > coordinator in the prepare state. Once the transaction is committed, we
> > can use the producer ID the client is already using.
> >
> > In the commit phase, we store the previous producer ID in case of
> > retries.
> >
> > I think it is easier to think of it as just how we were storing the
> > producer ID and epoch before, with some extra bookkeeping and edge case
> > handling in the tagged fields. We have to do it this way for
> > compatibility with downgrades.
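> >
> > To make that bookkeeping concrete, here is a minimal sketch of the two
> > transitions as described above; the class and field names are mine, not
> > the actual TransactionLogValue schema or coordinator code:
> >
> >     class TransactionLogValueSketch {
> >         static final short MAX_EPOCH = Short.MAX_VALUE;
> >
> >         long producerId;
> >         short producerEpoch;
> >         Long prevProducerId;  // tagged: set on complete, guards retries
> >         Long nextProducerId;  // tagged: set in prepare on overflow only
> >
> >         // PREPARE: keep the producer ID the client is using; the stored
> >         // epoch is the one that goes on the marker (records' epoch + 1).
> >         static TransactionLogValueSketch prepare(long producerId,
> >                                                  short recordsEpoch,
> >                                                  long newlyGeneratedId) {
> >             TransactionLogValueSketch v = new TransactionLogValueSketch();
> >             v.producerId = producerId;
> >             v.producerEpoch = (short) (recordsEpoch + 1);
> >             if (v.producerEpoch == MAX_EPOCH) {
> >                 // overflow: stash the next ID so a coordinator reloaded
> >                 // in prepare state can still finish the transition
> >                 v.nextProducerId = newlyGeneratedId;
> >             }
> >             return v;
> >         }
> >
> >         // COMPLETE: stay the same unless the epoch overflowed, in which
> >         // case the non-tagged fields become the new ID with epoch 0; the
> >         // previous ID is kept in a tagged field for retries/failovers.
> >         static TransactionLogValueSketch complete(TransactionLogValueSketch p) {
> >             TransactionLogValueSketch v = new TransactionLogValueSketch();
> >             boolean overflowed = p.nextProducerId != null;
> >             v.producerId = overflowed ? p.nextProducerId : p.producerId;
> >             v.producerEpoch = overflowed ? 0 : p.producerEpoch;
> >             v.prevProducerId = p.producerId;
> >             return v;
> >         }
> >     }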
> >
> > 103. The next producer ID is for the prepare state, and the previous
> > producer ID is for after complete. The reason we need two separate
> > (tagged) fields is backwards compatibility: we need to keep the same
> > semantics for the non-tagged field in case we downgrade.
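> >
> > One way to see that constraint (purely illustrative, reusing the sketch
> > above): tagged fields are optional, and a reader that doesn't know a
> > tagged field simply skips it, so a downgraded coordinator only ever
> > sees the non-tagged fields.
> >
> >     // Everything an old coordinator can recover must live in the
> >     // non-tagged fields; the prev/next producer IDs are extra
> >     // bookkeeping it never sees.
> >     long producerIdSeenAfterDowngrade(TransactionLogValueSketch v) {
> >         return v.producerId; // tagged fields are skipped by the old reader
> >     }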
> >
> > 104. We set the fields as we do in the transactional state (we need to
> > do this for compatibility -- if we downgrade, we will only have the
> > non-tagged fields). It will be the old producer ID and the max epoch.
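> >
> > For example (a sketch with placeholder values; this assumes Kafka's
> > WriteTxnMarkersRequest wire classes, which may differ from what the
> > final implementation uses):
> >
> >     import java.util.List;
> >     import org.apache.kafka.common.TopicPartition;
> >     import org.apache.kafka.common.requests.TransactionResult;
> >     import org.apache.kafka.common.requests.WriteTxnMarkersRequest.TxnMarkerEntry;
> >
> >     // In the overflow case the marker still carries the old producer ID
> >     // with the max epoch -- exactly what the non-tagged fields would
> >     // tell a downgraded coordinator.
> >     long oldProducerId = 4000L; // placeholder
> >     int coordinatorEpoch = 7;   // placeholder
> >     TxnMarkerEntry marker = new TxnMarkerEntry(
> >         oldProducerId,     // not the newly generated ID
> >         Short.MAX_VALUE,   // max epoch
> >         coordinatorEpoch,
> >         TransactionResult.COMMIT,
> >         List.of(new TopicPartition("topic", 0)));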
> >
> > Hope this helps. Let me know if you have further questions.
> >
> > Justine
> >
> > On Wed, Dec 20, 2023 at 3:33 PM Jun Rao <j...@confluent.io.invalid>
> > wrote:
> >
> > > Hi, Justine,
> > >
> > > It seems that you have made some changes to KIP-890 since the vote. In
> > > particular, we are changing the format of TransactionLogValue. A few
> > > comments related to that.
> > >
> > > 100. Just to be clear: the overflow case (i.e., when a new producerId
> > > is generated) is when the current epoch equals max - 1 and not max?
> > >
> > > 101. For the "not epoch overflow" case, we write the previous ID in
> > > the tagged field in the complete phase. Do we need to do that, since
> > > the producer ID doesn't change in this case?
> > >
> > > 102. It seems that the meaning of the ProducerId/ProducerEpoch fields
> > > in TransactionLogValue changes depending on the TransactionStatus.
> > > When the TransactionStatus is ongoing, they represent the current
> > > ProducerId and the current ProducerEpoch. When the TransactionStatus
> > > is PrepareCommit/PrepareAbort, they represent the current ProducerId
> > > and the next ProducerEpoch. When the TransactionStatus is
> > > Commit/Abort, they further depend on whether the epoch overflows or
> > > not. If there is no overflow, they represent the current ProducerId
> > > and the next ProducerEpoch (max). Otherwise, they represent the newly
> > > generated ProducerId and a ProducerEpoch of 0. Is that right? This
> > > seems not easy to understand. Could we provide some examples like what
> > > Artem has done in KIP-939? Have we considered a simpler design where
> > > ProducerId/ProducerEpoch always represent the same value (e.g. for the
> > > current transaction) independent of the TransactionStatus and epoch
> > > overflow?
> > >
> > > 103. It's not clear to me why we need 3 fields: ProducerId,
> > > PrevProducerId, NextProducerId. Could we just have ProducerId and
> > > NextProducerId?
> > >
> > > 104. For WriteTxnMarkerRequests, if the producer epoch overflows, what
> > > do we set the producerId and the producerEpoch to?
> > >
> > > Thanks,
> > >
> > > Jun