Hi Artem,

I am not sure what you mean by a guarantee; I am referring to a better
operational experience. You mentioned this as the first benefit of an
explicit "prepare" RPC in the KIP:


1. Transactions that haven’t reached “prepared” state can be aborted via
timeout.

However, in explaining why an explicit "prepare" RPC was not included in
the design, you make no further mention of this benefit. My point is that
this benefit is operationally significant. Many client application
failures can occur before the transaction reaches the prepared state, and
the ability to automatically abort those transactions and unblock the
affected partitions, without administrative intervention or a fast restart
of the client, would be a worthwhile improvement. An explicit "prepare"
RPC will also be needed by the XA implementation, so I would like to see
it implemented for that reason as well. Otherwise, I will need to add this
work to my KIP.
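
To make the operational point concrete, here is a small illustrative sketch
(plain Python, not actual Kafka broker code; all names and the state machine
are hypothetical) of the timeout policy an explicit "prepare" RPC would
enable: transactions that never reach the prepared state are aborted after
the ordinary short timeout, while prepared (in-doubt) transactions are
retained for the external coordinator to resolve:

```python
# Hypothetical model of broker-side transaction expiration with an
# explicit "prepare" RPC. Not Kafka code; for illustration only.

SHORT_TIMEOUT_MS = 60_000  # ordinary transaction.timeout.ms


class Txn:
    def __init__(self, txn_id, start_ms):
        self.txn_id = txn_id
        self.start_ms = start_ms
        self.state = "ONGOING"  # ONGOING -> PREPARED -> COMMITTED/ABORTED


def on_prepare_rpc(txn):
    # The broker records the prepared state; from this point on the
    # transaction is in-doubt and must not be unilaterally timed out.
    txn.state = "PREPARED"


def expire_transactions(txns, now_ms):
    """Abort only transactions that never reached the prepared state."""
    aborted = []
    for txn in txns:
        if txn.state == "ONGOING" and now_ms - txn.start_ms > SHORT_TIMEOUT_MS:
            txn.state = "ABORTED"
            aborted.append(txn.txn_id)
    return aborted


# A client that crashed before calling prepare is cleaned up quickly,
# unblocking its partitions ...
t1 = Txn("app-1", start_ms=0)
# ... while a prepared transaction survives until the coordinator decides.
t2 = Txn("app-2", start_ms=0)
on_prepare_rpc(t2)

aborted = expire_transactions([t1, t2], now_ms=120_000)
print(aborted)   # ['app-1']
print(t2.state)  # 'PREPARED'
```

Without the prepare RPC, the broker cannot distinguish the two cases and
must apply the long (or infinite) timeout to both.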

- Rowland

On Mon, Feb 5, 2024 at 9:35 PM Artem Livshits
<alivsh...@confluent.io.invalid> wrote:

> Hi Rowland,
>
> Thank you for your reply.  I think I understand what you're saying and just
> tried to provide a quick summary.  The
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-939%3A+Support+Participation+in+2PC#KIP939:SupportParticipationin2PC-Explicit%E2%80%9Cprepare%E2%80%9DRPC
> actually goes into the details of what the benefits of adding an explicit
> prepare RPC would be, and why it wouldn't really add any advantages, such
> as eliminating the need for monitoring or tooling or providing additional
> guarantees.  Let me know if you think of a guarantee that a prepare RPC
> would provide.
>
> -Artem
>
> On Mon, Feb 5, 2024 at 6:22 PM Rowland Smith <rowl...@gmail.com> wrote:
>
> > Hi Artem,
> >
> > I don't think that you understand what I am saying. In any transaction,
> > there is work done before the call to prepareTransaction() and work done
> > afterwards. Any work performed before the call to prepareTransaction()
> > can be aborted after a relatively short timeout if the client fails. It
> > is only after the prepareTransaction() call that a transaction becomes
> > in-doubt and must be remembered for a much longer period of time to
> > allow the client to recover and make the decision to either commit or
> > abort. A considerable amount of time might be spent before
> > prepareTransaction() is called, and if the client fails in this period,
> > a relatively quick transaction abort would unblock any partitions and
> > make the system fully available. So a prepare RPC would reduce the
> > window where a client failure results in potentially long-lived
> > blocking transactions.
> >
> > Here is the proposed sequence from the KIP with 2 added steps (4 and 5):
> >
> >
> >    1. Begin database transaction
> >    2. Begin Kafka transaction
> >    3. Produce data to Kafka
> >    4. Make updates to the database
> >    5. Repeat steps 3 and 4 as many times as necessary based on
> application
> >    needs.
> >    6. Prepare Kafka transaction [currently implicit operation, expressed
> as
> >    flush]
> >    7. Write produced data to the database
> >    8. Write offsets of produced data to the database
> >    9. Commit database transaction
> >    10. Commit Kafka transaction
> >
> >
> > If the client application crashes before step 6, it is safe to abort the
> > Kafka transaction after a relatively short timeout.
> >
> > I fully agree with a layered approach. However, the XA layer is going to
> > require certain capabilities from the layer below it, and one of those
> > capabilities is to be able to identify and report prepared transactions
> > during recovery.
> >
> > - Rowland
> >
> > On Mon, Feb 5, 2024 at 12:46 AM Artem Livshits
> > <alivsh...@confluent.io.invalid> wrote:
> >
> > > Hi Rowland,
> > >
> > > Thank you for your feedback.  Using an explicit prepare RPC was
> > > discussed and is listed in the rejected alternatives:
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-939%3A+Support+Participation+in+2PC#KIP939:SupportParticipationin2PC-Explicit%E2%80%9Cprepare%E2%80%9DRPC
> > > Basically, even if we had an explicit prepare RPC, it doesn't avoid
> > > the fact that a crashed client could cause a blocking transaction.
> > > This, btw, is not just a specific property of this concrete proposal;
> > > it's a fundamental trade-off of any form of 2PC -- any 2PC
> > > implementation must allow for indefinitely "in-doubt" transactions
> > > that cannot be unilaterally and automatically resolved within one
> > > participant.
> > >
> > > To mitigate the issue, using 2PC requires a special permission, so
> > > that the Kafka admin can ensure that only applications that follow
> > > proper availability standards (i.e. that will automatically restart
> > > and clean up after a crash) are allowed to utilize 2PC.  It is also
> > > assumed that any practical deployment utilizing 2PC would have
> > > monitoring set up, so that an operator could be alerted to investigate
> > > and manually resolve in-doubt transactions (the metric and tooling
> > > support for doing so are also described in the KIP).
> > >
> > > For XA support, I wonder if we could take a layered approach and store
> XA
> > > information in a separate store, say in a compacted topic.  This way,
> the
> > > core Kafka protocol could be decoupled from specific implementations
> (and
> > > extra performance requirements that a specific implementation may
> impose)
> > > and serve as a foundation for multiple implementations.
> > >
> > > -Artem
> > >
> > > On Sun, Feb 4, 2024 at 1:37 PM Rowland Smith <rowl...@gmail.com>
> wrote:
> > >
> > > > Hi Artem,
> > > >
> > > > It has been a while, but I have gotten back to this. I understand
> that
> > > when
> > > > 2PC is used, the transaction timeout will be effectively infinite. I
> > > don't
> > > > think that this behavior is desirable. A long running transaction can
> > be
> > > > extremely disruptive since it blocks consumers on any partitions
> > written
> > > to
> > > > within the pending transaction. The primary reason for a long running
> > > > transaction is a failure of the client, or the network connecting the
> > > > client to the broker. If such a failure occurs before the client
> calls
> > > > the new prepareTransaction() method, it should be OK to abort the
> > > > transaction after a relatively short timeout period. This approach
> > would
> > > > minimize the inconvenience and disruption of a long running
> transaction
> > > > blocking consumers, and provide higher availability for a system
> using
> > > > Kafka.
> > > >
> > > > In order to achieve this behavior, I think we would need a 'prepare'
> > > > RPC call so that the server knows that a transaction has been
> > > > prepared, and does not time out and abort such transactions. There
> > > > will be some cost to this extra RPC call, but there will also be a
> > > > benefit of better system availability in case of failures.
> > > >
> > > > There is another reason why I would prefer this implementation. I am
> > > > working on an XA KIP, and XA requires that Kafka brokers be able to
> > > > provide a list of prepared transactions during recovery.  The broker
> > > > can only know that a transaction has been prepared if an RPC call is
> > > > made, so my KIP
> > > > will need this functionality. In the XA KIP, I would like to use as
> > much
> > > of
> > > > the KIP-939 solution as possible, so it would be helpful if
> > > > prepareTransactions() sent a 'prepare' RPC, and the broker recorded
> the
> > > > prepared transaction state.
> > > >
> > > > This could be made configurable behavior if we are concerned that the
> > > cost
> > > > of the extra RPC call is too much, and that some users would prefer
> to
> > > have
> > > > speed in exchange for less system availability in some cases of
> client
> > or
> > > > network failure.
> > > >
> > > > Let me know what you think.
> > > >
> > > > -Rowland
> > > >
> > > > On Fri, Jan 5, 2024 at 8:03 PM Artem Livshits
> > > > <alivsh...@confluent.io.invalid> wrote:
> > > >
> > > > > Hi Rowland,
> > > > >
> > > > > Thank you for the feedback.  For the 2PC cases, the expectation is
> > that
> > > > the
> > > > > timeout on the client would be set to "effectively infinite", that
> > > would
> > > > > exceed all practical 2PC delays.  Now I think that this
> > > > > flexibility is confusing and can be misused, so I have updated the
> > > > > KIP to just say that if 2PC is used, the transaction never expires.
> > > > >
> > > > > -Artem
> > > > >
> > > > > On Thu, Jan 4, 2024 at 6:14 PM Rowland Smith <rowl...@gmail.com>
> > > wrote:
> > > > >
> > > > > > It is probably me. I copied the original message subject into a
> new
> > > > > email.
> > > > > > Perhaps that is not enough to link them.
> > > > > >
> > > > > > It was not my understanding from reading KIP-939 that we are
> > > > > > doing away with any transactional timeout in the Kafka broker.
> > > > > > As I understand it, we are allowing the application to set the
> > > > > > transaction timeout to a value that
> > > > > > exceeds the *transaction.max.timeout.ms* setting
> > > > > > on the broker, and having no timeout if the application does not
> > > > > > set *transaction.timeout.ms* on the producer. The KIP says
> > that
> > > > the
> > > > > > semantics of *transaction.timeout.ms* are not being changed, so
> > > > > > I take that to mean that the broker will continue to enforce a
> > > > > > timeout if provided, and abort transactions that exceed it.  From
> > > > > > the KIP:
> > > > > >
> > > > > > Client Configuration Changes
> > > > > >
> > > > > > *transaction.two.phase.commit.enable* The default would be
> > > > > > ‘false’.  If set to ‘true’, then the broker is informed that the
> > > > > > client is participating in two phase commit protocol and can set
> > > > > > transaction timeout to values that exceed
> > > > > > *transaction.max.timeout.ms* setting on the broker (if the
> > > > > > timeout is not set explicitly on the client and the two phase
> > > > > > commit is set to ‘true’ then the transaction never expires).
> > > > > >
> > > > > > *transaction.timeout.ms <http://transaction.timeout.ms>* The
> > > semantics
> > > > > is
> > > > > > not changed, but it can be set to values that exceed
> > > > > > *transaction.max.timeout.ms
> > > > > > <http://transaction.max.timeout.ms>* if two.phase.commit.enable
> is
> > > set
> > > > > to
> > > > > > ‘true’.
> > > > > >
> > > > > >
> > > > > > Thinking about this more I believe we would also have a possible
> > race
> > > > > > condition if the broker is unaware that a transaction has been
> > > > prepared.
> > > > > > The application might call prepare and get a positive response,
> but
> > > the
> > > > > > broker might have already aborted the transaction for exceeding
> the
> > > > > > timeout. It is a general rule of 2PC that once a transaction has
> > been
> > > > > > prepared it must be possible for it to be committed or aborted.
> It
> > > > seems
> > > > > in
> > > > > > this case a prepared transaction might already be aborted by the
> > > > broker,
> > > > > so
> > > > > > it would be impossible to commit.
> > > > > >
> > > > > > I hope this is making sense and I am not misunderstanding the
> KIP.
> > > > Please
> > > > > > let me know if I am.
> > > > > >
> > > > > > - Rowland
> > > > > >
> > > > > >
> > > > > > On Thu, Jan 4, 2024 at 12:56 PM Justine Olshan
> > > > > > <jols...@confluent.io.invalid>
> > > > > > wrote:
> > > > > >
> > > > > > > Hey Rowland,
> > > > > > >
> > > > > > > Not sure why this message showed up in a different thread from
> > the
> > > > > other
> > > > > > > KIP-939 discussion (is it just me?)
> > > > > > >
> > > > > > > In KIP-939, we do away with having any transactional timeout on
> > the
> > > > > Kafka
> > > > > > > side. The external coordinator is fully responsible for
> > controlling
> > > > > > whether
> > > > > > > the transaction completes.
> > > > > > >
> > > > > > > While I think there is some use in having a prepare stage, I
> just
> > > > > wanted
> > > > > > to
> > > > > > > clarify what the current KIP is proposing.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Justine
> > > > > > >
> > > > > > > On Wed, Jan 3, 2024 at 7:49 PM Rowland Smith <
> rowl...@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > > > Hi Artem,
> > > > > > > >
> > > > > > > > I saw your response in the thread I started discussing Kafka
> > > > > > distributed
> > > > > > > > transaction support and the XA interface. I would like to
> work
> > > with
> > > > > you
> > > > > > > to
> > > > > > > > add XA support to Kafka on top of the excellent foundational
> > work
> > > > > that
> > > > > > > you
> > > > > > > > have started with KIP-939. I agree that explicit XA support
> > > should
> > > > > not
> > > > > > be
> > > > > > > > included in the Kafka codebase as long as the right set of
> > basic
> > > > > > > operations
> > > > > > > > are provided. I will begin pulling together a KIP to follow
> > > > KIP-939.
> > > > > > > >
> > > > > > > > I did have one comment on KIP-939 itself. I see that you
> > > considered
> > > > > an
> > > > > > > > explicit "prepare" RPC, but decided not to add it. If I
> > > understand
> > > > > your
> > > > > > > > design correctly, that would mean that a 2PC transaction
> would
> > > > have a
> > > > > > > > single timeout that would need to be long enough to ensure
> that
> > > > > > prepared
> > > > > > > > transactions are not aborted when an external coordinator
> > fails.
> > > > > > However,
> > > > > > > > this also means that an unprepared transaction would not be
> > > aborted
> > > > > > > without
> > > > > > > > waiting for the same timeout. Since long running transactions
> > > block
> > > > > > > > transactional consumers, having a long timeout for all
> > > transactions
> > > > > > could
> > > > > > > > be disruptive. An explicit "prepare" RPC would allow the
> > > > > > > > server to abort unprepared transactions after a relatively
> > > > > > > > short timeout, and apply a much longer timeout only to
> > > > > > > > prepared transactions. The explicit "prepare" RPC would make
> > > > > > > > the Kafka server more resilient to client failure at the
> > > > > > > > cost of an extra synchronous RPC call. I think it's worth
> > > > > > > > reconsidering this.
> > > > > > > >
> > > > > > > > With an XA implementation this might become a more
> significant
> > > > issue
> > > > > > > since
> > > > > > > > the transaction coordinator has no memory of unprepared
> > > > transactions
> > > > > > > across
> > > > > > > > restarts. Such transactions would need to be cleared by hand
> > > > through
> > > > > > the
> > > > > > > > admin client even when the transaction coordinator restarts
> > > > > > successfully.
> > > > > > > >
> > > > > > > > - Rowland
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > *Rowland E. Smith*
> > > > P: (862) 260-4163
> > > > M: (201) 396-3842
> > > >
> > >
> >
>


-- 
*Rowland E. Smith*
P: (862) 260-4163
M: (201) 396-3842
