Re: [DISCUSS] KIP-939: Support Participation in 2PC

Artem Livshits Fri, 16 Feb 2024 16:15:33 -0800

Hi Rowland,

> I am not sure what you mean by guarantee,


A guarantee would be an elimination of complexity or a condition.  E.g. if
adding an explicit prepare RPC eliminated in-doubt transactions, or
eliminated a significant complexity in implementation.

> 1. Transactions that haven’t reached “prepared” state can be aborted via
timeout.

The argument is that it doesn't eliminate any conditions, it merely reduces
a subset of circumstances for the conditions to happen, but the conditions
still happen and must be handled.  The operator still needs to set up
monitoring for run-away transactions, there still needs to be an
"out-of-band" channel to resolve run-away transactions (i.e. the operation
would need a way that's not a part of the 2PC protocol to reconcile with
the application owner), there still needs to be tooling for resolving
run-away transactions.

On the downside, an explicit prepare RPC would have a performance hit on
the happy path in every single transaction.

-Artem

On Tue, Feb 6, 2024 at 7:35 PM Rowland Smith <rowl...@gmail.com> wrote:

> Hi Artem,
>
> I am not sure what you mean by guarantee, but I am referring to a better
> operational experience. You mentioned this as the first benefit of an
> explicit "prepare" RPC in the KIP.
>
>
> 1. Transactions that haven’t reached “prepared” state can be aborted via
> timeout.
>
> However, in explaining why an explicit "prepare" RPC was not included in
> the design, you make no further mention of this benefit. So what I am
> saying is this benefit is quite significant operationally. Many client
> application failures may occur before the transaction reaches the prepared
> state, and the ability to automatically abort those transactions and
> unblock affected partitions without administrative intervention or fast
> restart of the client would be a worthwhile benefit. An explicit "prepare"
> RPC will also be needed by the XA implementation, so I would like to see it
> implemented for that reason. Otherwise, I will need to add this work to my
> KIP.
>
> - Rowland
>
> On Mon, Feb 5, 2024 at 9:35 PM Artem Livshits
> <alivsh...@confluent.io.invalid> wrote:
>
> > Hi Rowland,
> >
> > Thank you for your reply.  I think I understand what you're saying and
> just
> > tried to provide a quick summary.  The
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-939%3A+Support+Participation+in+2PC#KIP939:SupportParticipationin2PC-Explicit%E2%80%9Cprepare%E2%80%9DRPC
> > actually goes into the details on what would be the benefits of adding an
> > explicit prepare RPC and why those won't really add any advantages such
> as
> > elimination the needs for monitoring, tooling or providing additional
> > guarantees.  Let me know if you think of a guarantee that prepare RPC
> would
> > provide.
> >
> > -Artem
> >
> > On Mon, Feb 5, 2024 at 6:22 PM Rowland Smith <rowl...@gmail.com> wrote:
> >
> > > Hi Artem,
> > >
> > > I don't think that you understand what I am saying. In any transaction,
> > > there is work done before the call to prepareTranscation() and work
> done
> > > afterwards. Any work performed before the call to prepareTransaction()
> > can
> > > be aborted after a relatively short timeout if the client fails. It is
> > only
> > > after the prepareTransaction() call that a transaction becomes in-doubt
> > and
> > > must be remembered for a much longer period of time to allow the client
> > to
> > > recover and make the decision to either commit or abort. A considerable
> > > amount of time might be spent before prepareTransaction() is called,
> and
> > if
> > > the client fails in this period, relatively quick transaction abort
> would
> > > unblock any partitions and make the system fully available. So a
> prepare
> > > RPC would reduce the window where a client failure results in
> potentially
> > > long-lived blocking transactions.
> > >
> > > Here is the proposed sequence from the KIP with 2 added steps (4 and
> 5):
> > >
> > >
> > >    1. Begin database transaction
> > >    2. Begin Kafka transaction
> > >    3. Produce data to Kafka
> > >    4. Make updates to the database
> > >    5. Repeat steps 3 and 4 as many times as necessary based on
> > application
> > >    needs.
> > >    6. Prepare Kafka transaction [currently implicit operation,
> expressed
> > as
> > >    flush]
> > >    7. Write produced data to the database
> > >    8. Write offsets of produced data to the database
> > >    9. Commit database transaction
> > >    10. Commit Kafka transaction
> > >
> > >
> > > If the client application crashes before step 6, it is safe to abort
> the
> > > Kafka transaction after a relatively short timeout.
> > >
> > > I fully agree with a layered approach. However, the XA layer is going
> to
> > > require certain capabilities from the layer below it, and one of those
> > > capabilities is to be able to identify and report prepared transactions
> > > during recovery.
> > >
> > > - Rowland
> > >
> > > On Mon, Feb 5, 2024 at 12:46 AM Artem Livshits
> > > <alivsh...@confluent.io.invalid> wrote:
> > >
> > > > Hi Rowland,
> > > >
> > > > Thank you for your feedback.  Using an explicit prepare RPC was
> > discussed
> > > > and is listed in the rejected alternatives:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-939%3A+Support+Participation+in+2PC#KIP939:SupportParticipationin2PC-Explicit%E2%80%9Cprepare%E2%80%9DRPC
> > > > .
> > > > Basically, even if we had an explicit prepare RPC, it doesn't avoid
> the
> > > > fact that a crashed client could cause a blocking transaction.  This
> > is,
> > > > btw, is not just a specific property of this concrete proposal, it's
> a
> > > > fundamental trade off of any form of 2PC -- any 2PC implementation
> must
> > > > allow for infinitely "in-doubt" transactions that may not be
> > unilaterally
> > > > automatically resolved within one participant.
> > > >
> > > > To mitigate the issue, using 2PC requires a special permission, so
> that
> > > the
> > > > Kafka admin could control that only applications that follow proper
> > > > standards in terms of availability (i.e. will automatically restart
> and
> > > > cleanup after a crash) would be allowed to utilize 2PC.  It is also
> > > assumed
> > > > that any practical deployment utilizing 2PC would have monitoring set
> > up,
> > > > so that an operator could be alerted to investigate and manually
> > resolve
> > > > in-doubt transactions (the metric and tooling support for doing so
> are
> > > also
> > > > described in the KIP).
> > > >
> > > > For XA support, I wonder if we could take a layered approach and
> store
> > XA
> > > > information in a separate store, say in a compacted topic.  This way,
> > the
> > > > core Kafka protocol could be decoupled from specific implementations
> > (and
> > > > extra performance requirements that a specific implementation may
> > impose)
> > > > and serve as a foundation for multiple implementations.
> > > >
> > > > -Artem
> > > >
> > > > On Sun, Feb 4, 2024 at 1:37 PM Rowland Smith <rowl...@gmail.com>
> > wrote:
> > > >
> > > > > Hi Artem,
> > > > >
> > > > > It has been a while, but I have gotten back to this. I understand
> > that
> > > > when
> > > > > 2PC is used, the transaction timeout will be effectively infinite.
> I
> > > > don't
> > > > > think that this behavior is desirable. A long running transaction
> can
> > > be
> > > > > extremely disruptive since it blocks consumers on any partitions
> > > written
> > > > to
> > > > > within the pending transaction. The primary reason for a long
> running
> > > > > transaction is a failure of the client, or the network connecting
> the
> > > > > client to the broker. If such a failure occurs before the client
> > calls
> > > > > the new prepareTransaction() method, it should be OK to abort the
> > > > > transaction after a relatively short timeout period. This approach
> > > would
> > > > > minimize the inconvenience and disruption of a long running
> > transaction
> > > > > blocking consumers, and provide higher availability for a system
> > using
> > > > > Kafka.
> > > > >
> > > > > In order to achieve this behavior, I think we would need a
> 'prepare'
> > > RPC
> > > > > call so that the server knows that a transaction has been prepared,
> > and
> > > > > does not timeout and abort such transactions. There will be some
> cost
> > > to
> > > > > this extra RPC call, but there will also be a benefit of better
> > system
> > > > > availability in case of failures.
> > > > >
> > > > > There is another reason why I would prefer this implementation. I
> am
> > > > > working on an XA KIP, and XA requires that Kafka brokers be able to
> > > > provide
> > > > > a list of prepared transactions during recovery.  The broker can
> only
> > > > know
> > > > > that a transaction has been prepared if an RPC call is made., so my
> > KIP
> > > > > will need this functionality. In the XA KIP, I would like to use as
> > > much
> > > > of
> > > > > the KIP-939 solution as possible, so it would be helpful if
> > > > > prepareTransactions() sent a 'prepare' RPC, and the broker recorded
> > the
> > > > > prepared transaction state.
> > > > >
> > > > > This could be made configurable behavior if we are concerned that
> the
> > > > cost
> > > > > of the extra RPC call is too much, and that some users would prefer
> > to
> > > > have
> > > > > speed in exchange for less system availability in some cases of
> > client
> > > or
> > > > > network failure.
> > > > >
> > > > > Let me know what you think.
> > > > >
> > > > > -Rowland
> > > > >
> > > > > On Fri, Jan 5, 2024 at 8:03 PM Artem Livshits
> > > > > <alivsh...@confluent.io.invalid> wrote:
> > > > >
> > > > > > Hi Rowland,
> > > > > >
> > > > > > Thank you for the feedback.  For the 2PC cases, the expectation
> is
> > > that
> > > > > the
> > > > > > timeout on the client would be set to "effectively infinite",
> that
> > > > would
> > > > > > exceed all practical 2PC delays.  Now I think that this
> flexibility
> > > is
> > > > > > confusing and can be misused, I have updated the KIP to just say
> > that
> > > > if
> > > > > > 2PC is used, the transaction never expires.
> > > > > >
> > > > > > -Artem
> > > > > >
> > > > > > On Thu, Jan 4, 2024 at 6:14 PM Rowland Smith <rowl...@gmail.com>
> > > > wrote:
> > > > > >
> > > > > > > It is probably me. I copied the original message subject into a
> > new
> > > > > > email.
> > > > > > > Perhaps that is not enough to link them.
> > > > > > >
> > > > > > > It was not my understanding from reading KIP-939 that we are
> > doing
> > > > away
> > > > > > > with any transactional timeout in the Kafka broker. As I
> > understand
> > > > it,
> > > > > > we
> > > > > > > are allowing the application to set the transaction timeout to
> a
> > > > value
> > > > > > that
> > > > > > > exceeds the *transaction.max.timeout.ms
> > > > > > > <http://transaction.max.timeout.ms>* setting
> > > > > > > on the broker, and having no timeout if the application does
> not
> > > set
> > > > > > > *transaction.timeout.ms
> > > > > > > <http://transaction.timeout.ms>* on the producer. The KIP says
> > > that
> > > > > the
> > > > > > > semantics of *transaction.timeout.ms <
> > > http://transaction.timeout.ms
> > > > >*
> > > > > > are
> > > > > > > not being changed, so I take that to mean that the broker will
> > > > continue
> > > > > > to
> > > > > > > enforce a timeout if provided, and abort transactions that
> exceed
> > > it.
> > > > > > From
> > > > > > > the KIP:
> > > > > > >
> > > > > > > Client Configuration Changes
> > > > > > >
> > > > > > > *transaction.two.phase.commit.enable* The default would be
> > ‘false’.
> > > > If
> > > > > > set
> > > > > > > to ‘true’, then the broker is informed that the client is
> > > > participating
> > > > > > in
> > > > > > > two phase commit protocol and can set transaction timeout to
> > values
> > > > > that
> > > > > > > exceed *transaction.max.timeout.ms <
> > > > http://transaction.max.timeout.ms
> > > > > >*
> > > > > > > setting
> > > > > > > on the broker (if the timeout is not set explicitly on the
> client
> > > and
> > > > > the
> > > > > > > two phase commit is set to ‘true’ then the transaction never
> > > > expires).
> > > > > > >
> > > > > > > *transaction.timeout.ms <http://transaction.timeout.ms>* The
> > > > semantics
> > > > > > is
> > > > > > > not changed, but it can be set to values that exceed
> > > > > > > *transaction.max.timeout.ms
> > > > > > > <http://transaction.max.timeout.ms>* if
> two.phase.commit.enable
> > is
> > > > set
> > > > > > to
> > > > > > > ‘true’.
> > > > > > >
> > > > > > >
> > > > > > > Thinking about this more I believe we would also have a
> possible
> > > race
> > > > > > > condition if the broker is unaware that a transaction has been
> > > > > prepared.
> > > > > > > The application might call prepare and get a positive response,
> > but
> > > > the
> > > > > > > broker might have already aborted the transaction for exceeding
> > the
> > > > > > > timeout. It is a general rule of 2PC that once a transaction
> has
> > > been
> > > > > > > prepared it must be possible for it to be committed or aborted.
> > It
> > > > > seems
> > > > > > in
> > > > > > > this case a prepared transaction might already be aborted by
> the
> > > > > broker,
> > > > > > so
> > > > > > > it would be impossible to commit.
> > > > > > >
> > > > > > > I hope this is making sense and I am not misunderstanding the
> > KIP.
> > > > > Please
> > > > > > > let me know if I am.
> > > > > > >
> > > > > > > - Rowland
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Jan 4, 2024 at 12:56 PM Justine Olshan
> > > > > > > <jols...@confluent.io.invalid>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hey Rowland,
> > > > > > > >
> > > > > > > > Not sure why this message showed up in a different thread
> from
> > > the
> > > > > > other
> > > > > > > > KIP-939 discussion (is it just me?)
> > > > > > > >
> > > > > > > > In KIP-939, we do away with having any transactional timeout
> on
> > > the
> > > > > > Kafka
> > > > > > > > side. The external coordinator is fully responsible for
> > > controlling
> > > > > > > whether
> > > > > > > > the transaction completes.
> > > > > > > >
> > > > > > > > While I think there is some use in having a prepare stage, I
> > just
> > > > > > wanted
> > > > > > > to
> > > > > > > > clarify what the current KIP is proposing.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Justine
> > > > > > > >
> > > > > > > > On Wed, Jan 3, 2024 at 7:49 PM Rowland Smith <
> > rowl...@gmail.com>
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Artem,
> > > > > > > > >
> > > > > > > > > I saw your response in the thread I started discussing
> Kafka
> > > > > > > distributed
> > > > > > > > > transaction support and the XA interface. I would like to
> > work
> > > > with
> > > > > > you
> > > > > > > > to
> > > > > > > > > add XA support to Kafka on top of the excellent
> foundational
> > > work
> > > > > > that
> > > > > > > > you
> > > > > > > > > have started with KIP-939. I agree that explicit XA support
> > > > should
> > > > > > not
> > > > > > > be
> > > > > > > > > included in the Kafka codebase as long as the right set of
> > > basic
> > > > > > > > operations
> > > > > > > > > are provided. I will begin pulling together a KIP to follow
> > > > > KIP-939.
> > > > > > > > >
> > > > > > > > > I did have one comment on KIP-939 itself. I see that you
> > > > considered
> > > > > > an
> > > > > > > > > explicit "prepare" RPC, but decided not to add it. If I
> > > > understand
> > > > > > your
> > > > > > > > > design correctly, that would mean that a 2PC transaction
> > would
> > > > > have a
> > > > > > > > > single timeout that would need to be long enough to ensure
> > that
> > > > > > > prepared
> > > > > > > > > transactions are not aborted when an external coordinator
> > > fails.
> > > > > > > However,
> > > > > > > > > this also means that an unprepared transaction would not be
> > > > aborted
> > > > > > > > without
> > > > > > > > > waiting for the same timeout. Since long running
> transactions
> > > > block
> > > > > > > > > transactional consumers, having a long timeout for all
> > > > transactions
> > > > > > > could
> > > > > > > > > be disruptive. An explicit "prepare " RPC would allow the
> > > server
> > > > to
> > > > > > > abort
> > > > > > > > > unprepared transactions after a relatively short timeout,
> and
> > > > > apply a
> > > > > > > > much
> > > > > > > > > longer timeout only to prepared transactions. The explicit
> > > > > "prepare"
> > > > > > > RPC
> > > > > > > > > would make Kafka server more resilient to client failure at
> > the
> > > > > cost
> > > > > > of
> > > > > > > > an
> > > > > > > > > extra synchronous RPC call. I think its worth reconsidering
> > > this.
> > > > > > > > >
> > > > > > > > > With an XA implementation this might become a more
> > significant
> > > > > issue
> > > > > > > > since
> > > > > > > > > the transaction coordinator has no memory of unprepared
> > > > > transactions
> > > > > > > > across
> > > > > > > > > restarts. Such transactions would need to be cleared by
> hand
> > > > > through
> > > > > > > the
> > > > > > > > > admin client even when the transaction coordinator restarts
> > > > > > > successfully.
> > > > > > > > >
> > > > > > > > > - Rowland
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > *Rowland E. Smith*
> > > > > P: (862) 260-4163
> > > > > M: (201) 396-3842
> > > > >
> > > >
> > >
> >
>
>
> --
> *Rowland E. Smith*
> P: (862) 260-4163
> M: (201) 396-3842
>

Re: [DISCUSS] KIP-939: Support Participation in 2PC

Reply via email to