Hi Rowland,

> I am not sure what you mean by guarantee,

A guarantee would be the elimination of a complexity or a condition, e.g. if adding an explicit prepare RPC eliminated in-doubt transactions, or eliminated a significant complexity in the implementation.

> 1. Transactions that haven’t reached “prepared” state can be aborted via timeout.

The argument is that it doesn't eliminate any conditions; it merely narrows the set of circumstances in which the conditions happen, but the conditions still happen and must be handled. The operator still needs to set up monitoring for run-away transactions, there still needs to be an "out-of-band" channel to resolve run-away transactions (i.e. the operator would need a way that's not a part of the 2PC protocol to reconcile with the application owner), and there still needs to be tooling for resolving run-away transactions. On the downside, an explicit prepare RPC would impose a performance hit on the happy path of every single transaction.

-Artem

On Tue, Feb 6, 2024 at 7:35 PM Rowland Smith <rowl...@gmail.com> wrote:

> Hi Artem,
>
> I am not sure what you mean by guarantee, but I am referring to a better operational experience. You mentioned this as the first benefit of an explicit "prepare" RPC in the KIP.
>
> > 1. Transactions that haven’t reached “prepared” state can be aborted via timeout.
>
> However, in explaining why an explicit "prepare" RPC was not included in the design, you make no further mention of this benefit. So what I am saying is that this benefit is quite significant operationally. Many client application failures may occur before the transaction reaches the prepared state, and the ability to automatically abort those transactions and unblock affected partitions without administrative intervention or fast restart of the client would be a worthwhile benefit. An explicit "prepare" RPC will also be needed by the XA implementation, so I would like to see it implemented for that reason. Otherwise, I will need to add this work to my KIP.
>
> - Rowland
>
> On Mon, Feb 5, 2024 at 9:35 PM Artem Livshits <alivsh...@confluent.io.invalid> wrote:
>
> > Hi Rowland,
> >
> > Thank you for your reply. I think I understand what you're saying and just tried to provide a quick summary. The
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-939%3A+Support+Participation+in+2PC#KIP939:SupportParticipationin2PC-Explicit%E2%80%9Cprepare%E2%80%9DRPC
> >
> > actually goes into the details of what the benefits of adding an explicit prepare RPC would be and why those won't really add any advantages, such as eliminating the need for monitoring or tooling, or providing additional guarantees. Let me know if you think of a guarantee that a prepare RPC would provide.
> >
> > -Artem
> >
> > On Mon, Feb 5, 2024 at 6:22 PM Rowland Smith <rowl...@gmail.com> wrote:
> >
> > > Hi Artem,
> > >
> > > I don't think that you understand what I am saying. In any transaction, there is work done before the call to prepareTransaction() and work done afterwards. Any work performed before the call to prepareTransaction() can be aborted after a relatively short timeout if the client fails. It is only after the prepareTransaction() call that a transaction becomes in-doubt and must be remembered for a much longer period of time to allow the client to recover and make the decision to either commit or abort. A considerable amount of time might be spent before prepareTransaction() is called, and if the client fails in this period, a relatively quick transaction abort would unblock any partitions and make the system fully available. So a prepare RPC would reduce the window where a client failure results in potentially long-lived blocking transactions.
> > >
> > > Here is the proposed sequence from the KIP with 2 added steps (4 and 5):
> > >
> > > 1. Begin database transaction
> > > 2. Begin Kafka transaction
> > > 3. Produce data to Kafka
> > > 4. Make updates to the database
> > > 5. Repeat steps 3 and 4 as many times as necessary based on application needs.
> > > 6. Prepare Kafka transaction [currently implicit operation, expressed as flush]
> > > 7. Write produced data to the database
> > > 8. Write offsets of produced data to the database
> > > 9. Commit database transaction
> > > 10. Commit Kafka transaction
> > >
> > > If the client application crashes before step 6, it is safe to abort the Kafka transaction after a relatively short timeout.
> > >
> > > I fully agree with a layered approach. However, the XA layer is going to require certain capabilities from the layer below it, and one of those capabilities is to be able to identify and report prepared transactions during recovery.
> > >
> > > - Rowland
> > >
> > > On Mon, Feb 5, 2024 at 12:46 AM Artem Livshits <alivsh...@confluent.io.invalid> wrote:
> > >
> > > > Hi Rowland,
> > > >
> > > > Thank you for your feedback. Using an explicit prepare RPC was discussed and is listed in the rejected alternatives:
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-939%3A+Support+Participation+in+2PC#KIP939:SupportParticipationin2PC-Explicit%E2%80%9Cprepare%E2%80%9DRPC
> > > >
> > > > Basically, even if we had an explicit prepare RPC, it wouldn't avoid the fact that a crashed client could cause a blocking transaction. This is, btw, not just a specific property of this concrete proposal; it's a fundamental trade-off of any form of 2PC: any 2PC implementation must allow for infinitely "in-doubt" transactions that may not be unilaterally and automatically resolved within one participant.
> > > >
> > > > To mitigate the issue, using 2PC requires a special permission, so that the Kafka admin can ensure that only applications that follow proper standards in terms of availability (i.e. will automatically restart and clean up after a crash) are allowed to utilize 2PC. It is also assumed that any practical deployment utilizing 2PC would have monitoring set up, so that an operator could be alerted to investigate and manually resolve in-doubt transactions (the metric and tooling support for doing so are also described in the KIP).
> > > >
> > > > For XA support, I wonder if we could take a layered approach and store XA information in a separate store, say in a compacted topic. This way, the core Kafka protocol could be decoupled from specific implementations (and extra performance requirements that a specific implementation may impose) and serve as a foundation for multiple implementations.
> > > >
> > > > -Artem
> > > >
> > > > On Sun, Feb 4, 2024 at 1:37 PM Rowland Smith <rowl...@gmail.com> wrote:
> > > >
> > > > > Hi Artem,
> > > > >
> > > > > It has been a while, but I have gotten back to this. I understand that when 2PC is used, the transaction timeout will be effectively infinite. I don't think that this behavior is desirable. A long running transaction can be extremely disruptive since it blocks consumers on any partitions written to within the pending transaction. The primary reason for a long running transaction is a failure of the client, or of the network connecting the client to the broker. If such a failure occurs before the client calls the new prepareTransaction() method, it should be OK to abort the transaction after a relatively short timeout period. This approach would minimize the inconvenience and disruption of a long running transaction blocking consumers, and provide higher availability for a system using Kafka.
> > > > >
> > > > > In order to achieve this behavior, I think we would need a 'prepare' RPC call so that the server knows that a transaction has been prepared, and does not time out and abort such transactions. There will be some cost to this extra RPC call, but there will also be a benefit of better system availability in case of failures.
> > > > >
> > > > > There is another reason why I would prefer this implementation. I am working on an XA KIP, and XA requires that Kafka brokers be able to provide a list of prepared transactions during recovery. The broker can only know that a transaction has been prepared if an RPC call is made, so my KIP will need this functionality. In the XA KIP, I would like to use as much of the KIP-939 solution as possible, so it would be helpful if prepareTransaction() sent a 'prepare' RPC, and the broker recorded the prepared transaction state.
> > > > >
> > > > > This could be made configurable behavior if we are concerned that the cost of the extra RPC call is too much, and that some users would prefer to have speed in exchange for less system availability in some cases of client or network failure.
> > > > >
> > > > > Let me know what you think.
> > > > >
> > > > > -Rowland
> > > > >
> > > > > On Fri, Jan 5, 2024 at 8:03 PM Artem Livshits <alivsh...@confluent.io.invalid> wrote:
> > > > >
> > > > > > Hi Rowland,
> > > > > >
> > > > > > Thank you for the feedback. For the 2PC cases, the expectation is that the timeout on the client would be set to an "effectively infinite" value that would exceed all practical 2PC delays. I now think that this flexibility is confusing and can be misused, so I have updated the KIP to just say that if 2PC is used, the transaction never expires.
> > > > > >
> > > > > > -Artem
> > > > > >
> > > > > > On Thu, Jan 4, 2024 at 6:14 PM Rowland Smith <rowl...@gmail.com> wrote:
> > > > > >
> > > > > > > It is probably me. I copied the original message subject into a new email. Perhaps that is not enough to link them.
> > > > > > >
> > > > > > > It was not my understanding from reading KIP-939 that we are doing away with any transactional timeout in the Kafka broker. As I understand it, we are allowing the application to set the transaction timeout to a value that exceeds the *transaction.max.timeout.ms* setting on the broker, and having no timeout if the application does not set *transaction.timeout.ms* on the producer. The KIP says that the semantics of *transaction.timeout.ms* are not being changed, so I take that to mean that the broker will continue to enforce a timeout if provided, and abort transactions that exceed it. From the KIP:
> > > > > > >
> > > > > > > Client Configuration Changes
> > > > > > >
> > > > > > > *transaction.two.phase.commit.enable* The default would be ‘false’. If set to ‘true’, then the broker is informed that the client is participating in two phase commit protocol and can set transaction timeout to values that exceed *transaction.max.timeout.ms* setting on the broker (if the timeout is not set explicitly on the client and the two phase commit is set to ‘true’ then the transaction never expires).
> > > > > > >
> > > > > > > *transaction.timeout.ms* The semantics is not changed, but it can be set to values that exceed *transaction.max.timeout.ms* if two.phase.commit.enable is set to ‘true’.
> > > > > > >
> > > > > > > Thinking about this more, I believe we would also have a possible race condition if the broker is unaware that a transaction has been prepared. The application might call prepare and get a positive response, but the broker might have already aborted the transaction for exceeding the timeout. It is a general rule of 2PC that once a transaction has been prepared it must be possible for it to be committed or aborted. It seems that in this case a prepared transaction might already have been aborted by the broker, so it would be impossible to commit.
> > > > > > >
> > > > > > > I hope this is making sense and I am not misunderstanding the KIP. Please let me know if I am.
> > > > > > >
> > > > > > > - Rowland
> > > > > > >
> > > > > > > On Thu, Jan 4, 2024 at 12:56 PM Justine Olshan <jols...@confluent.io.invalid> wrote:
> > > > > > >
> > > > > > > > Hey Rowland,
> > > > > > > >
> > > > > > > > Not sure why this message showed up in a different thread from the other KIP-939 discussion (is it just me?)
> > > > > > > >
> > > > > > > > In KIP-939, we do away with having any transactional timeout on the Kafka side. The external coordinator is fully responsible for controlling whether the transaction completes.
> > > > > > > >
> > > > > > > > While I think there is some use in having a prepare stage, I just wanted to clarify what the current KIP is proposing.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Justine
> > > > > > > >
> > > > > > > > On Wed, Jan 3, 2024 at 7:49 PM Rowland Smith <rowl...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi Artem,
> > > > > > > > >
> > > > > > > > > I saw your response in the thread I started discussing Kafka distributed transaction support and the XA interface. I would like to work with you to add XA support to Kafka on top of the excellent foundational work that you have started with KIP-939. I agree that explicit XA support should not be included in the Kafka codebase as long as the right set of basic operations are provided. I will begin pulling together a KIP to follow KIP-939.
> > > > > > > > >
> > > > > > > > > I did have one comment on KIP-939 itself. I see that you considered an explicit "prepare" RPC, but decided not to add it. If I understand your design correctly, that would mean that a 2PC transaction would have a single timeout that would need to be long enough to ensure that prepared transactions are not aborted when an external coordinator fails. However, this also means that an unprepared transaction would not be aborted without waiting for the same timeout. Since long running transactions block transactional consumers, having a long timeout for all transactions could be disruptive. An explicit "prepare" RPC would allow the server to abort unprepared transactions after a relatively short timeout, and apply a much longer timeout only to prepared transactions. The explicit "prepare" RPC would make the Kafka server more resilient to client failure at the cost of an extra synchronous RPC call. I think it's worth reconsidering this.
> > > > > > > > >
> > > > > > > > > With an XA implementation this might become a more significant issue since the transaction coordinator has no memory of unprepared transactions across restarts. Such transactions would need to be cleared by hand through the admin client even when the transaction coordinator restarts successfully.
> > > > > > > > >
> > > > > > > > > - Rowland
> > > > >
> > > > > --
> > > > > *Rowland E. Smith*
> > > > > P: (862) 260-4163
> > > > > M: (201) 396-3842
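
The two-tier timeout policy Rowland proposes in this thread can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not Kafka's transaction coordinator and not part of KIP-939: `ToyCoordinator`, `TxnState`, `Txn`, and `unprepared_timeout_s` are hypothetical names invented for illustration. It assumes an explicit prepare call exists, that unprepared transactions are aborted after a short timeout, and that prepared transactions never expire.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum, auto


class TxnState(Enum):
    ONGOING = auto()   # work in progress, no prepare received yet
    PREPARED = auto()  # in-doubt: only the external coordinator may resolve it
    ABORTED = auto()


@dataclass
class Txn:
    txn_id: str
    started_at: float
    state: TxnState = TxnState.ONGOING


class ToyCoordinator:
    """Hypothetical model of the proposed policy (not Kafka code)."""

    def __init__(self, unprepared_timeout_s: float):
        self.unprepared_timeout_s = unprepared_timeout_s
        self.txns: dict[str, Txn] = {}

    def begin(self, txn_id: str, now: float) -> None:
        self.txns[txn_id] = Txn(txn_id, started_at=now)

    def expire(self, now: float) -> list[str]:
        """Abort ongoing transactions past the short timeout; skip prepared ones."""
        aborted = []
        for txn in self.txns.values():
            too_old = now - txn.started_at > self.unprepared_timeout_s
            if txn.state is TxnState.ONGOING and too_old:
                txn.state = TxnState.ABORTED
                aborted.append(txn.txn_id)
        return aborted

    def prepare(self, txn_id: str, now: float) -> bool:
        """Explicit prepare: succeeds only if the transaction was not already aborted."""
        self.expire(now)
        txn = self.txns[txn_id]
        if txn.state is not TxnState.ONGOING:
            return False
        txn.state = TxnState.PREPARED
        return True


coord = ToyCoordinator(unprepared_timeout_s=60)
coord.begin("crashed-before-prepare", now=0)
coord.begin("crashed-after-prepare", now=0)
coord.prepare("crashed-after-prepare", now=10)
print(coord.expire(now=3600))  # ['crashed-before-prepare']
```

Because `prepare` is answered by the same component that enforces the timeout, a client cannot receive a positive prepare response for a transaction that has already been aborted, which is the race Rowland raises. As Artem points out, the prepared transaction in this sketch still blocks indefinitely and must be resolved out of band, so the policy narrows the window rather than removing the need for monitoring and tooling.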