[ 
https://issues.apache.org/jira/browse/KAFKA-9592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17080072#comment-17080072
 ] 

Guozhang Wang commented on KAFKA-9592:
--------------------------------------

I'd like to add some context about this proposal as well. Currently, abortTxn 
can be triggered by the user under two slightly different scenarios: 

1. The processing encountered some errors, e.g. updating the local state 
throws an IO exception, or there is a bug in the code such as dividing by zero.
2. The transaction protocol itself determines that there are some errors.

Today, most users just call abortTxn in a finally block for both of them. I 
don't think it is ideal to rely only on documentation and education to get 
everyone to correct their code so that "different exceptions fall into 
different catch blocks, and we only call abort in one of them", etc. In 
addition, we may want to add more txn exceptions to the producer in the 
future, and each time we add a new error we'd need to update the docs again.

So the proposal is to let the producer client itself try to distinguish case 
1) from case 2) above. In case 1), the errors kept by the producer, if there 
are any, have not been thrown yet, and hence abortTxn should still throw them 
to notify the user that, in addition to the user's own reasons for aborting 
the txn, some errors have occurred in the txn messaging protocol as well. 
Whereas in case 2), the user is aborting the txn precisely because another 
producer API threw a txn error, and hence that error does not need to be 
thrown again by abortTxn.
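
To make the distinction concrete, here is a minimal sketch of the intended 
behaviour. It is a toy model only, not the real KafkaProducer: ToyProducer, 
TxnProtocolException, failInBackground and the keptErrorThrown flag are all 
hypothetical names invented for illustration, and the actual abort logic is 
omitted.

```java
/** Toy model of the proposed abortTxn semantics; not the real Kafka client. */
final class AbortSemanticsSketch {

    /** Hypothetical stand-in for an error raised by the txn protocol. */
    static final class TxnProtocolException extends RuntimeException {
        TxnProtocolException(String message) {
            super(message);
        }
    }

    /** Hypothetical producer that remembers whether its kept error was surfaced. */
    static final class ToyProducer {
        private RuntimeException keptError; // recorded by the simulated protocol
        private boolean keptErrorThrown;    // true once the caller has seen it

        /** Simulates the protocol failing without the caller noticing yet. */
        void failInBackground(String reason) {
            keptError = new TxnProtocolException(reason);
            keptErrorThrown = false;
        }

        /** An ordinary API call surfaces the kept error, if there is one. */
        void send(String record) {
            throwKeptErrorIfAny();
        }

        /** Rethrows the kept error only if the caller has not seen it yet. */
        void abortTxn() {
            if (keptError != null && !keptErrorThrown) {
                throwKeptErrorIfAny();
            }
            // Otherwise there is nothing new to report, so return normally.
        }

        private void throwKeptErrorIfAny() {
            if (keptError != null) {
                keptErrorThrown = true;
                throw keptError;
            }
        }
    }

    /** Case 1: the caller aborts for its own reasons; the kept error is unseen. */
    static ToyProducer caseOne() {
        ToyProducer producer = new ToyProducer();
        producer.failInBackground("simulated protocol error");
        return producer;
    }

    /** Case 2: the caller aborts because send() already threw the kept error. */
    static ToyProducer caseTwo() {
        ToyProducer producer = new ToyProducer();
        producer.failInBackground("simulated protocol error");
        try {
            producer.send("record");
        } catch (TxnProtocolException alreadySeen) {
            // The caller has now been told about the protocol error.
        }
        return producer;
    }

    /** Reports whether abortTxn throws for the given producer. */
    static boolean abortThrows(ToyProducer producer) {
        try {
            producer.abortTxn();
            return false;
        } catch (TxnProtocolException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("case 1, abortTxn throws: " + abortThrows(caseOne()));
        System.out.println("case 2, abortTxn throws: " + abortThrows(caseTwo()));
    }
}
```

In this sketch abortTxn throws in case 1 and returns normally in case 2, so 
an exception from abortTxn always carries an error the caller has not seen.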

Of course, if a user still catches all exceptions and calls 
producer.abortTxn in the finally block, that call may still throw, but at least 
when it throws it is indeed bringing some new information for the user to 
handle.

> Safely abort Producer transactions during application shutdown
> --------------------------------------------------------------
>
>                 Key: KAFKA-9592
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9592
>             Project: Kafka
>          Issue Type: Improvement
>          Components: producer 
>    Affects Versions: 2.5.0
>            Reporter: Boyang Chen
>            Assignee: Xiang Zhang
>            Priority: Major
>              Labels: help-wanted, needs-kip, newbie
>             Fix For: 2.6.0
>
>
> Today, if a transactional producer hits a fatal exception, the caller usually 
> catches the exception and handles it by aborting the transaction and closing 
> the producer:
>  
> {code:java}
> try {
>   producer.beginTxn();
>   producer.send(xxx);
>   producer.sendOffsets(xxx);
>   producer.commit();
> } catch (ProducerFenced | UnknownPid e) {
>   ...
>   producer.abortTxn();
>   producer.close();
> }{code}
> This is what the current API suggests users do. Another scenario is during 
> an informed shutdown: people with an EOS producer would also like to end an 
> ongoing transaction before closing the producer, as it is cleaner.
> The tricky part is that `abortTxn` is not a safe call when the producer 
> is already in an error state, which means the user has to add another 
> try-catch inside the first-layer catch block, making the error handling pretty 
> annoying. 
> There are several ways to make this API robust and guide users to safe usage:
>  # Internally abort any ongoing transaction within `producer.close`, and 
> comment on `abortTxn` call to warn user not to do it manually. 
>  # Similar to 1, but get a new `close(boolean abortTxn)` API call in case 
> some users want to handle transaction state by themselves.
>  # Introduce a new abort transaction API that returns a boolean flag 
> indicating whether the producer is in an error state, instead of throwing 
> exceptions.
>  # Introduce a public API `isInError` on the producer for users to check 
> before making any transactional API calls.
> I personally favor 1 & 2 most, as they are simple and do not require any API 
> change. Considering the change scope, I would still recommend a small KIP.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
