[
https://issues.apache.org/jira/browse/KAFKA-9592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17079982#comment-17079982
]
Boyang Chen commented on KAFKA-9592:
------------------------------------
[~iamabug] I had a quick offline sync with [~guozhang] about the solution.
Just to clarify, the problem we are trying to solve is making `abortTxn` a
safe call when a fatal exception has already been thrown from other Producer APIs.
Educating users is important, but we should do a better job of limiting the impact of
human error when the function is called in an unexpected place.
We therefore propose the following fix: keep track of whether we have
already thrown a fatal exception through the normal processing APIs, such as
sendOffset, txnCommit, or produce, so that `abortTxn` does
*not throw the same exception again.* At that point the fatal exception has already
surfaced at the application level, and the user is expected to handle it there.
So `abortTxn` can simply be a no-op for a Producer in a fatal state. WDYT?
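A minimal sketch of the proposed bookkeeping follows. It assumes a simplified stand-in for the producer. `FatalStateSketch` and all its methods are hypothetical, not the real `KafkaProducer` or `TransactionManager` code. When a normal API call throws the fatal error for the first time, it marks the error as surfaced. After that, `abortTransaction()` returns quietly instead of throwing the error again.

```java
// Hypothetical sketch: not the real KafkaProducer/TransactionManager internals.
final class FatalStateSketch {
    private RuntimeException fatalError; // recorded when the producer enters a fatal state
    private boolean fatalErrorSurfaced;  // true once fatalError has been thrown to the caller
    private int abortCount;              // aborts that took the normal (non-fatal) path

    // Simulates the producer hitting a fatal error (e.g. being fenced).
    void transitionToFatalError(RuntimeException e) {
        if (fatalError == null) {
            fatalError = e;
        }
    }

    // Guard used by the normal processing APIs: throw the fatal error and remember that we did.
    private void throwIfFatal() {
        if (fatalError != null) {
            fatalErrorSurfaced = true;
            throw fatalError;
        }
    }

    void send(String record) {
        throwIfFatal();
        // ... append the record to the current transaction
    }

    void sendOffsets(String offsets) {
        throwIfFatal();
        // ... add the offsets to the current transaction
    }

    void commitTransaction() {
        throwIfFatal();
        // ... commit the current transaction
    }

    // Proposed behavior: once the fatal error has reached the caller, abort is a no-op.
    void abortTransaction() {
        if (fatalError != null) {
            if (fatalErrorSurfaced) {
                return; // already reported through send/sendOffsets/commit; do not throw again
            }
            throwIfFatal(); // never surfaced yet, so report it here
        }
        abortCount++; // normal abort path
    }

    int abortCount() {
        return abortCount;
    }
}
```

The sketch makes one choice the comment leaves open. It still throws from `abortTransaction()` when the fatal error has never reached the caller, which is the last branch above. If abort should be a no-op in every fatal case, that branch would simply return too.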
> Safely abort Producer transactions during application shutdown
> --------------------------------------------------------------
>
> Key: KAFKA-9592
> URL: https://issues.apache.org/jira/browse/KAFKA-9592
> Project: Kafka
> Issue Type: Improvement
> Components: producer
> Affects Versions: 2.5.0
> Reporter: Boyang Chen
> Assignee: Xiang Zhang
> Priority: Major
> Labels: help-wanted, needs-kip, newbie
> Fix For: 2.6.0
>
>
> Today, if a transactional producer hits a fatal exception, the caller usually
> catches the exception and handles it by aborting the transaction and closing
> the producer:
>
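> {code:java}
> import org.apache.kafka.common.errors.ProducerFencedException;
> import org.apache.kafka.common.errors.UnknownProducerIdException;
>
> try {
>   producer.beginTransaction();
>   producer.send(record);
>   producer.sendOffsetsToTransaction(offsets, consumerGroupId);
>   producer.commitTransaction();
> } catch (ProducerFencedException | UnknownProducerIdException e) {
>   // ...
>   // abortTransaction() may itself throw here, since the producer is already in an error state
>   producer.abortTransaction();
>   producer.close();
> }{code}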
> This is the pattern the current API suggests to users. Another scenario is a
> planned shutdown: people with an EOS producer would also like to end any
> ongoing transaction before closing the producer, as that is cleaner.
> The tricky part is that `abortTxn` is not a safe call when the producer is
> already in an error state, which means the user has to add another try-catch
> inside the first-level catch block, making error handling pretty annoying.
> There are several ways to make this API robust and guide users toward safe usage:
> # Internally abort any ongoing transaction within `producer.close`, and
> document on `abortTxn` that users should not call it manually.
> # Similar to 1, but add a new `close(boolean abortTxn)` API in case some
> users want to manage transaction state themselves.
> # Introduce a new abort-transaction API with a boolean flag indicating
> whether the producer is in an error state, instead of throwing exceptions.
> # Introduce a public `isInError` API on the producer so users can check it
> before making any transactional API calls.
> I personally favor 1 & 2 most, as they are simple and do not require any API
> change. Considering the change scope, I would still recommend a small KIP.
>
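The following is a minimal sketch of options 1 and 2, assuming a simplified stand-in for the producer. `ClosingProducerSketch`, `markFatalError`, and `abortsSent` are hypothetical names. The sketch also assumes that `close` skips the abort when the producer is already in an error state; the issue does not specify this. With this design, the guarded abort moves inside `close`, so the application no longer needs the nested try-catch described above.

```java
// Hypothetical sketch of options 1 and 2: not an existing KafkaProducer API.
final class ClosingProducerSketch {
    private boolean inTransaction; // a transaction has begun and has not ended yet
    private boolean inError;       // the producer has hit a fatal error
    private boolean closed;
    private int abortsSent;        // aborts that close() actually performed

    void beginTransaction() {
        inTransaction = true;
    }

    // Simulates the producer entering a fatal state (e.g. after being fenced).
    void markFatalError() {
        inError = true;
    }

    // Option 1: plain close() always aborts any ongoing transaction itself.
    void close() {
        close(true);
    }

    // Option 2: callers that manage transaction state themselves pass abortTxn = false.
    void close(boolean abortTxn) {
        if (closed) {
            return;
        }
        if (abortTxn && inTransaction) {
            if (!inError) {
                abortsSent++; // normal case: abort the ongoing transaction
            }
            // In the error state the abort is skipped instead of throwing,
            // so callers never need a nested try-catch around close().
            inTransaction = false;
        }
        closed = true;
    }

    int abortsSent() {
        return abortsSent;
    }

    boolean isClosed() {
        return closed;
    }
}
```

With this shape, the catch block in the issue's example reduces to a single `producer.close()` call. Calling `close(false)` leaves transaction handling entirely to the caller.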