Hi everyone,

I'd like to start a discussion on FLIP-572 [1].

When a Flink job using an exactly-once KafkaSink fails and does not recover, its Kafka transactions can remain in the ONGOING state, blocking all downstream read_committed consumers at the Last Stable Offset until the broker timeout expires. There is currently no built-in tooling to resolve this: Kafka's own kafka-transactions.sh cannot commit Flink transactions, since that requires Flink-specific internals.

This FLIP proposes a standalone CLI tool that allows operators to abort or commit lingering transactions without a running Flink cluster.

Looking forward to your feedback.

Kind regards,
Aleksandr Savonin

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-572%3A+Introduce+Flink-Kafka+Transactions+Management+Tool
