Peter Larsen created FLINK-35990:
------------------------------------
Summary: Lingering Transactions with FlinkKafkaProducer after
failures & scale-down
Key: FLINK-35990
URL: https://issues.apache.org/jira/browse/FLINK-35990
Project: Flink
Issue Type: Bug
Components: Connectors / Kafka
Affects Versions: 1.17.2, 1.14.3
Reporter: Peter Larsen

Hi! I’ve recently hit an issue where FlinkKafkaProducer on 1.14.3 leaves
lingering transactions that never get aborted. It seems to be triggered when a
restart from a checkpoint fails and the job is then restarted with lower
parallelism. I wrote a test that I believe reproduces the issue and pushed it
to a fork
[here|https://github.com/peterdlarsen/flink/compare/peterdlarsen:c0027e5...peterdlarsen:b4c4750].
I also reproduced it on a local cluster running 1.14.3 and am happy to share
more details if that’s useful!

I assume the recommended remediation is migrating to KafkaSink rather than
fixing FlinkKafkaProducer, but I wanted to report this in case it’s helpful to
anyone else.
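
As a rough illustration of how a restart with lower parallelism can leave transactions open, here is a simplified model. It is not the connector's code and it is not confirmed to be the cause of this report. It assumes each subtask owns a contiguous block of numeric transactional IDs, and that a subtask restarting without usable state aborts only the IDs it can derive from its own index, the new parallelism, and a fixed scale-down factor. The class name, method names, pool size, and factor are all hypothetical and chosen for illustration. The Flink Kafka connector documentation describes a related constraint: scaling down before the first checkpoint completes by more than FlinkKafkaProducer.SAFE_SCALE_DOWN_FACTOR is unsafe.

```java
import java.util.Set;
import java.util.TreeSet;

// Simplified, hypothetical model of transactional ID coverage on restart.
// It only shows which IDs a smaller set of subtasks can reach; it does not
// talk to Kafka and does not reproduce the connector's real logic.
public class LingeringTransactionsModel {

    // IDs one subtask would use: a contiguous block of poolSize IDs,
    // offset by the given base.
    public static Set<Long> idsToUse(int subtaskIndex, int poolSize, long base) {
        Set<Long> ids = new TreeSet<>();
        for (int i = 0; i < poolSize; i++) {
            ids.add(base + (long) subtaskIndex * poolSize + i);
        }
        return ids;
    }

    // IDs one restarted subtask would try to abort when it has no record of
    // the previous parallelism: its own block, repeated scaleDownFactor times
    // at strides of poolSize * parallelism.
    public static Set<Long> idsToAbort(
            int subtaskIndex, int parallelism, int poolSize, int scaleDownFactor) {
        Set<Long> ids = new TreeSet<>();
        for (int i = 0; i < scaleDownFactor; i++) {
            long base = (long) i * poolSize * parallelism;
            ids.addAll(idsToUse(subtaskIndex, poolSize, base));
        }
        return ids;
    }

    // IDs that were possibly open under the old parallelism and that no
    // subtask under the new parallelism would abort.
    public static Set<Long> lingering(
            int oldParallelism, int newParallelism, int poolSize, int scaleDownFactor) {
        Set<Long> open = new TreeSet<>();
        for (int s = 0; s < oldParallelism; s++) {
            open.addAll(idsToUse(s, poolSize, 0L));
        }
        for (int s = 0; s < newParallelism; s++) {
            open.removeAll(idsToAbort(s, newParallelism, poolSize, scaleDownFactor));
        }
        return open;
    }

    public static void main(String[] args) {
        // Scale 4 -> 2 stays within the factor: every old ID is covered.
        System.out.println(lingering(4, 2, 5, 5).size());   // prints 0
        // Scale 12 -> 2 exceeds the factor: IDs 50..59 are never aborted.
        System.out.println(lingering(12, 2, 5, 5).size());  // prints 10
    }
}
```

In this model a transaction lingers only when the old parallelism exceeds the new parallelism times the scale-down factor. The scenario reported here may involve a different path, for example state left behind by the failed restart, so the linked test remains the authoritative reproduction.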