Nico Kruber created FLINK-23839:
-----------------------------------
Summary: Unclear severity of Kafka transaction recommit warning in
logs
Key: FLINK-23839
URL: https://issues.apache.org/jira/browse/FLINK-23839
Project: Flink
Issue Type: Sub-task
Components: Connectors / Kafka
Affects Versions: 1.13.2, 1.12.5, 1.11.4
Reporter: Nico Kruber
In a transactional Kafka sink, after recovery, all transactions from the
recovered checkpoint are recommitted even though they may have already been
committed before because this is not part of the checkpoint.
This second commit can lead to a number of WARN entries in the logs coming from
[KafkaCommitter|https://github.com/apache/flink/blob/6c9818323b41a84137c52822d2993df788dbc9bb/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/sink/KafkaCommitter.java#L66]
or
[FlinkKafkaProducer|https://github.com/apache/flink/blob/6c9818323b41a84137c52822d2993df788dbc9bb/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaProducer.java#L1057].
Examples:
{code}
WARN [org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer] ...
Encountered error org.apache.kafka.common.errors.InvalidTxnStateException: The
producer attempted a transactional operation in an invalid state. while
recovering transaction KafkaTransactionState [transactionalId=...,
producerId=12345, epoch=123]. Presumably this transaction has been already
committed before.
WARN [org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer] ...
Encountered error org.apache.kafka.common.errors.ProducerFencedException:
Producer attempted an operation with an old epoch. Either there is a newer
producer with the same transactionalId, or the producer's transaction has been
expired by the broker. while recovering transaction KafkaTransactionState
[transactionalId=..., producerId=12345, epoch=12345]. Presumably this
transaction has been already committed before
{code}
It sounds to me like the second exception is useful and indicates that the
transaction timeout is too short. The first exception, however, seems
superfluous and rather alerts the user more than it helps. Or what would you do
with it?
Can we instead filter out superfluous exceptions and at least put these onto
DEBUG logs instead?
--
This message was sent by Atlassian Jira
(v8.3.4#803005)