[ https://issues.apache.org/jira/browse/FLINK-11654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17374266#comment-17374266 ]
Wenhao Ji commented on FLINK-11654:
-----------------------------------
Hi everyone. I would like to share [the link to the vote|https://lists.apache.org/thread.html/r5c69f2f8467637290b3607fdbb8e7e2b59be54705e3d22ec5d123683%40%3Cdev.flink.apache.org%3E] on [FLIP-172|https://cwiki.apache.org/confluence/display/FLINK/FLIP-172%3A+Support+custom+transactional.id+prefix+in+FlinkKafkaProducer] here. The FLIP proposes support for a custom transactional.id prefix in FlinkKafkaProducer, which would solve this issue. I hope you will take part in the vote and the [discussion|https://lists.apache.org/thread.html/r67610aa2d4dfdaf3b027b82edd1a3f46771f0d58902a4258d931e5a5%40%3Cdev.flink.apache.org%3E]!
> Multiple transactional KafkaProducers writing to same cluster have clashing transaction IDs
> -------------------------------------------------------------------------------------------
>
> Key: FLINK-11654
> URL: https://issues.apache.org/jira/browse/FLINK-11654
> Project: Flink
> Issue Type: Bug
> Components: Connectors / Kafka
> Affects Versions: 1.7.1
> Reporter: Jürgen Kreileder
> Priority: Major
> Labels: auto-unassigned, stale-major
>
> We run multiple jobs on a cluster that write heavily to the same Kafka topic
> from identically named sinks. When the EXACTLY_ONCE semantic is enabled for the
> KafkaProducers, we run into many ProducerFencedExceptions and all jobs go
> into a restart cycle.
> Example exception from the Kafka log:
>
> {code:java}
> [2019-02-18 18:05:28,485] ERROR [ReplicaManager broker=1] Error processing append operation on partition finding-commands-dev-1-0 (kafka.server.ReplicaManager)
> org.apache.kafka.common.errors.ProducerFencedException: Producer's epoch is no longer valid. There is probably another producer with a newer epoch. 483 (request epoch), 484 (server epoch)
> {code}
> The reason is the way FlinkKafkaProducer initializes the
> TransactionalIdsGenerator: the IDs are only guaranteed to be unique within a
> single job, but they can clash between different jobs (and clusters).
>
>
> {code:java}
> --- a/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaProducer.java
> +++ b/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaProducer.java
> @@ -819,6 +819,7 @@ public class FlinkKafkaProducer<IN>
>          nextTransactionalIdHintState = context.getOperatorStateStore().getUnionListState(
>              NEXT_TRANSACTIONAL_ID_HINT_DESCRIPTOR);
>          transactionalIdsGenerator = new TransactionalIdsGenerator(
> +            // the prefix probably should include job id and maybe cluster id
>              getRuntimeContext().getTaskName() + "-" + ((StreamingRuntimeContext) getRuntimeContext()).getOperatorUniqueID(),
>              getRuntimeContext().getIndexOfThisSubtask(),
>              getRuntimeContext().getNumberOfParallelSubtasks(),
> {code}
>
>
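The ProducerFencedException in the log follows from Kafka's fencing rule. Each `initTransactions()` call for a given transactional.id bumps that ID's producer epoch. After that, requests carrying an older epoch are rejected, which is what "483 (request epoch), 484 (server epoch)" reports.

The toy model below replays two jobs that share one ID. Its class and method names are hypothetical; this is not Kafka's broker or client code.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model of Kafka's producer fencing for a shared transactional.id.
 * Hypothetical names; only the rule "newer epoch fences older epoch" is modeled.
 */
class FencingSketch {

    /** Thrown when a request carries an epoch older than the one currently registered. */
    static class FencedException extends RuntimeException {
        FencedException(String message) {
            super(message);
        }
    }

    // transactional.id -> latest epoch handed out
    private final Map<String, Integer> epochs = new HashMap<>();

    /** Models initTransactions(): first call yields epoch 0, each later call bumps it by one. */
    int initTransactions(String transactionalId) {
        return epochs.merge(transactionalId, 0, (current, unused) -> current + 1);
    }

    /** Models an append: rejected if the request epoch is older than the registered one. */
    void append(String transactionalId, int requestEpoch) {
        Integer serverEpoch = epochs.get(transactionalId);
        if (serverEpoch == null) {
            throw new IllegalStateException("initTransactions() not called for " + transactionalId);
        }
        if (requestEpoch < serverEpoch) {
            throw new FencedException(requestEpoch + " (request epoch), " + serverEpoch + " (server epoch)");
        }
    }

    public static void main(String[] args) {
        FencingSketch broker = new FencingSketch();
        String sharedId = "Sink: commands-0";              // same ID generated by two jobs (hypothetical)
        int epochA = broker.initTransactions(sharedId);    // job A gets epoch 0
        int epochB = broker.initTransactions(sharedId);    // job B gets epoch 1, fencing job A
        broker.append(sharedId, epochB);                   // accepted
        try {
            broker.append(sharedId, epochA);               // job A still uses epoch 0
        } catch (FencedException e) {
            // prints: job A fenced: 0 (request epoch), 1 (server epoch)
            System.out.println("job A fenced: " + e.getMessage());
        }
    }
}
```

In this model, when job A restarts and calls `initTransactions()` again, the epoch moves to 2 and job B is fenced in turn. This is consistent with the restart cycle described in the issue.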
--
This message was sent by Atlassian Jira
(v8.3.4#803005)