Yordan Pavlov created FLINK-31304:
-------------------------------------
Summary: Very slow job start if topic has been used before
Key: FLINK-31304
URL: https://issues.apache.org/jira/browse/FLINK-31304
Project: Flink
Issue Type: Improvement
Components: Connectors / Kafka
Affects Versions: 1.15.2
Reporter: Yordan Pavlov
We have the following use case: we use KafkaSink with exactly-once semantics,
and from time to time we restart the job from a clean state. When doing so, we
delete and re-create the output topic and also delete any Flink checkpoints. In
this situation it takes close to an hour for the job to start. While the job is
idling, we see the following log in the TaskManager:
{code:java}
2023-03-02 16:33:42.004 [Source: Kafka source blocks -> Deduplicate blocks ->
Map -> Parse blocks -> Map -> Kafka sink volume: Writer -> Kafka sink volume:
Committer (2/5)#0] INFO
o.apache.kafka.clients.producer.internals.TransactionManager - [Producer
clientId=producer-state.clickhouse-0-1-1,
transactionalId=state.clickhouse-0-1-1] Invoking InitProducerId for the first
time in order to acquire a producer ID
2023-03-02 16:33:42.005 [kafka-producer-network-thread |
producer-state.clickhouse-0-2-1] INFO
o.apache.kafka.clients.producer.internals.TransactionManager - [Producer
clientId=producer-state.clickhouse-0-2-1,
transactionalId=state.clickhouse-0-2-1] ProducerId set to 31719488 with epoch
8{code}
Log lines like these keep repeating for what seems like forever.
If we use a brand-new output topic name, the job starts straight away.
Could you advise whether this can be improved?
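
For anyone triaging a similar startup, a small helper along the following lines
could summarise which transactional IDs appear in the TaskManager log and the
producer epoch reported for each. This is only a sketch: the class and method
names are hypothetical, the regular expression is derived solely from the two
log lines quoted above, and it assumes the "ProducerId set to ... with epoch ..."
message keeps that shape. In Kafka, the epoch is bumped each time an existing
transactional ID is initialised again, so a non-zero epoch (such as 8 above)
suggests that the ID was already in use before this start.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical diagnostic helper; not part of Flink or Kafka.
public class TransactionLogSummary {

    // Matches "transactionalId=<id>] ProducerId set to <pid> with epoch <n>".
    // \s+ tolerates the line wrapping seen in the log excerpt above.
    private static final Pattern PRODUCER_ID_SET = Pattern.compile(
            "transactionalId=([^\\]\\s,]+)\\]\\s+ProducerId\\s+set\\s+to\\s+\\d+"
                    + "\\s+with\\s+epoch\\s+(\\d+)");

    /** Maps each transactional ID to the last epoch logged for it, in order of first appearance. */
    public static Map<String, Integer> epochsByTransactionalId(String log) {
        Map<String, Integer> epochs = new LinkedHashMap<>();
        Matcher m = PRODUCER_ID_SET.matcher(log);
        while (m.find()) {
            epochs.put(m.group(1), Integer.parseInt(m.group(2)));
        }
        return epochs;
    }

    public static void main(String[] args) {
        // Sample taken from the log excerpt in this issue.
        String sample = "[Producer clientId=producer-state.clickhouse-0-2-1,\n"
                + "transactionalId=state.clickhouse-0-2-1] ProducerId set to 31719488 with epoch\n"
                + "8";
        epochsByTransactionalId(sample)
                .forEach((id, epoch) -> System.out.println(id + " epoch=" + epoch));
    }
}
```

Counting the distinct IDs returned for the idle period gives a rough measure of
how many transactional IDs are being initialised before the job starts
processing.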