Sergei Morozov created FLINK-37604:
--------------------------------------

             Summary: Flink CDC pipeline composer doesn't assign UIDs to operators
                 Key: FLINK-37604
                 URL: https://issues.apache.org/jira/browse/FLINK-37604
             Project: Flink
          Issue Type: Improvement
          Components: Flink CDC
    Affects Versions: cdc-3.2.0
            Reporter: Sergei Morozov

From the [documentation|https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/savepoints/#assigning-operator-ids]:
{quote}It is *highly recommended* that you specify operator IDs via the *{{uid(String)}}* method.
{quote}
The pipeline composer currently doesn't specify the UIDs.

Steps to reproduce:
 # Enable debug logging on {{org.apache.flink.streaming.api.graph.StreamGraphHasherV2}}.
 # Start a CDC pipeline.
 # Observe messages like the following in the log:

{noformat}
Generated hash 'cbc357ccb763df2852fee8c4fc7d55f2' for node 'Source: Value Source-1' {id: 1, parallelism: 1, user function: }
Generated hash '78be0dd8677bc2711e2a56947a5ea048' for node 'PrePartition-3' {id: 3, parallelism: 1, user function: }
Generated hash '0deb1b26a3d9eb3c8f0c11f7110b2903' for node 'PostPartition-5' {id: 5, parallelism: 1, user function: org.apache.flink.cdc.runtime.partitioning.PostPartitionProcessor}
Generated hash 'c63277377682ee6b89910f3fdc3b3a1e' for node 'Sink Writer: Sink-6' {id: 6, parallelism: 1, user function: }
Generated hash '25f5e74a29ec629de3d4a0e9f84185e9' for node 'Sink Committer: Sink-9' {id: 9, parallelism: 1, user function: }
{noformat}
This means that Flink derives these operator IDs from the structure of the job graph (the input and output nodes of a given node) rather than from explicitly assigned UIDs. If the job graph changes (for example, when more operators are added), the hashes of the affected operators change, and Flink will be unable to restore their state.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
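
To illustrate why graph-derived IDs are fragile, here is a minimal, self-contained sketch. It is a toy model, not Flink's {{StreamGraphHasherV2}}: the class {{OperatorIdSketch}}, its {{Op}} type, the MD5-based hashing, and the {{NewOperator}} node are all hypothetical and chosen only for illustration. It assumes a linear pipeline in which an auto-generated ID depends on the operator's position and its upstream ID, while an explicitly assigned UID is hashed on its own.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

/** Toy model only: this is NOT Flink's StreamGraphHasherV2. */
class OperatorIdSketch {

    /** Hypothetical operator in a linear pipeline; uid is null when not assigned. */
    static final class Op {
        final String name;
        final String uid;

        Op(String name, String uid) {
            this.name = name;
            this.uid = uid;
        }
    }

    /** Returns the MD5 digest of the input as a lowercase hex string. */
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /**
     * Assigns one ID per operator. With a UID, the ID depends only on the UID.
     * Without one, it depends on the position in the chain and the upstream ID
     * (a simplifying assumption standing in for "derived from the job graph").
     */
    static List<String> assignIds(List<Op> chain) {
        List<String> ids = new ArrayList<>();
        String upstream = "";
        for (int i = 0; i < chain.size(); i++) {
            Op op = chain.get(i);
            String id = op.uid != null
                    ? md5Hex("uid:" + op.uid)
                    : md5Hex("auto:" + i + ":" + upstream);
            ids.add(id);
            upstream = id;
        }
        return ids;
    }

    /** Builds a chain; when withUids is true, each operator uses its name as its UID. */
    static List<Op> chain(boolean withUids, String... names) {
        List<Op> ops = new ArrayList<>();
        for (String name : names) {
            ops.add(new Op(name, withUids ? name : null));
        }
        return ops;
    }

    public static void main(String[] args) {
        String[] before = {"Source", "PrePartition", "PostPartition", "Sink"};
        // "NewOperator" is a hypothetical operator added in a later version of the pipeline.
        String[] after = {"Source", "NewOperator", "PrePartition", "PostPartition", "Sink"};

        // Without UIDs, PostPartition moves from index 2 to index 3, so its ID changes.
        String autoBefore = assignIds(chain(false, before)).get(2);
        String autoAfter = assignIds(chain(false, after)).get(3);
        System.out.println("auto IDs match: " + autoBefore.equals(autoAfter)); // prints false

        // With UIDs, the ID of PostPartition is unaffected by the inserted operator.
        String uidBefore = assignIds(chain(true, before)).get(2);
        String uidAfter = assignIds(chain(true, after)).get(3);
        System.out.println("uid IDs match: " + uidBefore.equals(uidAfter)); // prints true
    }
}
```

In a real Flink job, the analogous step is to call {{uid(String)}} on each operator, as the quoted documentation recommends, so that state in a savepoint can still be matched to its operator after the job graph changes.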