Sergei Morozov created FLINK-37604:
--------------------------------------

             Summary: Flink CDC pipeline composer doesn't assign UIDs to 
operators
                 Key: FLINK-37604
                 URL: https://issues.apache.org/jira/browse/FLINK-37604
             Project: Flink
          Issue Type: Improvement
          Components: Flink CDC
    Affects Versions: cdc-3.2.0
            Reporter: Sergei Morozov


>From the 
>[documentation|https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/savepoints/#assigning-operator-ids]:
{quote}It is *highly recommended* that you specify operator IDs via the 
*{{uid(String)}}* method.
{quote}
The pipeline composer currently doesn't specify the UIDs.

Steps to reproduce:
 # Enable debug logging on 
{{{}org.apache.flink.streaming.api.graph.StreamGraphHasherV2{}}}.
 # Start a CDC pipeline
 # Observe messages like the following  in the log

{noformat}
Generated hash 'cbc357ccb763df2852fee8c4fc7d55f2' for node 'Source: Value 
Source-1' {id: 1, parallelism: 1, user function: }
Generated hash '78be0dd8677bc2711e2a56947a5ea048' for node 'PrePartition-3' 
{id: 3, parallelism: 1, user function: }
Generated hash '0deb1b26a3d9eb3c8f0c11f7110b2903' for node 'PostPartition-5' 
{id: 5, parallelism: 1, user function: 
org.apache.flink.cdc.runtime.partitioning.PostPartitionProcessor}
Generated hash 'c63277377682ee6b89910f3fdc3b3a1e' for node 'Sink Writer: 
Sink-6' {id: 6, parallelism: 1, user function: }
Generated hash '25f5e74a29ec629de3d4a0e9f84185e9' for node 'Sink Committer: 
Sink-9' {id: 9, parallelism: 1, user function: }
{noformat}

It means that Flink generates these UIDs based on the job graph (the input and 
output nodes of a given node). If the job graph changes (by adding more 
operators), these hashes will be regenerated, and Flink will be unable to 
restore the state of the affected operators.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to