yuxiqian commented on PR #3818:
URL: https://github.com/apache/flink-cdc/pull/3818#issuecomment-2563220029

   Hi @lvyanquan, I have some concerns about how BucketAssignOperator works with 
the schema evolution strategy.
   
   1) FlushEvent now works as a "pipeline drain indicator": once every data sink 
writer has acknowledged it, there shouldn't be any unhandled / uncommitted events 
still flowing through the pipeline, so SchemaRegistry can safely evolve the 
downstream DB.
   
   ![Normal Flush Event 
Semantic](https://github.com/user-attachments/assets/5403835d-ec5f-48fd-9767-b9991bc2fb31)
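
   To make that contract concrete, here is a minimal, runnable Java sketch of the drain behaviour. Every type in it is a simplified stand-in (string payloads, a `Runnable` ack callback), not the actual flink-cdc classes, so please read it purely as an illustration of the intended ordering.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Simplified sink writer: buffer data change events, and on a FlushEvent
// drain everything written with the old schema *before* acknowledging.
class SinkWriterSketch {
    private final Queue<String> pending = new ArrayDeque<>(); // buffered data change events
    private final Runnable ackFlushToRegistry;                // acknowledges the flush upstream

    SinkWriterSketch(Runnable ackFlushToRegistry) {
        this.ackFlushToRegistry = ackFlushToRegistry;
    }

    void onDataChangeEvent(String payload) {
        pending.add(payload);
    }

    void onFlushEvent() {
        while (!pending.isEmpty()) {
            System.out.println("flush: " + pending.poll());
        }
        // Only after the buffer is empty may SchemaRegistry treat this writer
        // as drained and evolve the downstream DB.
        ackFlushToRegistry.run();
    }

    public static void main(String[] args) {
        SinkWriterSketch writer = new SinkWriterSketch(
                () -> System.out.println("registry: writer drained, safe to evolve"));
        writer.onDataChangeEvent("row encoded with schema v1");
        writer.onFlushEvent();
    }
}
```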
   
   However, the bucket assigning strategy might break that assumption: data sink 
writers could receive data change events carrying a stale schema even after the 
external schema evolution has already finished.
   
   ![Questionable FlushEvent 
Semantic](https://github.com/user-attachments/assets/10ec528b-e00e-44a7-8d7c-0b700c03ff7c)
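
   A toy reproduction of that ordering hazard (queues standing in for Flink network channels, all names made up): once the re-shuffle introduced by BucketAssignOperator sits between the flush barrier and the writer, the FlushEvent and an older record can reach the same writer over different channels, and the writer may ack before the stale record arrives.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class StaleSchemaRace {
    public static void main(String[] args) {
        // Two input channels into the same sink writer subtask.
        Queue<String> fromAssignerSubtask0 = new ArrayDeque<>();
        Queue<String> fromAssignerSubtask1 = new ArrayDeque<>();

        // Subtask 0 forwards the FlushEvent right away ...
        fromAssignerSubtask0.add("FlushEvent(schema v1 -> v2)");
        // ... while subtask 1 still holds a record encoded with the old schema.
        fromAssignerSubtask1.add("DataChangeEvent(schema v1)");

        // The writer happens to drain channel 0 first, acks, and the registry
        // evolves the downstream table to v2 ...
        System.out.println("writer handles: " + fromAssignerSubtask0.poll() + " -> ack");
        System.out.println("registry: downstream table evolved to v2");

        // ... and only then does the v1-encoded record show up.
        System.out.println("writer handles: " + fromAssignerSubtask1.poll()
                + "   <-- stale schema after evolution finished");
    }
}
```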
   
   2) Potential schema operator hanging risks with a "distributed" topology.
   
   ![Hanging 
risks](https://github.com/user-attachments/assets/0f5486d4-ec0b-4c0f-b41a-59da57abe88d)
   
   Basically the same issue as described in #3680. In short, if a broadcast / 
custom partitioning topology is applied, blocking one upstream partition will 
implicitly block all downstream partitions from handling events.
   
   3) Why do we need another bucket assigner operator?
   
   AFAIK all data change events have already been hashed & distributed in 
PartitionOperator. Since adding a BucketAssignOperator does not change the 
parallelism, is there any reason we can't calculate the correct bucket 
partition ID in advance, instead of creating another partitioning topology?
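
   As a rough illustration of what I mean (names like `bucketCount` / `targetSubtask` and the `hashCode`-based bucket function are made up for the sketch, not the real Paimon / flink-cdc APIs), the bucket could be derived from the same key hash the partitioner already uses, so records land directly on the subtask that owns their bucket and no second shuffle is needed:

```java
// Sketch of folding bucket assignment into the existing partitioning step.
final class BucketAwarePartitionerSketch {
    private final int bucketCount;
    private final int numSubtasks;

    BucketAwarePartitionerSketch(int bucketCount, int numSubtasks) {
        this.bucketCount = bucketCount;
        this.numSubtasks = numSubtasks;
    }

    int bucketOf(String primaryKey) {
        // Whatever hash the bucket assigner would use, just computed earlier.
        return Math.floorMod(primaryKey.hashCode(), bucketCount);
    }

    int targetSubtask(String primaryKey) {
        // Route by bucket, so each writer subtask owns a stable set of buckets.
        return bucketOf(primaryKey) % numSubtasks;
    }

    public static void main(String[] args) {
        BucketAwarePartitionerSketch p = new BucketAwarePartitionerSketch(16, 4);
        System.out.println("bucket=" + p.bucketOf("order-42")
                + ", subtask=" + p.targetSubtask("order-42"));
    }
}
```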

