yuxiqian commented on PR #3801:
URL: https://github.com/apache/flink-cdc/pull/3801#issuecomment-2541132698

   Currently, a YAML pipeline job has a typical topology like this:
   
   
![current_topology](https://github.com/user-attachments/assets/006ad5f0-9910-45c0-a785-569b75644df6)
   
   It relies on a basic assumption: data from a single table must either:
   
   * only be present and evolve within a single partition, or
   * be present in multiple partitions, but with a globally static schema.
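The assumption above can be expressed as a small predicate. This is a minimal sketch under hypothetical simplifications: a schema is modeled as an ordered tuple of column names, and `history` (an illustrative name, not a Flink CDC structure) maps each partition id to the sequence of schemas that partition observed for one table.

```python
from typing import Dict, List, Tuple

# Hypothetical simplification: a schema is an ordered tuple of column names.
Schema = Tuple[str, ...]


def satisfies_assumption(history: Dict[int, List[Schema]]) -> bool:
    """Return True if one table's per-partition schema histories fit the current design."""
    # Only partitions that actually carried data for this table matter.
    partitions = [p for p, schemas in history.items() if schemas]
    if len(partitions) <= 1:
        # Case 1: the table is present (and may evolve) in a single partition only.
        return True
    # Case 2: several partitions, so every one of them must see the same single schema.
    distinct = {schema for p in partitions for schema in history[p]}
    return len(distinct) == 1


print(satisfies_assumption({0: [("id",), ("id", "name")]}))                # True
print(satisfies_assumption({0: [("id", "name")], 1: [("id", "name")]}))    # True
print(satisfies_assumption({0: [("id",), ("id", "name")], 1: [("id",)]}))  # False
```

The third case is the one the current topology cannot handle: the table spans two partitions and one of them evolves its schema.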
   
   The underlying reason is that we lack a coordination mechanism across schema operators. For example, if Schema Operator 1 sends a schema change request, the other schema operators will not even be aware of it, since an operator **only communicates with the coordinator when it receives a schema change event** from upstream.
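The lack of awareness described above can be reproduced with a toy model. This is a hypothetical sketch, not the Flink CDC `SchemaOperator` or coordinator API: each operator keeps a local schema cache and only talks to the coordinator inside `on_schema_change`, so a peer operator keeps using its stale view.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical simplification: a schema is an ordered tuple of column names.
Schema = Tuple[str, ...]


@dataclass
class Coordinator:
    # Latest schema per table, as known to the (hypothetical) coordinator.
    schemas: Dict[str, Schema] = field(default_factory=dict)

    def request_schema_change(self, table: str, new_schema: Schema) -> Schema:
        # Record the change and hand the evolved schema back to the requester only.
        self.schemas[table] = new_schema
        return new_schema


@dataclass
class SchemaOperator:
    name: str
    coordinator: Coordinator
    # Local view of table schemas, refreshed only when THIS operator sees a change.
    local: Dict[str, Schema] = field(default_factory=dict)

    def on_schema_change(self, table: str, new_schema: Schema) -> None:
        # The only place where the operator talks to the coordinator.
        self.local[table] = self.coordinator.request_schema_change(table, new_schema)

    def on_data_record(self, table: str) -> Schema:
        # Data records are interpreted with the local view; no coordinator round trip.
        return self.local[table]


coord = Coordinator({"orders": ("id",)})
op1 = SchemaOperator("op1", coord, {"orders": ("id",)})
op2 = SchemaOperator("op2", coord, {"orders": ("id",)})

op1.on_schema_change("orders", ("id", "amount"))
print(op1.on_data_record("orders"))  # ('id', 'amount')
print(op2.on_data_record("orders"))  # ('id',), op2 never heard about the change
```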
   
   This becomes a problem when handling distributed sources, since each partition may emit its own stream of schema change events, while we still need a single, globally effective schema to write downstream.
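To see why per-partition schema streams conflict, the sketch below contrasts two hypothetical ways of turning them into one downstream schema. `last_writer_wins` simply applies whatever arrives last, so the result depends on arrival order and silently drops the other partition's column. `merge_widening` (union of all columns seen) is only one illustrative policy assumed here; it is not necessarily what this PR implements, and its column order still depends on arrival order.

```python
from typing import List, Tuple

# Hypothetical simplification: a schema is an ordered tuple of column names.
Schema = Tuple[str, ...]
# Each event: (partition id, schema that partition evolved the table to).
Event = Tuple[int, Schema]


def last_writer_wins(events: List[Event]) -> Schema:
    # Naively apply each partition's schema in arrival order.
    current: Schema = ()
    for _partition, schema in events:
        current = schema  # overwrites whatever other partitions evolved
    return current


def merge_widening(events: List[Event]) -> Schema:
    # Hypothetical policy: keep every column any partition has introduced.
    merged: List[str] = []
    for _partition, schema in events:
        for column in schema:
            if column not in merged:
                merged.append(column)
    return tuple(merged)


events = [(0, ("id", "a")), (1, ("id", "b"))]
print(last_writer_wins(events))                  # ('id', 'b')
print(last_writer_wins(list(reversed(events))))  # ('id', 'a')
print(merge_widening(events))                    # ('id', 'a', 'b')
```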
   
   However, simply asking the schema operators to block and align is not viable in the current architecture. There is a broadcast topology right after the schema operators, so blocking them might stop the entire downstream from receiving events, leaving us no chance to flush pending data records (see #3680 for more details about barrier alignment).
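The effect of the broadcast edge can be illustrated with a simplified model. It assumes, by analogy with checkpoint barrier alignment, that a downstream subtask can only make progress once every input channel has delivered; the helper names are hypothetical and the forward wiring is shown purely for contrast, not as a proposed fix. Under broadcast, every downstream subtask reads from every schema operator, so a single blocked operator stalls all of them and none can flush.

```python
from typing import Dict, List, Set


def broadcast(n_up: int, n_down: int) -> Dict[int, List[int]]:
    # Broadcast: every downstream subtask reads from every upstream subtask.
    return {d: list(range(n_up)) for d in range(n_down)}


def forward(n: int) -> Dict[int, List[int]]:
    # Forward: downstream subtask i reads only from upstream subtask i.
    return {i: [i] for i in range(n)}


def stalled_subtasks(channels: Dict[int, List[int]], blocked_upstream: Set[int]) -> Set[int]:
    # Assumed alignment rule: a subtask stalls if ANY of its inputs is blocked.
    return {d for d, ups in channels.items() if any(u in blocked_upstream for u in ups)}


print(sorted(stalled_subtasks(broadcast(4, 4), {1})))  # [0, 1, 2, 3]
print(sorted(stalled_subtasks(forward(4), {1})))       # [1]
```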

