DanielLeens opened a new issue, #11016:
URL: https://github.com/apache/seatunnel/issues/11016

   ### Search before asking
   
   I searched the open issues with keywords such as `same table name multiple databases schema evolution`, `mysql cdc schema evolution multi database`, and `schema evolution same-name tables`, and did not find an existing issue that covers this exact bug.
   
   ### What happened
   
   In a MySQL CDC job that captures the same table name from multiple source databases, schema evolution can resolve an unqualified DDL statement against the wrong source database.
   
   A reproducible example is:
   
   - source database A: `shop_a.products`
   - source database B: `shop_b.products`
   - sink routing: `database = "${database_name}_sink"`, `table = "${table_name}"`
   - schema evolution enabled: `schema-changes.enabled = true`
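
   The routing above depends on the sink substituting each source table's database and table names into the `database` and `table` templates. A minimal sketch of that substitution follows, assuming plain literal replacement of `${database_name}` and `${table_name}`; `SinkRoutingSketch` and its methods are hypothetical and are not SeaTunnel's implementation:

   ```java
   import java.util.Objects;

   // Hypothetical illustration of placeholder-based sink routing; not SeaTunnel's code.
   final class SinkRoutingSketch {

       // Substitute source identifiers into a sink template such as "${database_name}_sink".
       static String render(String template, String sourceDatabase, String sourceTable) {
           Objects.requireNonNull(template, "template");
           return template
                   .replace("${database_name}", sourceDatabase)
                   .replace("${table_name}", sourceTable);
       }

       // Build the fully qualified sink table for one source table,
       // using the templates from the reproduction config.
       static String sinkTableFor(String sourceDatabase, String sourceTable) {
           String database = render("${database_name}_sink", sourceDatabase, sourceTable);
           String table = render("${table_name}", sourceDatabase, sourceTable);
           return database + "." + table;
       }

       public static void main(String[] args) {
           System.out.println(sinkTableFor("shop_a", "products")); // shop_a_sink.products
           System.out.println(sinkTableFor("shop_b", "products")); // shop_b_sink.products
       }
   }
   ```

   In this model the sink target is a pure function of the source database and table, so the two `products` tables can only collide if the DDL is attributed to the wrong source table before routing.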
   
   When both source databases emit DDL such as:
   
   ```sql
   ALTER TABLE products ADD COLUMN add_column1 VARCHAR(64), ADD COLUMN add_column2 INT;
   ```
   
   the current MySQL CDC DDL resolution may pin `products` to the first matched database instead of resolving it against the current database from which the DDL event was produced.
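
   Because `products` carries no database qualifier, the parser must take the database from context. The following rough check makes that distinction explicit. It assumes a regular expression is adequate for this one `ALTER TABLE` shape (a real DDL parser also has to handle comments, other statement types, and quoting rules this pattern ignores), and `AlterTableTargetSketch` is a hypothetical helper:

   ```java
   import java.util.Optional;
   import java.util.regex.Matcher;
   import java.util.regex.Pattern;

   // Hypothetical helper: extracts the table identifier from a simple ALTER TABLE statement.
   // Handles only unquoted or backtick-quoted identifiers; not a substitute for a real DDL parser.
   final class AlterTableTargetSketch {

       private static final Pattern ALTER_TABLE =
               Pattern.compile("^\\s*ALTER\\s+TABLE\\s+([`\\w]+(?:\\.[`\\w]+)?)",
                       Pattern.CASE_INSENSITIVE);

       // Returns the identifier after ALTER TABLE with backticks removed,
       // e.g. "products" or "shop_a.products".
       static Optional<String> targetTable(String ddl) {
           Matcher m = ALTER_TABLE.matcher(ddl);
           return m.find() ? Optional.of(m.group(1).replace("`", "")) : Optional.empty();
       }

       // An identifier without a "db." prefix must be resolved against some current database.
       static boolean isUnqualified(String identifier) {
           return !identifier.contains(".");
       }

       public static void main(String[] args) {
           String ddl = "ALTER TABLE products ADD COLUMN add_column1 VARCHAR(64), "
                   + "ADD COLUMN add_column2 INT;";
           String target = targetTable(ddl).orElseThrow(IllegalArgumentException::new);
           System.out.println(target + " unqualified=" + isUnqualified(target));
       }
   }
   ```

   The same statement text arrives from both `shop_a` and `shop_b`, so the identifier alone cannot tell the two tables apart; only the event's database context can.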
   
   As a result, the multi-database job does not reliably preserve per-database schema isolation for same-name tables. The expected behavior is that:
   
   - `shop_a.products` evolves and syncs to `shop_a_sink.products`
   - `shop_b.products` evolves and syncs to `shop_b_sink.products`
   - each downstream table continues to receive only its own source database's records after the DDL
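
   The difference between pinning to the first matched database and resolving against the event's own database can be modeled in a few lines. This is an illustration only, assuming captured tables are known as `database.table` strings and that each DDL event carries the database it was produced in; the class and method names are hypothetical and do not describe the actual change in PR #11015:

   ```java
   import java.util.Arrays;
   import java.util.List;
   import java.util.Optional;

   // Illustrative model of two ways to qualify an unqualified table name from a DDL event.
   final class DdlTableResolutionSketch {

       // Captured tables, mirroring table-names = ["shop_a.products", "shop_b.products"].
       static final List<String> CAPTURED = Arrays.asList("shop_a.products", "shop_b.products");

       // Failure mode described in the issue: the first captured table with a matching
       // name wins, regardless of which database produced the DDL event.
       static Optional<String> firstMatch(String table) {
           return CAPTURED.stream()
                   .filter(id -> id.endsWith("." + table))
                   .findFirst();
       }

       // Expected behavior: qualify with the database the DDL event came from,
       // and accept the result only if that table is actually captured.
       static Optional<String> resolveForEvent(String eventDatabase, String table) {
           String qualified = eventDatabase + "." + table;
           return CAPTURED.contains(qualified) ? Optional.of(qualified) : Optional.empty();
       }

       public static void main(String[] args) {
           for (String eventDb : Arrays.asList("shop_a", "shop_b")) {
               System.out.println(eventDb + ": firstMatch=" + firstMatch("products").orElse("?")
                       + ", eventScoped=" + resolveForEvent(eventDb, "products").orElse("?"));
           }
       }
   }
   ```

   Running `main` prints `shop_a.products` for both events under `firstMatch`, while `resolveForEvent` yields `shop_a.products` and `shop_b.products` respectively, which matches the expected behavior listed above.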
   
   ### SeaTunnel Version
   
   - affected: current `dev`, before the fix in PR #11015
   - verified against `3.0.0-SNAPSHOT` source state during reproduction analysis
   
   ### SeaTunnel Config
   
   ```conf
   env {
     parallelism = 1
     job.mode = "STREAMING"
     checkpoint.interval = 5000
   }
   
   source {
     MySQL-CDC {
       server-id = 5690-5700
       username = "st_user_source"
       password = "mysqlpw"
       database-names = ["shop_a", "shop_b"]
       table-names = ["shop_a.products", "shop_b.products"]
       url = "jdbc:mysql://mysql-host:3306"
       schema-changes.enabled = true
     }
   }
   
   sink {
     jdbc {
       url = "jdbc:mysql://mysql-host:3306"
       driver = "com.mysql.cj.jdbc.Driver"
       user = "st_user_sink"
       password = "mysqlpw"
       generate_sink_sql = true
       database = "${database_name}_sink"
       table = "${table_name}"
       primary_keys = ["id"]
       multi_table_sink_replica = 2
     }
   }
   ```
   
   ### Running Command
   
   ```shell
   ./bin/seatunnel.sh -c config/mysqlcdc_to_mysql_with_multi_db_same_name_schema_change.conf -m local
   ```
   
   ### Error Exception
   
   There is no stable user-facing stack trace for this bug. The failure is behavioral:
   
   - the same parser instance can resolve `ALTER TABLE products ...` to the wrong source database
   - same-name tables from different databases can lose schema isolation during schema evolution
   - downstream tables may not independently receive their own database-scoped DDL and the records written after it
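
   One way a single parser instance can drift to the wrong database is shared mutable state: if its current database is set once and reused for later events, DDL from `shop_b` is resolved as if it came from `shop_a`. The sketch below models that hazard with a hypothetical `StatefulResolver` and shows the defensive pattern of setting the current database from each event before resolving. It is an assumption about the failure mode, not a description of SeaTunnel's parser internals:

   ```java
   // Hypothetical model of a resolver instance shared across DDL events.
   final class StatefulResolverSketch {

       // Minimal DDL event: the database it was produced in plus the unqualified table name.
       static final class DdlEvent {
           final String database;
           final String table;

           DdlEvent(String database, String table) {
               this.database = database;
               this.table = table;
           }
       }

       static final class StatefulResolver {
           private String currentDatabase;

           void setCurrentDatabase(String database) {
               this.currentDatabase = database;
           }

           // Qualifies an unqualified name with whatever database is currently set.
           String qualify(String table) {
               return currentDatabase + "." + table;
           }
       }

       // Hazard: the current database was set earlier and is reused for this event.
       static String resolveStale(StatefulResolver resolver, DdlEvent event) {
           return resolver.qualify(event.table);
       }

       // Defensive pattern: reset the current database from the event before each resolution.
       static String resolveScoped(StatefulResolver resolver, DdlEvent event) {
           resolver.setCurrentDatabase(event.database);
           return resolver.qualify(event.table);
       }

       public static void main(String[] args) {
           StatefulResolver shared = new StatefulResolver();
           shared.setCurrentDatabase("shop_a"); // set once, e.g. from the first event seen
           DdlEvent fromShopB = new DdlEvent("shop_b", "products");
           System.out.println(resolveStale(shared, fromShopB));  // shop_a.products (wrong)
           System.out.println(resolveScoped(shared, fromShopB)); // shop_b.products
       }
   }
   ```

   Deriving the database from each event on every call, rather than relying on state left over from an earlier event, is what keeps same-name tables isolated in this model.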
   
   ### Zeta or Flink or Spark Version
   
   - Zeta engine on current `dev`
   
   ### Java or Scala Version
   
   - Java 8/11 compatible code path in the current project matrix
   
   ### Screenshots
   
   Not applicable.
   
   ### Are you willing to submit PR?
   
   Yes. A candidate fix and regression coverage are available in PR #11015.
   

