DanielLeens opened a new issue, #11016:
URL: https://github.com/apache/seatunnel/issues/11016
### Search before asking
I searched the open issues with keywords such as `same table name multiple
databases schema evolution`, `mysql cdc schema evolution multi database`, and
`schema evolution same-name tables`, and I did not find an existing issue that
covers this exact bug.
### What happened
In a MySQL CDC job that captures the same table name from multiple source
databases, schema evolution can resolve an unqualified DDL statement against
the wrong source database.
A reproducible example is:
- source database A: `shop_a.products`
- source database B: `shop_b.products`
- sink routing: `database = "${database_name}_sink"`, `table = "${table_name}"`
- schema evolution enabled: `schema-changes.enabled = true`
When both source databases emit DDL such as:
```sql
ALTER TABLE products
  ADD COLUMN add_column1 VARCHAR(64),
  ADD COLUMN add_column2 INT;
```
the MySQL CDC DDL resolution currently on `dev` may bind the unqualified
`products` to the first matching captured database instead of the database
that produced the DDL event.
As a result, the multi-database job does not reliably preserve per-database
schema isolation for same-name tables. The expected behavior is that:
- `shop_a.products` evolves and syncs to `shop_a_sink.products`
- `shop_b.products` evolves and syncs to `shop_b_sink.products`
- each downstream table continues to receive only its own source database's
records after the DDL
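
The difference between the two resolution strategies can be sketched as follows. This is a minimal illustration in Python, not SeaTunnel's parser code: `CAPTURED`, `split_table_ref`, `resolve_first_match`, and `resolve_with_event_db` are hypothetical names, and the sketch assumes each DDL event carries the name of the database it was produced in.

```python
from typing import List, Optional, Tuple

# Tables captured by the job, as (database, table) pairs (from the example above).
CAPTURED: List[Tuple[str, str]] = [("shop_a", "products"), ("shop_b", "products")]


def split_table_ref(ref: str) -> Tuple[Optional[str], str]:
    """Split `db.table` or `table` into (database or None, table)."""
    database, _, table = ref.rpartition(".")
    return (database.strip("`") or None, table.strip("`"))


def resolve_first_match(ref: str, captured: List[Tuple[str, str]]) -> Tuple[str, str]:
    """Problematic strategy: take the first captured table with a matching name."""
    database, table = split_table_ref(ref)
    for cand_db, cand_table in captured:
        if cand_table == table and database in (None, cand_db):
            return (cand_db, cand_table)
    raise LookupError(f"no captured table matches {ref!r}")


def resolve_with_event_db(ref: str, event_database: str) -> Tuple[str, str]:
    """Expected strategy: an unqualified name belongs to the event's own database."""
    database, table = split_table_ref(ref)
    return (database or event_database, table)


if __name__ == "__main__":
    for event_db in ("shop_a", "shop_b"):
        print(
            event_db,
            resolve_first_match("products", CAPTURED),
            resolve_with_event_db("products", event_db),
        )
```

With the first-match strategy, an `ALTER TABLE products ...` event from `shop_b` resolves to `shop_a.products`; resolving against the event's database keeps the two tables separate. An explicitly qualified name such as `shop_a.products` is left unchanged by both strategies.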
### SeaTunnel Version
- affected on current `dev` before the fix in PR #11015
- verified against `3.0.0-SNAPSHOT` source state during reproduction analysis
### SeaTunnel Config
```conf
env {
  parallelism = 1
  job.mode = "STREAMING"
  checkpoint.interval = 5000
}

source {
  MySQL-CDC {
    server-id = 5690-5700
    username = "st_user_source"
    password = "mysqlpw"
    database-names = ["shop_a", "shop_b"]
    table-names = ["shop_a.products", "shop_b.products"]
    url = "jdbc:mysql://mysql-host:3306"
    schema-changes.enabled = true
  }
}

sink {
  jdbc {
    url = "jdbc:mysql://mysql-host:3306"
    driver = "com.mysql.cj.jdbc.Driver"
    user = "st_user_sink"
    password = "mysqlpw"
    generate_sink_sql = true
    database = "${database_name}_sink"
    table = "${table_name}"
    primary_keys = ["id"]
    multi_table_sink_replica = 2
  }
}
```
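
For reference, the sink placeholders in this config are expected to expand per source table. The sketch below only illustrates that mapping; it uses Python's `string.Template` as a stand-in for SeaTunnel's own placeholder substitution, and `route` is a hypothetical helper rather than a SeaTunnel API.

```python
from string import Template
from typing import Tuple


def route(
    database_name: str,
    table_name: str,
    database_pattern: str = "${database_name}_sink",
    table_pattern: str = "${table_name}",
) -> Tuple[str, str]:
    """Expand the sink `database` and `table` patterns for one source table."""
    values = {"database_name": database_name, "table_name": table_name}
    return (
        Template(database_pattern).substitute(values),
        Template(table_pattern).substitute(values),
    )


if __name__ == "__main__":
    for source_db in ("shop_a", "shop_b"):
        print(source_db, "->", route(source_db, "products"))
```

Under this mapping, `shop_a.products` targets `shop_a_sink.products` and `shop_b.products` targets `shop_b_sink.products`, which is why a DDL event attributed to the wrong source database also reaches the wrong sink table.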
### Running Command
```shell
./bin/seatunnel.sh -c config/mysqlcdc_to_mysql_with_multi_db_same_name_schema_change.conf -m local
```
### Error Exception
There is no stable user-facing stack trace for this bug. The failure is
behavioral:
- the same parser instance can resolve `ALTER TABLE products ...` to the
wrong source database
- same-name tables from different databases can lose schema isolation during
schema evolution
- downstream tables may not receive the expected database-scoped DDL and
post-DDL records independently
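
Since there is no stack trace, one way to observe the failure is to compare the columns of each sink table after the DDL has been issued. The sketch below assumes the rows come from a query on MySQL's `information_schema.columns` (`table_schema`, `table_name`, `column_name`). `missing_columns` is a hypothetical helper, and `SAMPLE_ROWS` is illustrative data, not output captured from a real run.

```python
from typing import Dict, Iterable, List, Set, Tuple

SINK_TABLES: List[Tuple[str, str]] = [
    ("shop_a_sink", "products"),
    ("shop_b_sink", "products"),
]
EXPECTED_NEW_COLUMNS: Set[str] = {"add_column1", "add_column2"}

# Illustrative rows only: (table_schema, table_name, column_name).
SAMPLE_ROWS: List[Tuple[str, str, str]] = [
    ("shop_a_sink", "products", "id"),
    ("shop_a_sink", "products", "add_column1"),
    ("shop_a_sink", "products", "add_column2"),
    ("shop_b_sink", "products", "id"),
]


def missing_columns(
    rows: Iterable[Tuple[str, str, str]],
    sink_tables: Iterable[Tuple[str, str]],
    expected: Set[str],
) -> Dict[Tuple[str, str], List[str]]:
    """Return, per sink table, the expected columns that are not present."""
    seen: Dict[Tuple[str, str], Set[str]] = {table: set() for table in sink_tables}
    for schema, name, column in rows:
        key = (schema, name)
        if key in seen:
            seen[key].add(column)
    return {
        key: sorted(expected - columns)
        for key, columns in seen.items()
        if expected - columns
    }


if __name__ == "__main__":
    print(missing_columns(SAMPLE_ROWS, SINK_TABLES, EXPECTED_NEW_COLUMNS))
```

An empty result means every sink table received both new columns. Any entry names a sink table whose schema did not evolve with its own source database.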
### Zeta or Flink or Spark Version
- Zeta engine on current `dev`
### Java or Scala Version
- Java 8/11 compatible code path in current project matrix
### Screenshots
Not applicable.
### Are you willing to submit PR?
Yes. A candidate fix and regression coverage are available in PR #11015.