zclllyybb commented on issue #3871: URL: https://github.com/apache/doris-website/issues/3871#issuecomment-4591832116
Thanks for reporting this. I checked the current upstream/master implementation and the docs page referenced in the issue.

Initial conclusion: for the documented **MySQL CDC with SQL Mapping** path (`CREATE JOB ... ON STREAMING DO INSERT INTO ... SELECT ... FROM cdc_stream(...)`), MySQL `DELETE` events are not currently propagated as Doris deletes. The issue appears to be a real documentation/product gap rather than a missing option in the page.

Evidence from the code path:

- The SQL Mapping page uses the `cdc_stream` TVF and then a normal `INSERT INTO ... SELECT` into an existing Doris table.
- In `CdcStreamTableValuedFunction`, the TVF output schema is built from the upstream JDBC table columns only, via `jdbcClient.getColumnsFromJdbc(...)`. It does not expose an operation column or the Doris hidden delete-sign column.
- The CDC client deserializer does distinguish deletes: for a Debezium `DELETE`, it emits the `before` row with `__DORIS_DELETE_SIGN__ = 1`; for read/create/update, it emits `__DORIS_DELETE_SIGN__ = 0`.
- However, the TVF streaming path returns those JSON rows through `/api/fetchRecordStream` to the TVF scan. Because the TVF schema only contains source columns, the `INSERT INTO ... SELECT` projection has no way to carry `__DORIS_DELETE_SIGN__` into the target table's hidden delete-sign column.
- The separate auto-table-creation path (`CREATE JOB ... FROM MYSQL (...) TO DATABASE ...`) uses the CDC client's `/api/writeRecords` stream-load path. That path explicitly adds `hidden_columns: __DORIS_DELETE_SIGN__`, and existing regression coverage exercises MySQL `INSERT`/`UPDATE`/`DELETE` there.

So the behavior differs by mode:

- `FROM MYSQL (...) TO DATABASE ...`: delete events are supported for the native mirror-sync path.
- `DO INSERT INTO ... SELECT ... FROM cdc_stream(...)`: delete events are not currently supported as deletes in the SQL-mapping path.

Suggested next steps:

1. Update the SQL Mapping docs, both English and Chinese, to explicitly state that this mode currently does not propagate source `DELETE` events, and point users who need delete synchronization to the auto-table-creation sync path when SQL transformation is not required.
2. If delete support is intended for SQL Mapping, add a design/code change so the TVF/insert pipeline can carry CDC operation semantics into Doris. Likely options are exposing a system operation/delete-sign column from `cdc_stream`, or adding planner/sink handling that maps CDC delete events to `__DORIS_DELETE_SIGN__` for Unique Key targets.
3. Add a regression test for the TVF SQL Mapping path with a MySQL `DELETE`, because the current TVF MySQL streaming-job test covers snapshot plus incremental inserts, while delete coverage exists only in the non-TVF `FROM MYSQL ... TO DATABASE` path.

Additional details that would help if someone wants to reproduce the exact user test: Doris version, full `CREATE JOB` statement, target table DDL, MySQL `binlog_row_image` value, and the observed result after issuing the upstream `DELETE`.
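
To make the mode difference concrete, here is a toy Python model of the two paths as described in the comment. It is only a sketch and not Doris code: the function names (`deserialize`, `project_to_schema`, `apply_rows`), the sample `id`/`name` columns, and the event dictionaries are all hypothetical. It also assumes that a row arriving without a delete sign is applied as a normal upsert on a Unique Key target, which the comment does not state explicitly.

```python
DELETE_SIGN = "__DORIS_DELETE_SIGN__"


def deserialize(event):
    """Toy deserializer: a delete emits the `before` row with sign 1,
    any other operation emits the `after` row with sign 0."""
    if event["op"] == "delete":
        row = dict(event["before"])
        row[DELETE_SIGN] = 1
    else:
        row = dict(event["after"])
        row[DELETE_SIGN] = 0
    return row


def project_to_schema(row, schema_columns):
    """Keep only the columns present in the given schema."""
    return {col: row[col] for col in schema_columns}


def apply_rows(rows, key="id"):
    """Apply rows to a toy unique-key table (a dict keyed by `key`).
    Assumption: a row with no delete sign is treated as an upsert."""
    table = {}
    for row in rows:
        if row.get(DELETE_SIGN, 0) == 1:
            table.pop(row[key], None)
        else:
            table[row[key]] = {c: v for c, v in row.items() if c != DELETE_SIGN}
    return table


# Hypothetical source table columns and change events.
source_columns = ["id", "name"]
events = [
    {"op": "create", "after": {"id": 1, "name": "a"}},
    {"op": "delete", "before": {"id": 1, "name": "a"}},
]
rows = [deserialize(e) for e in events]

# SQL-mapping path: the schema holds source columns only, so the sign is dropped.
tvf_rows = [project_to_schema(r, source_columns) for r in rows]

# Stream-load path: the hidden delete-sign column is carried along.
stream_load_rows = [project_to_schema(r, source_columns + [DELETE_SIGN]) for r in rows]

print(apply_rows(tvf_rows))
print(apply_rows(stream_load_rows))
```

In this model the row survives in the first case and is removed in the second. That is the gap a system delete-sign column on `cdc_stream`, or sink-side handling, would need to close.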
