zclllyybb commented on issue #3871: URL: https://github.com/apache/doris-website/issues/3871#issuecomment-4591844426
Thanks for reporting this. I checked the current upstream/master implementation and the docs page referenced in the issue.

Initial conclusion: for the documented **MySQL CDC with SQL Mapping** path (`CREATE JOB ... ON STREAMING DO INSERT INTO ... SELECT ... FROM cdc_stream(...)`), the observation is valid: MySQL `DELETE` events are not currently propagated as Doris deletes. This looks like a current limitation/bug in the SQL-mapping TVF pipeline, not just a missing option in the page.

Evidence from the code path:

- The SQL Mapping page uses the `cdc_stream` TVF and then a normal `INSERT INTO ... SELECT` into an existing Doris table.
- The CDC client deserializer does distinguish deletes: for a Debezium `DELETE`, it emits the `before` row with `__DORIS_DELETE_SIGN__ = 1`; for read/create/update, it emits `__DORIS_DELETE_SIGN__ = 0`.
- `CdcStreamTableValuedFunction.getTableColumns()` builds the TVF output schema from the upstream JDBC table columns only, via `jdbcClient.getColumnsFromJdbc(...)`. It exposes neither an operation column nor the Doris hidden delete-sign column.
- `StreamingInsertTask` runs the SQL as a normal insert plan, so there is no Stream Load `hidden_columns` header on this path. By contrast, the auto-table-creation path (`CREATE JOB ... FROM MYSQL (...) TO DATABASE ...`) goes through `StreamingMultiTblTask -> /api/writeRecords -> DorisBatchStreamLoad`, where `HttpPutBuilder.addHiddenColumns(true)` sets `hidden_columns: __DORIS_DELETE_SIGN__`.
- Existing standalone TVF regression output also shows the symptom: after a MySQL `UPDATE C1` and `DELETE D1`, `test_cdc_stream_tvf_mysql` expects TVF output rows `C1 99` and `D1 4`. Because the TVF schema does not expose the hidden delete marker, the delete is visible only as the original row. The MySQL `cdc_stream` streaming-job regression currently covers snapshot plus incremental inserts, but not a delete case.

So the behavior differs by mode:

- `FROM MYSQL (...) TO DATABASE ...`: delete events are supported for the native mirror-sync path to primary-key/Unique Key targets.
- `DO INSERT INTO ... SELECT ... FROM cdc_stream(...)`: delete events are not currently applied as deletes in the SQL-mapping path. On a Unique Key target this can rewrite the row with the delete event's before-image instead of deleting it; on a Duplicate Key target it can append another row.

Suggested next steps:

1. Update the SQL Mapping docs, both English and Chinese/versioned docs, to explicitly state that this mode currently does not propagate source `DELETE` events. Point users who need delete synchronization to the auto-table-creation sync path when SQL transformation is not required.
2. If delete support is intended for SQL Mapping, add a design/code change so the TVF/insert pipeline can carry CDC operation semantics into Doris. Likely options are exposing a CDC op/delete-sign column from `cdc_stream`, or adding planner/sink handling that maps CDC delete events to `__DORIS_DELETE_SIGN__` for Unique Key targets.
3. Add a regression test for the TVF SQL Mapping streaming-job path with a MySQL `DELETE`, because that is the path described by this doc and it is not covered by the current MySQL streaming-job TVF test.

Additional details that would help if someone wants to reproduce the exact user test: Doris version, full `CREATE JOB` statement, target table DDL, MySQL `binlog_row_image` value, and the observed result after issuing the upstream `DELETE`.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
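
To make the difference between the two paths easier to see, here is a minimal Python model of the behavior described above. It is only a sketch, not Doris code: `deserialize`, `apply_with_delete_sign`, `apply_without_delete_sign`, the column names `id` and `val`, and the initial value of `C1` are all hypothetical and chosen for illustration. It assumes Debezium-style events with `op`, `before`, and `after` fields, and models a Unique Key target as a dict keyed by `id`.

```python
DELETE_SIGN = "__DORIS_DELETE_SIGN__"


def deserialize(event):
    """Model of the deserializer rule: a delete emits the before-image with
    the delete sign set to 1; read/create/update emit the after-image with 0."""
    if event["op"] == "d":
        return {**event["before"], DELETE_SIGN: 1}
    return {**event["after"], DELETE_SIGN: 0}


def apply_with_delete_sign(table, row, key="id"):
    """Unique Key write that honors the hidden delete sign
    (stands in for the mirror-sync path that sets hidden_columns)."""
    data = {k: v for k, v in row.items() if k != DELETE_SIGN}
    if row[DELETE_SIGN] == 1:
        table.pop(data[key], None)  # delete the row if present
    else:
        table[data[key]] = data     # upsert


def apply_without_delete_sign(table, row, source_columns, key="id"):
    """Unique Key write after projecting to source columns only
    (stands in for the SQL-mapping path, where the sign is not exposed)."""
    data = {c: row[c] for c in source_columns}
    table[data[key]] = data         # always an upsert, even for deletes


# Hypothetical event stream: snapshot reads, then UPDATE C1 and DELETE D1.
events = [
    {"op": "r", "before": None, "after": {"id": "C1", "val": 3}},
    {"op": "r", "before": None, "after": {"id": "D1", "val": 4}},
    {"op": "u", "before": {"id": "C1", "val": 3}, "after": {"id": "C1", "val": 99}},
    {"op": "d", "before": {"id": "D1", "val": 4}, "after": None},
]


def replay(events):
    """Apply the same events to both modeled targets and return them."""
    mirror, mapped = {}, {}
    for event in events:
        row = deserialize(event)
        apply_with_delete_sign(mirror, row)
        apply_without_delete_sign(mapped, row, ["id", "val"])
    return mirror, mapped


mirror, mapped = replay(events)
print("mirror-sync target:", mirror)
print("sql-mapping target:", mapped)
```

In this model the mirror-sync target ends up with only `C1`, while the SQL-mapping target still holds `D1` with its before-image value, which matches the symptom reported in the issue for a Unique Key table.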
