zclllyybb commented on issue #3871:
URL: https://github.com/apache/doris-website/issues/3871#issuecomment-4591832116

   Thanks for reporting this. I checked the current upstream/master 
implementation and the docs page referenced in the issue.
   
   Initial conclusion: for the documented **MySQL CDC with SQL Mapping** path 
(`CREATE JOB ... ON STREAMING DO INSERT INTO ... SELECT ... FROM 
cdc_stream(...)`), MySQL `DELETE` events are not currently propagated as Doris 
deletes. This looks like a real gap in both the product and the documentation, 
not an option that the page simply fails to mention.
   
   Evidence from the code path:
   
   - The SQL Mapping page uses the `cdc_stream` TVF and then a normal `INSERT 
INTO ... SELECT` into an existing Doris table.
   - In `CdcStreamTableValuedFunction`, the TVF output schema is built only 
from the upstream JDBC table columns, via `jdbcClient.getColumnsFromJdbc(...)`. 
It exposes neither an operation column nor the Doris hidden delete-sign column.
   - The CDC client deserializer does distinguish deletes: for a Debezium 
`DELETE`, it emits the `before` row with `__DORIS_DELETE_SIGN__ = 1`; for 
read/create/update, it emits `__DORIS_DELETE_SIGN__ = 0`.
   - However, the TVF streaming path returns those JSON rows through 
`/api/fetchRecordStream` to the TVF scan. Because the TVF schema only contains 
source columns, the `INSERT INTO ... SELECT` projection has no way to carry 
`__DORIS_DELETE_SIGN__` into the target table's hidden delete-sign column.
   - The separate auto-table-creation path (`CREATE JOB ... FROM MYSQL (...) TO 
DATABASE ...`) uses the CDC client's `/api/writeRecords` stream-load path. That 
path explicitly adds `hidden_columns: __DORIS_DELETE_SIGN__`, and existing 
regression coverage exercises MySQL `INSERT`/`UPDATE`/`DELETE` there.
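
   To make the difference between the two paths concrete, here is a toy Python 
model of the flow described above. It is not Doris code: the function names 
(`deserialize`, `project_to_tvf_schema`) and the sample table with `id` and 
`name` columns are hypothetical, and the only details taken from the analysis 
are the delete-sign values and the fact that the TVF schema contains source 
columns only.

```python
DELETE_SIGN = "__DORIS_DELETE_SIGN__"


def deserialize(event):
    """Toy stand-in for the CDC client deserializer (hypothetical helper).

    Debezium op codes: r = read, c = create, u = update, d = delete.
    """
    if event["op"] == "d":
        # A delete carries the `before` image and is flagged with sign 1.
        row = dict(event["before"])
        row[DELETE_SIGN] = 1
    else:
        # Read/create/update carry the `after` image and are flagged with 0.
        row = dict(event["after"])
        row[DELETE_SIGN] = 0
    return row


def project_to_tvf_schema(row, source_columns):
    """Keep only the columns present in the TVF output schema."""
    return {col: row[col] for col in source_columns}


# Assumed source table with two columns, for illustration only.
source_columns = ["id", "name"]
delete_event = {"op": "d", "before": {"id": 1, "name": "a"}, "after": None}

row = deserialize(delete_event)                      # delete sign is present
tvf_row = project_to_tvf_schema(row, source_columns)  # delete sign is gone
```

   In this model the delete marker exists right up to the projection step and 
is then lost, which matches the observation that `INSERT INTO ... SELECT` has 
nothing to carry into the hidden column.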
   
   So the behavior differs by mode:
   
   - `FROM MYSQL (...) TO DATABASE ...` (native mirror-sync path): source 
`DELETE` events are applied as Doris deletes.
   - `DO INSERT INTO ... SELECT ... FROM cdc_stream(...)` (SQL Mapping path): 
source `DELETE` events are not currently applied as Doris deletes.
   
   Suggested next steps:
   
   1. Update the SQL Mapping docs, both English and Chinese, to explicitly 
state that this mode currently does not propagate source `DELETE` events, and 
point users who need delete synchronization to the auto-table-creation sync 
path when SQL transformation is not required.
   2. If delete support is intended for SQL Mapping, add a design/code change 
so the TVF/insert pipeline can carry CDC operation semantics into Doris. Likely 
options are exposing a system operation/delete-sign column from `cdc_stream`, 
or adding planner/sink handling that maps CDC delete events to 
`__DORIS_DELETE_SIGN__` for Unique Key targets.
   3. Add a regression test for the TVF SQL Mapping path that issues a MySQL 
`DELETE`. The current TVF MySQL streaming-job test covers the snapshot phase 
plus incremental inserts, while delete coverage exists in the non-TVF `FROM 
MYSQL ... TO DATABASE` path.
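
   As a rough illustration of what steps 2 and 3 are aiming for, the sketch 
below replays an insert, an update, and a delete for one key into a toy Unique 
Key table, once with the delete sign carried and once without it. It is a 
simplified model under stated assumptions, not Doris behavior: `apply_row` and 
`replay` are hypothetical helpers, a Python dict stands in for the table, a 
missing delete sign is treated as 0, and a delete is modeled as removing the 
key.

```python
DELETE_SIGN = "__DORIS_DELETE_SIGN__"


def apply_row(table, row, key_column="id"):
    """Upsert a row by key, or drop the key when the delete sign is 1."""
    key = row[key_column]
    # Assumption: a row without a delete sign is treated as a normal upsert.
    if row.get(DELETE_SIGN, 0) == 1:
        table.pop(key, None)
    else:
        table[key] = {c: v for c, v in row.items() if c != DELETE_SIGN}
    return table


def replay(rows, key_column="id"):
    """Apply a sequence of change rows to an empty toy table."""
    table = {}
    for row in rows:
        apply_row(table, row, key_column)
    return table


# Insert, update, then delete of the same key.
changes = [
    {"id": 1, "name": "a", DELETE_SIGN: 0},
    {"id": 1, "name": "b", DELETE_SIGN: 0},
    {"id": 1, "name": "b", DELETE_SIGN: 1},
]

# Delete sign carried through to the target.
with_sign = replay(changes)

# Delete sign stripped before the write, as in the projection-only model.
stripped = [{k: v for k, v in r.items() if k != DELETE_SIGN} for r in changes]
without_sign = replay(stripped)
```

   A regression test for the SQL Mapping path would assert the first outcome 
(no row left for the deleted key) against a real Unique Key target.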
   
   Additional details that would help if someone wants to reproduce the exact 
user test: Doris version, full `CREATE JOB` statement, target table DDL, MySQL 
`binlog_row_image` value, and the observed result after issuing the upstream 
`DELETE`.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

