zclllyybb commented on issue #3871:
URL: https://github.com/apache/doris-website/issues/3871#issuecomment-4591844426

   Thanks for reporting this. I checked the current upstream/master 
implementation and the docs page referenced in the issue.
   
   Initial conclusion: for the documented **MySQL CDC with SQL Mapping** path 
(`CREATE JOB ... ON STREAMING DO INSERT INTO ... SELECT ... FROM 
cdc_stream(...)`), the observation is valid: MySQL `DELETE` events are not 
currently propagated as Doris deletes. This looks like a current limitation/bug 
in the SQL-mapping TVF pipeline, not just a missing option in the page.
   
   Evidence from the code path:
   
   - The SQL Mapping page uses the `cdc_stream` TVF and then a normal `INSERT 
INTO ... SELECT` into an existing Doris table.
   - The CDC client deserializer does distinguish deletes: for a Debezium 
`DELETE`, it emits the `before` row with `__DORIS_DELETE_SIGN__ = 1`; for 
read/create/update, it emits `__DORIS_DELETE_SIGN__ = 0`.
   - `CdcStreamTableValuedFunction.getTableColumns()` builds the TVF output 
schema from the upstream JDBC table columns only via 
`jdbcClient.getColumnsFromJdbc(...)`. It does not expose an operation column or 
Doris hidden delete-sign column.
   - `StreamingInsertTask` runs the SQL as a normal insert plan, so there is no 
Stream Load `hidden_columns` header on this path. By contrast, the 
auto-table-creation path (`CREATE JOB ... FROM MYSQL (...) TO DATABASE ...`) 
goes through `StreamingMultiTblTask -> /api/writeRecords -> 
DorisBatchStreamLoad`, where `HttpPutBuilder.addHiddenColumns(true)` sets 
`hidden_columns: __DORIS_DELETE_SIGN__`.
   - The existing standalone TVF regression output also shows the symptom: after a 
MySQL `UPDATE C1` and `DELETE D1`, `test_cdc_stream_tvf_mysql` expects TVF 
output rows `C1 99` and `D1 4`. The delete surfaces only as the row's 
before-image, because the TVF schema does not expose the hidden delete marker. 
The MySQL `cdc_stream` streaming-job regression currently covers snapshot plus 
incremental inserts, but not a delete case.
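
   To make the evidence above concrete, here is a minimal Python sketch of the 
described flow. It is a toy model, not Doris code: the function names, the 
`name`/`age` column names, and the pre-update value of `C1` are assumptions 
made for illustration. Only the delete-sign behavior and the `C1 99` / `D1 4` 
rows come from the analysis above.

```python
# Toy model of the behavior described above; not the Doris implementation.
DELETE_SIGN = "__DORIS_DELETE_SIGN__"


def deserialize(event):
    """Mimic the described deserializer: a delete emits the before-image with sign 1."""
    if event["op"] == "d":
        row = dict(event["before"])
        row[DELETE_SIGN] = 1
    else:
        # read / create / update emit the after-image with sign 0
        row = dict(event["after"])
        row[DELETE_SIGN] = 0
    return row


def project_to_tvf_schema(row, source_columns):
    """Keep only the upstream table columns, as the TVF schema is described to do."""
    return {col: row[col] for col in source_columns}


# Hypothetical column names and before-values, chosen to match `C1 99` and `D1 4`.
events = [
    {"op": "u", "before": {"name": "C1", "age": 3}, "after": {"name": "C1", "age": 99}},
    {"op": "d", "before": {"name": "D1", "age": 4}, "after": None},
]
source_columns = ["name", "age"]

deserialized = [deserialize(e) for e in events]
tvf_rows = [project_to_tvf_schema(r, source_columns) for r in deserialized]

for row in tvf_rows:
    print(row)
```

   After the projection, the `D1` row carries no sign at all, so a downstream 
`INSERT INTO ... SELECT` cannot tell it apart from an ordinary insert or update.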
   
   So the behavior differs by mode:
   
   - `FROM MYSQL (...) TO DATABASE ...`: delete events are supported for the 
native mirror-sync path to primary-key/Unique Key targets.
   - `DO INSERT INTO ... SELECT ... FROM cdc_stream(...)`: delete events are 
not currently supported as deletes in the SQL-mapping path. On a Unique Key 
target this can rewrite the row with the delete event's before-image instead of 
deleting it; on a Duplicate Key target it can append another row.
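
   The difference between the two modes can be simulated with a small Python 
sketch. This is a simplified model under stated assumptions, not Doris 
internals: the helper names, the `name` key column, and the `age` column are 
hypothetical, and the table semantics are reduced to "upsert by key" and 
"append".

```python
# Simplified model of the two target table types; not Doris internals.
DELETE_SIGN = "__DORIS_DELETE_SIGN__"


def apply_unique_key(table, row, key="name"):
    """Toy Unique Key semantics: upsert by key; a delete sign of 1 removes the row."""
    fields = {k: v for k, v in row.items() if k != DELETE_SIGN}
    if row.get(DELETE_SIGN, 0) == 1:
        table.pop(row[key], None)
    else:
        table[row[key]] = fields


def apply_duplicate_key(table, row):
    """Toy Duplicate Key semantics: every incoming row is appended."""
    table.append({k: v for k, v in row.items() if k != DELETE_SIGN})


# The same upstream DELETE, with and without the hidden delete sign attached.
with_sign = {"name": "D1", "age": 4, DELETE_SIGN: 1}
without_sign = {"name": "D1", "age": 4}

# Mirror-sync path: the sign reaches the Unique Key target, so the row is removed.
mirror_target = {"D1": {"name": "D1", "age": 4}}
apply_unique_key(mirror_target, with_sign)

# SQL-mapping path, Unique Key target: the before-image overwrites the row.
mapping_unique_target = {"D1": {"name": "D1", "age": 4}}
apply_unique_key(mapping_unique_target, without_sign)

# SQL-mapping path, Duplicate Key target: the before-image is appended.
mapping_duplicate_target = [{"name": "D1", "age": 4}]
apply_duplicate_key(mapping_duplicate_target, without_sign)

print(mirror_target, mapping_unique_target, mapping_duplicate_target)
```

   In this model only the first target ends up without `D1`; the other two 
still contain it, once and twice respectively.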
   
   Suggested next steps:
   
   1. Update the SQL Mapping docs, both English and Chinese/versioned docs, to 
explicitly state that this mode currently does not propagate source `DELETE` 
events. Point users who need delete synchronization to the auto-table-creation 
sync path when SQL transformation is not required.
   2. If delete support is intended for SQL Mapping, add a design/code change 
so the TVF/insert pipeline can carry CDC operation semantics into Doris. Likely 
options are exposing a CDC op/delete-sign column from `cdc_stream`, or adding 
planner/sink handling that maps CDC delete events to `__DORIS_DELETE_SIGN__` 
for Unique Key targets.
   3. Add a regression test for the TVF SQL Mapping streaming-job path with a 
MySQL `DELETE`, because that is the path described by this doc and it is not 
covered by the current MySQL streaming-job TVF test.
   
   Additional details that would help if someone wants to reproduce the exact 
user test: Doris version, full `CREATE JOB` statement, target table DDL, MySQL 
`binlog_row_image` value, and the observed result after issuing the upstream 
`DELETE`.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

