lizc9 opened a new issue, #4615: URL: https://github.com/apache/paimon/issues/4615
### Search before asking

- [X] I searched in the [issues](https://github.com/apache/paimon/issues) and found nothing similar.

### Motivation

Make the Kafka CDC action support Kafka messages in the following format, collected from MongoDB via Debezium, whether or not the message includes a schema:

```json
{
  "before": null,
  "after": "{\"_id\":{\"$oid\":\"64001c996f4de7ff3189d374\"},\"last_updated_at\":{\"$numberLong\":\"1732232838425\"},\"tags\":[\"completely\",\"pass\"],\"updated_by\":\"xxx\"}",
  "updateDescription": null,
  "source": {
    "version": "2.7.0.Final",
    "connector": "mongodb",
    "name": "datapipeline",
    "ts_ms": 1732644484000,
    "snapshot": "false",
    "db": "datapipeline",
    "sequence": null,
    "ts_us": 1732644484000000,
    "ts_ns": 1732644484000000000,
    "collection": "clips",
    "ord": 22,
    "lsid": null,
    "txnNumber": null,
    "wallTime": null
  },
  "op": "c",
  "ts_ms": 1732644484231,
  "transaction": null
}
```

### Solution

Add a `debezium-bson` format to the Kafka CDC action:

1. Support parsing the BSON string in the `before`/`after` fields of the Kafka message.
2. Convert BSON values to Java objects and basic data types.

Expected Results:

- Schema:

  | Column          | DataType | Key         |
  |-----------------|----------|-------------|
  | _id             | STRING   | Primary Key |
  | last_updated_at | STRING   |             |
  | tags            | STRING   |             |
  | updated_by      | STRING   |             |

- Records:

  | RowKind | _id                      | last_updated_at | tags                  | updated_by |
  |---------|--------------------------|-----------------|-----------------------|------------|
  | +I      | 64001c996f4de7ff3189d374 | 1732232838425   | ["completely","pass"] | xxx        |

### Anything else?

_No response_

### Are you willing to submit a PR?

- [X] I'm willing to submit a PR!
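To illustrate the two Solution steps and the Expected Results, here is a minimal Python sketch. It is not Paimon code (the real format would be implemented in Java), and the names `bson_value_to_string` and `convert_record` are hypothetical. It decodes the `after` string as MongoDB Extended JSON with the standard `json` module and flattens every field into a STRING column. Assumptions: only the `$oid` and `$numberLong` wrappers from the example are unwrapped, only `c` (create) and `r` (snapshot read) events are handled, and a message carrying a schema uses Debezium's usual `{"schema": ..., "payload": ...}` envelope.

```python
import json
from typing import Any, Dict, Optional, Tuple

# Hypothetical: only the Extended JSON wrappers that appear in the example
# message are unwrapped. Other types ($date, $numberDecimal, ...) are not
# special-cased and fall through to the JSON-text branch below.
SCALAR_WRAPPERS = ("$oid", "$numberLong")

# Debezium op codes treated as inserts (+I) in this sketch:
# "c" = create, "r" = snapshot read.
INSERT_OPS = ("c", "r")


def bson_value_to_string(value: Any) -> Optional[str]:
    """Map one decoded Extended JSON value to a STRING column value."""
    if value is None:
        return None
    if isinstance(value, dict) and len(value) == 1:
        key, inner = next(iter(value.items()))
        if key in SCALAR_WRAPPERS:
            # {"$oid": "..."} -> "...", {"$numberLong": "..."} -> "..."
            return str(inner)
    if isinstance(value, str):
        return value
    # Arrays, nested documents, numbers and booleans become compact JSON text.
    return json.dumps(value, separators=(",", ":"))


def convert_record(message: str) -> Tuple[str, Dict[str, Optional[str]]]:
    """Turn one Debezium MongoDB Kafka message into (row_kind, row)."""
    envelope = json.loads(message)
    # With schemas enabled, the event is wrapped as {"schema", "payload"}.
    if "schema" in envelope and "payload" in envelope:
        envelope = envelope["payload"]
    op = envelope["op"]
    if op not in INSERT_OPS:
        raise ValueError(f"op {op!r} is not handled in this sketch")
    document_json = envelope["after"]
    if document_json is None:
        raise ValueError("'after' is null; nothing to insert")
    # 'after' is itself a string holding the document as Extended JSON.
    document = json.loads(document_json)
    row = {name: bson_value_to_string(v) for name, v in document.items()}
    return "+I", row


if __name__ == "__main__":
    after = {
        "_id": {"$oid": "64001c996f4de7ff3189d374"},
        "last_updated_at": {"$numberLong": "1732232838425"},
        "tags": ["completely", "pass"],
        "updated_by": "xxx",
    }
    message = json.dumps({"before": None, "after": json.dumps(after), "op": "c"})
    print(convert_record(message))
```

Serializing arrays as compact JSON text is what produces `["completely","pass"]` for `tags`, matching the Records table; a complete implementation would also need the remaining Extended JSON types and the update/delete op codes.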