brantyou opened a new issue, #64806:
URL: https://github.com/apache/doris/issues/64806

   ### Search before asking
   
   - [x] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Version
   
   4.1.1
   
   ### What's Wrong?
   
   When a multi-table CDC job created with CREATE STREAMING JOB in Doris 4.1.1 
synchronizes the tables and data of a specified MySQL database with auto table 
creation enabled, every Chinese character in the synchronized data is garbled 
and displayed as ???.
   For the identical MySQL table, the TVF mode of CREATE STREAMING JOB 
(manually create a Doris primary key model table first, then synchronize the 
data) displays the Chinese characters correctly.
   We ran SHOW CREATE TABLE on the Doris tables produced by the two modes and 
found no obvious differences between the DDL statements.
   In the CDC synchronization job, the following parameters are already 
appended to the MySQL jdbc_url:
   
?useUnicode=true&characterEncoding=utf-8&serverTimezone=Asia/Shanghai&zeroDateTimeBehavior=convertToNull
   
   Chinese content is still stored as ???.
   The corresponding MySQL table is defined with these table options:
   ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci 
ROW_FORMAT=DYNAMIC
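   
   A literal ??? often means the text passed through a charset that cannot 
represent Chinese characters somewhere on the write path. That is an 
assumption here, not a confirmed diagnosis of the CDC job. The Python sketch 
below illustrates the mechanism and classifies the result of a SQL 
HEX(column) query. Both `lossy_roundtrip` and `classify_hex` are hypothetical 
helpers written for this illustration and are not part of Doris.

```python
# Illustration only; not taken from Doris or its CDC connector.

def lossy_roundtrip(text: str, charset: str = "latin-1") -> str:
    """Encode text into a charset that may lack its characters, replacing
    each unencodable character with '?', then decode it again."""
    return text.encode(charset, errors="replace").decode(charset)


def classify_hex(hex_value: str) -> str:
    """Classify the output of a SQL HEX(column) call (hypothetical triage helper)."""
    raw = bytes.fromhex(hex_value)
    # Every byte is 0x3F: the stored value is made of literal question marks.
    if raw and set(raw) == {0x3F}:
        return "stored as literal '?' (data lost before or during write)"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not valid UTF-8 (bytes in another encoding or corrupted)"
    return "valid UTF-8 (stored correctly; check client display settings)"


if __name__ == "__main__":
    original = "中文"
    print(lossy_roundtrip(original))
    print(classify_hex(original.encode("utf-8").hex()))
    print(classify_hex("3F3F"))
```

   If HEX() on an affected Doris column returns only 3F bytes, the characters 
were already replaced before they were stored, so client display settings 
would not be the cause.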
   
   ### What You Expected?
   
   When a Doris 4.1.1 multi-table CDC streaming job with auto table creation 
synchronizes MySQL tables and data, Chinese text should be parsed and stored 
correctly and display as normal Chinese characters instead of garbled ???.
   The result should match the TVF streaming job mode, where manually created 
Doris primary key tables display Chinese content correctly.
   
   ### How to Reproduce?
   
   1. Environment: Apache Doris 4.1.1 and a source MySQL table with charset 
utf8mb4. MySQL table options: ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 
COLLATE=utf8mb4_0900_ai_ci ROW_FORMAT=DYNAMIC
   2. Create a multi-table CDC streaming job with auto table creation enabled 
to sync all tables from a specified MySQL database.
   3. Append the charset and timezone parameters to the MySQL jdbc_url:
   
?useUnicode=true&characterEncoding=utf-8&serverTimezone=Asia/Shanghai&zeroDateTimeBehavior=convertToNull
   
   4. Insert Chinese data into the MySQL source table (or use existing rows 
that contain Chinese text) and wait for CDC synchronization to complete.
   5. Query the synced data in Doris. All Chinese characters show as ???.
   
   ### Anything Else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

