zhxiaofan opened a new issue, #159: URL: https://github.com/apache/doris-flink-connector/issues/159
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and found no similar issues.

### Version

Doris 1.2.5, MySQL 5.7+

### What's Wrong?

When the MySQL charset is utf8mb4, columns can store emoji. In UTF-8 an emoji is encoded in four bytes, but the program only handles the Chinese-text case, in which each Chinese character is encoded in three bytes. MySQL measures `VARCHAR` length in characters while Doris measures it in bytes, so a Doris column sized at three bytes per character is too short for an emoji.

Error message:

````
Caused by: org.apache.doris.flink.exception.DorisRuntimeException: stream load error: [INTERNAL_ERROR]too many filtered rows, see more in http://127.0.0.1:8040/api/_load_error_log?file=__shard_0/error_log_insert_stmt_c54018ca99a31941-7cfcd16524d4948b_c54018ca99a31941_7cfcd16524d4948b
	at org.apache.doris.flink.sink.writer.DorisWriter.prepareCommit(DorisWriter.java:158)
	at org.apache.flink.streaming.api.transformations.SinkV1Adapter$SinkWriterV1Adapter.prepareCommit(SinkV1Adapter.java:151)
	at org.apache.flink.streaming.runtime.operators.sink.SinkWriterOperator.emitCommittables(SinkWriterOperator.java:196)
	at org.apache.flink.streaming.runtime.operators.sink.SinkWriterOperator.prepareSnapshotPreBarrier(SinkWriterOperator.java:166)
	at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.prepareSnapshotPreBarrier(RegularOperatorChain.java:89)
	at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointState(SubtaskCheckpointCoordinatorImpl.java:300)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$12(StreamTask.java:1253)
	at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:1241)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointAsyncInMailbox(StreamTask.java:1126)
	... 13 more
````

Content of the URL above:

````
curl http://127.0.0.1:8040/api/_load_error_log?file=__shard_0/error_log_insert_stmt_c54018ca99a31941-7cfcd16524d4948b_c54018ca99a31941_7cfcd16524d4948b
Reason: column_name[str], the length of input is too long than schema. first 32 bytes of input str: [😊] schema length: 3; actual length: 4; . src line [];
````

### What You Expected?

`VARCHAR` content may contain emoji, so the table auto-created by the program should set the column length to 4 times the original column length rather than 3 times.

### How to Reproduce?

MySQL table DDL:

````
create table emoj_str (id int(11) primary key, str varchar(1))
````

Error data:

````
insert into emoj_str value (1,'😊');
````

Command line used:

````
flink116 run \
    -t yarn-per-job \
    -Dyarn.application.name=mysql-sync-database \
    -Dexecution.checkpointing.interval=10s \
    -Dparallelism.default=1 \
    -c org.apache.doris.flink.tools.cdc.CdcTools \
    /data/****/flink-doris-connector-1.16-1.4.0.jar \
    mysql-sync-database \
    --database "test_emoji" \
    --mysql-conf hostname=127.0.0.1 \
    --mysql-conf username="user" \
    --mysql-conf password="password" \
    --mysql-conf database-name="test_emoji" \
    --mysql-conf port=1111 \
    --mysql-conf scan.startup.mode="initial" \
    --including-tables "emoj_str" \
    --sink-conf fenodes=127.0.0.1:8030 \
    --sink-conf username=user \
    --sink-conf password=password \
    --sink-conf jdbc-url=jdbc:mysql://127.0.0.1:9030 \
    --table-conf replication_num=3 \
    --job-name test
````

### Anything Else?

_No response_

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
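To illustrate the character-versus-byte mismatch described under "What's Wrong?", here is a minimal Java sketch. It is not the connector's code: the class `EmojiVarcharLength` and the methods `utf8ByteLength` and `toDorisVarchar` are hypothetical names chosen for this example, and the 65533-byte ceiling is assumed from the Doris `VARCHAR` documentation, with longer columns falling back to `STRING`. The sketch shows that "中" and "😊" are each one character as MySQL counts them, yet occupy 3 and 4 bytes in UTF-8, and it sizes the Doris column at 4 bytes per MySQL character.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch of the MySQL -> Doris VARCHAR length mismatch.
 * MySQL VARCHAR(n) counts characters; Doris VARCHAR(n) counts bytes.
 * Names here are illustrative only, not the flink-doris-connector API.
 */
public class EmojiVarcharLength {

    // Assumed Doris VARCHAR upper bound in bytes; longer columns use STRING.
    static final int DORIS_MAX_VARCHAR_BYTES = 65533;

    // Worst-case UTF-8 width of one utf8mb4 character (e.g. an emoji).
    static final int UTF8MB4_MAX_BYTES_PER_CHAR = 4;

    /** Number of bytes the string occupies when encoded as UTF-8. */
    static int utf8ByteLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Hypothetical mapping from MySQL VARCHAR(n) to a Doris column type. */
    static String toDorisVarchar(int mysqlCharLength) {
        // Use long arithmetic so very large n cannot overflow int.
        long bytes = (long) mysqlCharLength * UTF8MB4_MAX_BYTES_PER_CHAR;
        return bytes > DORIS_MAX_VARCHAR_BYTES ? "STRING" : "VARCHAR(" + bytes + ")";
    }

    public static void main(String[] args) {
        // Unicode escapes avoid depending on the source file's encoding.
        String chinese = "\u4E2D";     // 中, U+4E2D (Basic Multilingual Plane)
        String emoji = "\uD83D\uDE0A"; // 😊, U+1F60A (surrogate pair in Java)

        // Both are a single code point, i.e. one character for MySQL.
        System.out.println(chinese.codePointCount(0, chinese.length())); // 1
        System.out.println(emoji.codePointCount(0, emoji.length()));     // 1

        // Their UTF-8 byte lengths differ.
        System.out.println(utf8ByteLength(chinese)); // 3
        System.out.println(utf8ByteLength(emoji));   // 4

        // MySQL varchar(1) maps to VARCHAR(4) under this sketch.
        System.out.println(toDorisVarchar(1));     // VARCHAR(4)
        System.out.println(toDorisVarchar(20000)); // STRING
    }
}
```

With a factor of 3, as in the behavior reported above, `varchar(1)` would become `VARCHAR(3)`, and the 4-byte emoji would exceed it, which matches `schema length: 3; actual length: 4` in the load error log. The trade-off of a factor of 4 is that every declared byte limit grows by a third, so more columns cross the `VARCHAR` ceiling and fall back to `STRING`.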
