zhxiaofan opened a new issue, #159:
URL: https://github.com/apache/doris-flink-connector/issues/159

   ### Search before asking
   
   - [X] I have searched in the [issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Version
   
   Doris: 1.2.5
   MySQL: 5.7+
   Flink Doris Connector: 1.4.0 (flink-doris-connector-1.16-1.4.0.jar)
   
   ### What's Wrong?
   
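   When the MySQL charset is utf8mb4, columns can store emoji, and a single emoji is encoded as four bytes. However, the program only accounts for Chinese content, where a single Chinese character is encoded as three bytes. Because MySQL counts `VARCHAR` length in characters while Doris counts it in bytes, an emoji stored in a MySQL `varchar(1)` column does not fit into the auto-created Doris column, and the row is filtered during stream load.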
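   The byte arithmetic behind the failure can be checked with a short Java sketch. The class and method names are illustrative only, not part of the connector. It measures the UTF-8 byte length of one Chinese character and one emoji; both count as a single character in a MySQL `VARCHAR`.

   ```java
   import java.nio.charset.StandardCharsets;

   public class Utf8ByteLengths {
       // Number of bytes a string occupies when encoded as UTF-8
       // (the full UTF-8 encoding that MySQL calls utf8mb4).
       public static int utf8Bytes(String s) {
           return s.getBytes(StandardCharsets.UTF_8).length;
       }

       public static void main(String[] args) {
           String chinese = "\u4E2D";                              // U+4E2D, a common Chinese character
           String emoji = new String(Character.toChars(0x1F60A)); // U+1F60A, the smiling-face emoji from the error log
           System.out.println("Chinese character bytes: " + utf8Bytes(chinese)); // 3
           System.out.println("Emoji bytes: " + utf8Bytes(emoji));               // 4
           // Both strings are a single code point, i.e. one character as MySQL counts VARCHAR length.
           System.out.println("Emoji code points: " + emoji.codePointCount(0, emoji.length())); // 1
       }
   }
   ```

   A 3x multiplier therefore covers characters in the Basic Multilingual Plane (such as Chinese), but not supplementary characters such as emoji.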
   
   Error message:
   ````
   Caused by: org.apache.doris.flink.exception.DorisRuntimeException: stream load error: [INTERNAL_ERROR]too many filtered rows, see more in http://127.0.0.1:8040/api/_load_error_log?file=__shard_0/error_log_insert_stmt_c54018ca99a31941-7cfcd16524d4948b_c54018ca99a31941_7cfcd16524d4948b
           at org.apache.doris.flink.sink.writer.DorisWriter.prepareCommit(DorisWriter.java:158)
           at org.apache.flink.streaming.api.transformations.SinkV1Adapter$SinkWriterV1Adapter.prepareCommit(SinkV1Adapter.java:151)
           at org.apache.flink.streaming.runtime.operators.sink.SinkWriterOperator.emitCommittables(SinkWriterOperator.java:196)
           at org.apache.flink.streaming.runtime.operators.sink.SinkWriterOperator.prepareSnapshotPreBarrier(SinkWriterOperator.java:166)
           at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.prepareSnapshotPreBarrier(RegularOperatorChain.java:89)
           at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointState(SubtaskCheckpointCoordinatorImpl.java:300)
           at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$12(StreamTask.java:1253)
           at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
           at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:1241)
           at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointAsyncInMailbox(StreamTask.java:1126)
           ... 13 more
   ````
   
   Content of the URL above:
   ````
   curl "http://127.0.0.1:8040/api/_load_error_log?file=__shard_0/error_log_insert_stmt_c54018ca99a31941-7cfcd16524d4948b_c54018ca99a31941_7cfcd16524d4948b"

   Reason: column_name[str], the length of input is too long than schema. first 32 bytes of input str: [😊] schema length: 3; actual length: 4; . src line [];
   ````
   
   ### What You Expected?
   
   VARCHAR content may contain emoji, so tables auto-created by the program should set the column length to 4 times the original MySQL column length rather than 3 times.
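   A minimal sketch of the proposed mapping follows. It assumes that Doris `VARCHAR` lengths are counted in bytes with a maximum of 65533, and that columns exceeding that limit fall back to `STRING`. The class and method names are hypothetical and do not correspond to the connector's actual type-mapping code.

   ```java
   public class VarcharMappingSketch {
       // Assumption: Doris VARCHAR length is a byte count, capped at 65533.
       static final int DORIS_MAX_VARCHAR_BYTES = 65533;
       // utf8mb4 needs at most 4 bytes per character (e.g. emoji).
       static final int UTF8MB4_MAX_BYTES_PER_CHAR = 4;

       // Hypothetical helper: maps a MySQL VARCHAR(n) (n in characters)
       // to a Doris column type sized in bytes.
       public static String toDorisVarchar(int mysqlCharLength) {
           long bytes = (long) mysqlCharLength * UTF8MB4_MAX_BYTES_PER_CHAR;
           return bytes > DORIS_MAX_VARCHAR_BYTES ? "STRING" : "VARCHAR(" + bytes + ")";
       }

       public static void main(String[] args) {
           System.out.println(toDorisVarchar(1));     // VARCHAR(4): fits one emoji
           System.out.println(toDorisVarchar(20000)); // STRING: 80000 bytes exceeds the cap
       }
   }
   ```

   With a 4x multiplier, the `varchar(1)` column from the reproduction below would become `VARCHAR(4)`, which is enough for the 4-byte emoji reported in the error log.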
   
   ### How to Reproduce?
   
   MySQL table DDL:
   ````
   create table emoj_str (id int(11) primary key, str varchar(1)) default charset=utf8mb4;
   ````
   Data that triggers the error:
   ````
   insert into emoj_str value (1,'😊');
   ````
   
   Command used to run the job:
   ````
   flink116 run \
   -t yarn-per-job \
   -Dyarn.application.name=mysql-sync-database \
   -Dexecution.checkpointing.interval=10s \
   -Dparallelism.default=1 \
   -c org.apache.doris.flink.tools.cdc.CdcTools \
   /data/****/flink-doris-connector-1.16-1.4.0.jar \
   mysql-sync-database \
   --database "test_emoji" \
   --mysql-conf hostname=127.0.0.1 \
   --mysql-conf username="user" \
   --mysql-conf password="password" \
   --mysql-conf database-name="test_emoji" \
   --mysql-conf port=1111 \
   --mysql-conf scan.startup.mode="initial" \
   --including-tables "emoj_str" \
   --sink-conf fenodes=127.0.0.1:8030 \
   --sink-conf username=user \
   --sink-conf password=password \
   --sink-conf jdbc-url=jdbc:mysql://127.0.0.1:9030 \
   --table-conf replication_num=3 \
   --job-name test
   ````
   
   
   ### Anything Else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   

