[ 
https://issues.apache.org/jira/browse/FLINK-39759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leonard Xu updated FLINK-39759:
-------------------------------
    Affects Version/s: cdc-3.5.0

> Fix StarRocks CHAR / VARCHAR mapping rules to contain utf8mb4 characters
> ------------------------------------------------------------------------
>
>                 Key: FLINK-39759
>                 URL: https://issues.apache.org/jira/browse/FLINK-39759
>             Project: Flink
>          Issue Type: Bug
>          Components: Flink CDC
>    Affects Versions: cdc-3.5.0
>            Reporter: Mao Jiayi
>            Assignee: Mao Jiayi
>            Priority: Major
>              Labels: pull-request-available
>
> When the StarRocks YAML connector syncs tables whose upstream {{CHAR}} or 
> {{VARCHAR}} columns contain utf8mb4 characters, the job may create 
> undersized StarRocks columns and fail when writing data.
> The issue happens in {*}{{StarRocksUtils.toStarRocksDataType()}}{*}. The 
> current mapping logic multiplies the upstream character length by {{3}} when 
> converting {{CHAR}} and {{VARCHAR}} types to StarRocks column definitions. 
> This assumes that each character takes at most 3 bytes, which holds for 
> 3-byte character sets such as utf8mb3 but not for full UTF-8. utf8mb4 
> characters can require up to 4 bytes, so the inferred StarRocks column 
> length may be smaller than required.
> This issue is not exposed for sources whose character set only uses up to 3 
> bytes per character, because the current mapping remains sufficient in those 
> cases. It only manifests when the upstream source uses utf8mb4-like encodings 
> and the data contains 4-byte Unicode characters.
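
The arithmetic behind the report can be illustrated with a minimal, standalone sketch. It assumes that StarRocks CHAR / VARCHAR lengths are measured in bytes, as the description implies. The class Utf8Mb4LengthSketch and its helpers byteBudget and utf8Bytes are hypothetical names used only for illustration; they are not part of Flink CDC and do not reproduce StarRocksUtils.toStarRocksDataType().

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the Flink CDC implementation.
public class Utf8Mb4LengthSketch {

    // Byte budget for a column declared with `charLength` characters,
    // assuming every character needs at most `maxBytesPerChar` bytes.
    public static int byteBudget(int charLength, int maxBytesPerChar) {
        return charLength * maxBytesPerChar;
    }

    // Number of bytes the value occupies when encoded as UTF-8.
    public static int utf8Bytes(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if the encoded value fits into the given byte budget.
    public static boolean fits(String value, int budgetBytes) {
        return utf8Bytes(value) <= budgetBytes;
    }

    public static void main(String[] args) {
        int upstreamLength = 2; // e.g. an upstream VARCHAR(2)

        // Two U+1F600 code points; each one needs 4 bytes in UTF-8.
        String value = "\uD83D\uDE00\uD83D\uDE00";

        int budgetTimes3 = byteBudget(upstreamLength, 3); // multiplier described in the report
        int budgetTimes4 = byteBudget(upstreamLength, 4); // multiplier that covers utf8mb4

        System.out.println("code points:  " + value.codePointCount(0, value.length()));
        System.out.println("UTF-8 bytes:  " + utf8Bytes(value));
        System.out.println("fits with x3: " + fits(value, budgetTimes3));
        System.out.println("fits with x4: " + fits(value, budgetTimes4));
    }
}
```

With a multiplier of 3, the two-character value needs 8 bytes but the column only allows 6, which matches the write failure described above. With a multiplier of 4 the same value fits exactly.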



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
