[
https://issues.apache.org/jira/browse/FLINK-39759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mao Jiayi updated FLINK-39759:
------------------------------
Description:
When using the StarRocks YAML connector to sync tables whose upstream {{CHAR}}
or {{VARCHAR}} columns may contain utf8mb4 characters, the job may create
undersized StarRocks columns and fail when writing data.
The issue happens in *{{StarRocksUtils.toStarRocksDataType()}}*. The
current mapping logic multiplies the upstream character length by {{3}} when
converting {{CHAR}} and {{VARCHAR}} types to StarRocks column definitions. This
assumes that each character takes at most 3 bytes in UTF-8. However, utf8mb4
characters can require up to 4 bytes, so the inferred StarRocks column length
may be smaller than required.
This issue is not exposed for sources whose character set only uses up to 3
bytes per character, because the current mapping remains sufficient in those
cases. It only manifests when the upstream source uses utf8mb4-like encodings
and the data contains 4-byte Unicode characters.
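
The shortfall described above can be reproduced with plain arithmetic. The sketch below is illustrative only and is not the connector's code: the class name, the two multiplier constants, and both helper methods are hypothetical, and it assumes the StarRocks column length acts as a byte budget while the upstream length counts characters.

```java
import java.nio.charset.StandardCharsets;

class Utf8mb4LengthSketch {
    // Hypothetical constants for illustration; not taken from the Flink CDC source.
    static final int OLD_BYTES_PER_CHAR = 3;
    static final int UTF8MB4_BYTES_PER_CHAR = 4;

    /** Byte budget derived from an upstream character length and a per-character multiplier. */
    static int mappedByteLength(int charLength, int bytesPerChar) {
        return charLength * bytesPerChar;
    }

    /** Number of bytes the value occupies once encoded as UTF-8. */
    static int utf8ByteLength(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Two copies of U+1F600, written as surrogate pairs; each needs 4 bytes in UTF-8.
        String value = "\uD83D\uDE00\uD83D\uDE00";
        int upstreamCharLength = value.codePointCount(0, value.length()); // 2 characters

        int needed = utf8ByteLength(value);                                              // 8
        int oldBudget = mappedByteLength(upstreamCharLength, OLD_BYTES_PER_CHAR);        // 6
        int newBudget = mappedByteLength(upstreamCharLength, UTF8MB4_BYTES_PER_CHAR);    // 8

        System.out.println("bytes needed = " + needed);
        System.out.println("budget with x3 = " + oldBudget + ", fits: " + (needed <= oldBudget));
        System.out.println("budget with x4 = " + newBudget + ", fits: " + (needed <= newBudget));
    }
}
```

With a multiplier of 3, a value that fills an upstream {{VARCHAR(2)}} with 4-byte characters needs 8 bytes but is given only 6. A multiplier of 4 covers the worst case.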
was:
When using the MongoDB YAML connector to read data containing
{{BsonDecimal128}} values, the job may fail if the decimal value has leading
zeros after the decimal point (e.g., {{0.0001234}}).
The failure happens in *CdcTypeConverter.toCdcType()*. Java's
{{BigDecimal}} treats leading zeros in the fractional part as insignificant
digits, resulting in a {{precision}} that is smaller than the {{scale}}.
For example, {{0.0001234}} yields {{precision = 4}} and {{scale = 7}}.
Flink CDC's *DECIMAL(precision, scale)* type requires {{precision >= scale}},
so the type inference throws an exception.
This issue is not exposed for typical {{Decimal128}} values like {{10.99}}
(where {{precision = 4}}, {{scale = 2}}) because the constraint
naturally holds. It only manifests when the value has more fractional digits
than significant digits, specifically when leading zeros push the
{{scale}} beyond the {{precision}}.
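
The {{BigDecimal}} behavior described above can be checked directly with the standard library. In the sketch below, the class name and the {{safePrecision}} helper are hypothetical. Widening the precision to at least the scale is one possible adjustment and is not necessarily the fix applied in Flink CDC.

```java
import java.math.BigDecimal;

class DecimalPrecisionSketch {
    /**
     * Hypothetical helper: returns a precision that is never smaller than the scale,
     * so that a DECIMAL(precision, scale) constraint of precision >= scale holds.
     */
    static int safePrecision(BigDecimal value) {
        return Math.max(value.precision(), value.scale());
    }

    public static void main(String[] args) {
        BigDecimal small = new BigDecimal("0.0001234");
        BigDecimal typical = new BigDecimal("10.99");

        // Unscaled value 1234 has 4 digits; 7 digits follow the decimal point.
        System.out.println("small:   precision=" + small.precision() + ", scale=" + small.scale());
        // Unscaled value 1099 has 4 digits; 2 digits follow the decimal point.
        System.out.println("typical: precision=" + typical.precision() + ", scale=" + typical.scale());

        System.out.println("small safe precision:   " + safePrecision(small));
        System.out.println("typical safe precision: " + safePrecision(typical));
    }
}
```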
Summary: Fix StarRocks CHAR / VARCHAR mapping rules to contain utf8mb4
characters (was: Fix MongoDB YAML connector fails to infer type from some
BsonDecimal128 data)
> Fix StarRocks CHAR / VARCHAR mapping rules to contain utf8mb4 characters
> ------------------------------------------------------------------------
>
> Key: FLINK-39759
> URL: https://issues.apache.org/jira/browse/FLINK-39759
> Project: Flink
> Issue Type: Bug
> Components: Flink CDC
> Reporter: Mao Jiayi
> Priority: Major
> Labels: pull-request-available
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)