Xiqian YU created FLINK-35102:
---------------------------------
Summary: Incorrect type mapping for Flink CDC Doris connector
Key: FLINK-35102
URL: https://issues.apache.org/jira/browse/FLINK-35102
Project: Flink
Issue Type: Bug
Components: Flink CDC
Reporter: Xiqian YU
According to the Flink CDC Doris connector docs, the declared lengths of CHAR
and VARCHAR columns are multiplied by 3, since Doris stores strings in UTF-8, a
variable-length encoding, and measures these lengths in bytes:
|CHAR(n)|CHAR(n*3)|In Doris, strings are stored in UTF-8 encoding, so English
characters occupy 1 byte and Chinese characters occupy 3 bytes. The length here
is multiplied by 3. The maximum length of CHAR is 255. Once exceeded, it will
automatically be converted to VARCHAR type.|
|VARCHAR(n)|VARCHAR(n*3)|Same as above. The length here is multiplied by 3. The
maximum length of VARCHAR is 65533. Once exceeded, it will automatically be
converted to STRING type.|
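
The byte widths quoted in the table can be checked without Doris or Flink CDC. The following is a minimal Python sketch; the helper name `utf8_len` is hypothetical and only illustrates why a factor of 3 covers common Chinese characters.

```python
def utf8_len(text: str) -> int:
    """Return the number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


# An English letter takes 1 byte, a common Chinese character takes 3 bytes.
print(utf8_len("a"))
print(utf8_len("中"))
```

Under this encoding, a column declared as `CHAR(n)` in characters can need up to `n * 3` bytes when it holds such Chinese text.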
However, the Doris connector currently maps `CHAR(n)` to `CHAR(n)` and
`VARCHAR(n)` to `VARCHAR(n * 4)`, which is inconsistent with the mapping
specified in the docs.
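
To make the discrepancy concrete, here is a rough Python sketch that contrasts the mapping described in the docs with the behavior reported in this issue. It is not the connector's code: the function names `documented_mapping` and `reported_mapping` are hypothetical, and the fallback of an oversized CHAR straight to STRING is an assumption, since the docs only describe the CHAR to VARCHAR and VARCHAR to STRING steps.

```python
CHAR_MAX = 255       # maximum CHAR length quoted in the docs
VARCHAR_MAX = 65533  # maximum VARCHAR length quoted in the docs


def documented_mapping(cdc_type: str, n: int) -> str:
    """Map CDC CHAR(n)/VARCHAR(n) to a Doris type as the docs describe."""
    if cdc_type not in ("CHAR", "VARCHAR"):
        raise ValueError(f"unsupported type: {cdc_type}")
    length = n * 3  # the docs multiply the declared length by 3
    if cdc_type == "CHAR" and length <= CHAR_MAX:
        return f"CHAR({length})"
    if length <= VARCHAR_MAX:
        # Also reached by a CHAR whose tripled length exceeds CHAR_MAX.
        return f"VARCHAR({length})"
    # Assumption: anything above VARCHAR_MAX becomes STRING.
    return "STRING"


def reported_mapping(cdc_type: str, n: int) -> str:
    """Behavior reported in this issue; overflow handling is not modeled."""
    if cdc_type == "CHAR":
        return f"CHAR({n})"
    if cdc_type == "VARCHAR":
        return f"VARCHAR({n * 4})"
    raise ValueError(f"unsupported type: {cdc_type}")


for cdc_type, n in [("CHAR", 10), ("VARCHAR", 10)]:
    print(cdc_type, n, documented_mapping(cdc_type, n), reported_mapping(cdc_type, n))
```

For `CHAR(10)` the docs imply `CHAR(30)` while the reported behavior yields `CHAR(10)`; for `VARCHAR(10)` the docs imply `VARCHAR(30)` while the reported behavior yields `VARCHAR(40)`.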
--
This message was sent by Atlassian Jira
(v8.20.10#820010)