[
https://issues.apache.org/jira/browse/FLINK-38110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ouyangwulin updated FLINK-38110:
--------------------------------
Attachment: image-2025-07-17-14-53-02-657.png
> PostgreSQL connector reads Chinese columns with garbled characters
> ------------------------------------------------------------------
>
> Key: FLINK-38110
> URL: https://issues.apache.org/jira/browse/FLINK-38110
> Project: Flink
> Issue Type: Improvement
> Components: Flink CDC
> Affects Versions: cdc-3.4.0
> Reporter: haiqingchen
> Priority: Minor
> Attachments: image-2025-07-17-14-53-02-657.png
>
>
> When a PG table has column names in Chinese, the PostgreSQL connector with the
> pgoutput plugin decodes them as garbled characters, especially during
> incremental capture.
> The reason is that when handling column names and table names,
> io.debezium.connector.postgresql.connection.pgoutput.PgOutputMessageDecoder
> doesn't decode the bytes as UTF-8 and instead casts each byte to a char,
> {code:java}
> private static String readString(ByteBuffer buffer) {
>     StringBuilder sb = new StringBuilder();
>     boolean var2 = false;
>     byte b;
>     while ((b = buffer.get()) != 0) {
>         sb.append((char) b);
>     }
>     return sb.toString();
> } {code}
> whereas when it handles column values, it decodes the bytes as UTF-8:
> {code:java}
> private static String readColumnValueAsString(ByteBuffer buffer) {
>     int length = buffer.getInt();
>     byte[] value = new byte[length];
>     buffer.get(value, 0, length);
>     return new String(value, Charset.forName("UTF-8"));
> } {code}
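To illustrate the difference between the two quoted methods, here is a small standalone sketch. It assumes the name arrives as UTF-8 bytes; the class and method names (ByteCastDemo, castEachByte, decodeUtf8) are hypothetical and not part of Debezium. It only mimics the two decoding strategies on a two-character Chinese string.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Debezium or Flink CDC.
final class ByteCastDemo {

    // Mimics the quoted readString: every byte is cast to its own char,
    // so a multi-byte UTF-8 sequence is split into several unrelated chars.
    static String castEachByte(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append((char) b);
        }
        return sb.toString();
    }

    // Mimics the quoted readColumnValueAsString: the whole byte sequence
    // is decoded as UTF-8 in one step.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as Unicode escapes so the example
        // does not depend on the source file encoding.
        String original = "\u540d\u79f0";
        byte[] raw = original.getBytes(StandardCharsets.UTF_8);

        System.out.println("bytes on the wire: " + raw.length);
        System.out.println("per-byte cast matches original: "
                + castEachByte(raw).equals(original));
        System.out.println("UTF-8 decode matches original: "
                + decodeUtf8(raw).equals(original));
    }
}
```

Each of the two characters occupies three bytes in UTF-8, so the per-byte cast produces six chars that no longer correspond to the original name, while the UTF-8 decode restores it.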
> My solution is to copy PgOutputMessageDecoder from Debezium and fix
> readString so that it reads the string as UTF-8.
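A minimal sketch of what such a fixed readString could look like, assuming the names are null-terminated and UTF-8 encoded as in the quoted code. The wrapper class Utf8ReadStringSketch is hypothetical, and this is not necessarily the change that will be applied in Flink CDC or Debezium.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical wrapper class for illustration only.
final class Utf8ReadStringSketch {

    // Collects the raw bytes up to the null terminator, then decodes the
    // whole sequence as UTF-8 instead of casting each byte to a char.
    static String readString(ByteBuffer buffer) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        byte b;
        while ((b = buffer.get()) != 0) {
            bytes.write(b);
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Build a buffer holding a Chinese name followed by a 0 terminator.
        String name = "\u540d\u79f0";
        byte[] encoded = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.allocate(encoded.length + 1);
        buffer.put(encoded).put((byte) 0);
        buffer.flip();

        System.out.println("round-trips: " + readString(buffer).equals(name));
    }
}
```

Buffering the bytes first matters because a single character can span several bytes; decoding can only be done correctly once the full sequence is available. Pure ASCII names are unaffected, since ASCII bytes decode to the same characters under UTF-8.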
--
This message was sent by Atlassian Jira
(v8.20.10#820010)