[
https://issues.apache.org/jira/browse/FLINK-39613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ran Tao updated FLINK-39613:
----------------------------
Summary: Flink cdc pipeline primary-keys / partition-keys cannot resolve
backtick-quoted identifiers that are used as projection aliases (was: Flink
pipeline cdc primary-keys / partition-keys cannot resolve backtick-quoted
identifiers that are used as projection aliases)
> Flink cdc pipeline primary-keys / partition-keys cannot resolve
> backtick-quoted identifiers that are used as projection aliases
> -------------------------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-39613
> URL: https://issues.apache.org/jira/browse/FLINK-39613
> Project: Flink
> Issue Type: Bug
> Components: Flink CDC
> Affects Versions: cdc-3.4.0, cdc-3.5.0
> Reporter: Ran Tao
> Priority: Major
>
> In a Flink CDC YAML pipeline, the transform rule accepts three closely
> related fields:
> - projection — parsed by Calcite with the MySQL dialect ( Lex.JAVA + MYSQL_5
> ), which strips the MySQL-style backticks around identifiers.
> - primary-keys / partition-keys — currently split by a plain
> String#split(",") + String#trim() in SchemaMetadataTransform , which keeps
> the backticks verbatim as part of the column name.
> Because the two sides use different parsing strategies, a backtick-quoted
> identifier used as a projection alias (for example a SQL reserved word such
> as `time` ) can no longer be referenced from primary-keys / partition-keys :
> the literal string `time` never matches the projection output column time ,
> and the pipeline fails at runtime with a column-not-found / schema-mismatch
> error.
> This is surprising to users because:
> 1. The YAML is syntactically valid and passes the YAML/Jackson layer.
> 2. The projection clause (on the same transform rule) accepts and even
> requires the backticks when the alias is a reserved word.
> 3. The error surfaces only at job startup/runtime, not at validation time,
> and the error message does not hint at the quoting mismatch.
> *How to reproduce*
> Minimal MySQL → any sink pipeline:
> {code:java}
> source:
> type: mysql
> # ...
> sink:
> type: paimon # or values/doris/starrocks, any sink works
> # ...
> transform:
> - source-table: mydb.web_order
> projection: \*, DATE_FORMAT(create_at, 'yyyyMMdd') AS `time`
> primary-keys: id, `time`
> partition-keys: `time` {code}
> Expected: the pipeline starts and routes records using time as part of the
> primary key / partition key.
> Actual: the job fails because `time` (with backticks) is not found in the
> projected schema, which only contains the unquoted column time .
> The same failure reproduces when the backtick-quoted name contains special
> characters (e.g. `order-id` , `with space` ) or is a SQL reserved word (
> `time` , `user` , `order` , …).
> *Root cause*
> - _org.apache.flink.cdc.runtime.operators.transform.SchemaMetadataTransform_
> parses primaryKeyString / partitionKeyString with:
>
> {code:java}
> Arrays.asList(primaryKeyString.split(",")).stream().map
> (String::trim)... {code}
> which preserves any surrounding backticks.
> - _org.apache.flink.cdc.runtime.parser.TransformParser_ parses projection via
> Calcite ( Lex.JAVA , MYSQL_5 ), which unquotes backtick identifiers.
> The downstream matching between the declared primary/partition keys and the
> projection output is string-based, so the asymmetry is fatal.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)