tianfy created FLINK-39834:
------------------------------
Summary: Oracle CDC may duplicate UNISTR expressions when the UNISTR value contains ||
Key: FLINK-39834
URL: https://issues.apache.org/jira/browse/FLINK-39834
Project: Flink
Issue Type: Bug
Components: Flink CDC
Affects Versions: cdc-3.6.0
Environment: Flink CDC master / cdc-3.6.0
Oracle CDC connector
Debezium version: 1.9.8.Final
Reporter: tianfy
h3. Problem
Oracle LogMiner can emit NVARCHAR2 values as {{UNISTR(...)}} expressions. When
the quoted UNISTR payload itself contains the character sequence {{||}}, Flink
CDC's Oracle connector may treat that sequence as the SQL string concatenation
operator and split the value there.
For example:
{code}
UNISTR('\592A\5E73\6D0B\53CC\514D4000||\518D\5236\9020\5DF2\5F55\5355||C440100VEH26071668')
{code}
This value should be decoded as one string while preserving the embedded
{{||}}. However, the {{UnistrHelper}} in Debezium 1.9.8.Final, which the
connector currently uses, splits the input on every {{||}}, including those
inside the quoted payload. None of the resulting fragments is a complete
{{UNISTR(...)}} expression, so the fallback path appends the whole original
expression again, once per fragment. The downstream value is therefore inflated
into repeated {{UNISTR(...)}} text and can exceed the sink column length.
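To make the failure mode concrete, the following is a minimal Java model of the split-then-fallback behavior described above. It is an assumption-based sketch, not the Debezium 1.9.8.Final source: the class {{NaiveUnistrSplit}} and its methods are hypothetical, and real decoding is replaced by a placeholder because only the fallback path matters here.

{code:java}
import java.util.regex.Pattern;

/**
 * Hypothetical model of the split-then-fallback behavior described in this
 * issue. This is NOT the Debezium 1.9.8.Final UnistrHelper source; it only
 * mimics its shape so the duplication can be reproduced in isolation.
 */
public class NaiveUnistrSplit {
    private static final String START = "UNISTR('";
    private static final String END = "')";

    /** True if the text looks like one complete UNISTR('...') call. */
    static boolean isCompleteUnistr(String s) {
        return s.length() >= START.length() + END.length()
                && s.startsWith(START) && s.endsWith(END);
    }

    static String naiveConvert(String data) {
        StringBuilder out = new StringBuilder();
        // Splits on EVERY "||", including occurrences inside the quoted payload.
        for (String part : data.split(Pattern.quote("||"))) {
            if (isCompleteUnistr(part.trim())) {
                out.append("<decoded>"); // placeholder: decoding is irrelevant here
            } else {
                out.append(data); // fallback: the whole original expression again
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Java string literal: each "\\" is a single backslash at runtime.
        String value = "UNISTR('\\592A\\5E73\\6D0B\\53CC\\514D4000"
                + "||\\518D\\5236\\9020\\5DF2\\5F55\\5355||C440100VEH26071668')";
        String result = naiveConvert(value);
        // Number of copies of the original expression in the output: prints 3
        System.out.println(result.length() / value.length());
    }
}
{code}

With the example value, the split yields three fragments ({{UNISTR('...4000}}, the middle run of escapes, and {{C440100VEH26071668')}}). None passes the complete-call check, so the output is the original expression repeated three times.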
h3. Expected behavior
Only {{||}} outside a {{UNISTR('...')}} quoted payload should be treated as SQL
string concatenation. Embedded {{||}} inside the UNISTR payload should be kept
as normal data.
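For illustration, here is a minimal sketch of decoding one complete UNISTR call so that embedded {{||}} survives as data. It assumes Oracle's UNISTR escape rules: a backslash followed by four hex digits is one UTF-16 code unit, two backslashes are a literal backslash, and a doubled quote inside the literal is one quote. The class {{UnistrDecode}} is hypothetical and expects a single, already isolated {{UNISTR('...')}} call.

{code:java}
/**
 * Hypothetical sketch: decode ONE complete UNISTR('...') call, assuming
 * Oracle's escape rules (backslash + four hex digits = one UTF-16 code unit,
 * two backslashes = one literal backslash, two quotes = one quote).
 * Any other character, including '|', is copied through as data.
 */
public class UnistrDecode {
    private static final String START = "UNISTR('";
    private static final String END = "')";

    static String decodeUnistr(String expr) {
        if (expr.length() < START.length() + END.length()
                || !expr.startsWith(START) || !expr.endsWith(END)) {
            throw new IllegalArgumentException("not a single UNISTR('...') call: " + expr);
        }
        String payload = expr.substring(START.length(), expr.length() - END.length());
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < payload.length()) {
            char c = payload.charAt(i);
            boolean hasNext = i + 1 < payload.length();
            if (c == '\\' && hasNext && payload.charAt(i + 1) == '\\') {
                out.append('\\'); // escaped backslash
                i += 2;
            } else if (c == '\\' && i + 5 <= payload.length()) {
                // Four hex digits follow; parseInt throws on malformed input.
                out.append((char) Integer.parseInt(payload.substring(i + 1, i + 5), 16));
                i += 5;
            } else if (c == '\'' && hasNext && payload.charAt(i + 1) == '\'') {
                out.append('\''); // doubled quote inside the literal
                i += 2;
            } else {
                out.append(c); // ordinary data, including "||"
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Java string literal: each "\\" is a single backslash at runtime.
        String value = "UNISTR('\\592A\\5E73\\6D0B\\53CC\\514D4000"
                + "||\\518D\\5236\\9020\\5DF2\\5F55\\5355||C440100VEH26071668')";
        System.out.println(decodeUnistr(value));
    }
}
{code}

For the example value, this sketch prints 太平洋双免4000||再制造已录单||C440100VEH26071668, a single string in which both embedded {{||}} are kept. Because decoding advances escape by escape rather than fragment by fragment, a {{||}} inside the payload never needs special handling.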
h3. Actual behavior
A single UNISTR expression that contains embedded {{||}} may be split into
invalid fragments, causing the original expression to be appended multiple
times.
h3. Proposed fix
Backport a tokenizer in the style of Debezium's DBZ-9132 fix into Flink CDC's
Oracle connector, without changing the Debezium dependency version. The
tokenizer splits only on SQL concatenation operators that appear outside UNISTR
quoted data. The Oracle pipeline connector should also exclude Debezium's
original {{io/debezium/connector/oracle/logminer/UnistrHelper.class}} from the
shaded Debezium Oracle artifact, so that Flink CDC's patched helper, which has
the same fully qualified class name, is the one that gets packaged.
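The tokenizer idea can be sketched as follows. This is a minimal sketch, not the DBZ-9132 implementation: the class {{ConcatTokenizer}} and method {{splitConcat}} are hypothetical. It tracks single-quote state in general rather than only UNISTR payloads, which is slightly broader than the rule stated above, and it treats Oracle's doubled quote ({{''}}) as an escaped quote rather than the end of a literal.

{code:java}
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical quote-aware splitter: "||" is treated as SQL concatenation
 * only when it appears outside a single-quoted literal. Inside a literal,
 * a doubled quote ('') is an escaped quote and does not end the literal.
 */
public class ConcatTokenizer {
    static List<String> splitConcat(String sql) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        int i = 0;
        while (i < sql.length()) {
            char c = sql.charAt(i);
            boolean hasNext = i + 1 < sql.length();
            if (c == '\'') {
                if (inQuotes && hasNext && sql.charAt(i + 1) == '\'') {
                    current.append("''"); // escaped quote: stay inside the literal
                    i += 2;
                    continue;
                }
                inQuotes = !inQuotes; // opening or closing quote
                current.append(c);
                i++;
            } else if (!inQuotes && c == '|' && hasNext && sql.charAt(i + 1) == '|') {
                // Concatenation operator outside any literal: close the token.
                tokens.add(current.toString().trim());
                current.setLength(0);
                i += 2;
            } else {
                current.append(c); // data, including '|' inside a literal
                i++;
            }
        }
        tokens.add(current.toString().trim());
        return tokens;
    }

    public static void main(String[] args) {
        String embedded = "UNISTR('\\592A||\\518D')";          // one call, || is data
        String concat = "UNISTR('\\592A') || UNISTR('\\518D')"; // real concatenation
        System.out.println(splitConcat(embedded).size()); // prints 1
        System.out.println(splitConcat(concat).size());   // prints 2
    }
}
{code}

A genuine concatenation of two UNISTR calls still yields two tokens, while a single call with embedded {{||}}, such as the example value in this issue, stays one token that can then be decoded as a whole.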
A pull request will be provided.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)