tianfy created FLINK-39834:
------------------------------

             Summary: Oracle CDC may duplicate UNISTR expressions when UNISTR value contains ||
                 Key: FLINK-39834
                 URL: https://issues.apache.org/jira/browse/FLINK-39834
             Project: Flink
          Issue Type: Bug
          Components: Flink CDC
    Affects Versions: cdc-3.6.0
         Environment: Flink CDC master / cdc-3.6.0

Oracle CDC connector

Debezium version: 1.9.8.Final
            Reporter: tianfy


h3. Problem

Oracle LogMiner can emit NVARCHAR2 values as {{UNISTR(...)}} expressions. When 
the quoted UNISTR payload itself contains the character sequence {{||}}, Flink 
CDC's Oracle connector may treat it as a SQL concatenation operator.

For example:
{code}
UNISTR('\\592A\\5E73\\6D0B\\53CC\\514D4000||\\518D\\5236\\9020\\5DF2\\5F55\\5355||C440100VEH26071668')
{code}

This value should be decoded as one string while preserving the embedded 
{{||}}. However, the current Debezium 1.9.8.Final {{UnistrHelper}} splits 
directly on {{||}}. Each split part is no longer a complete {{UNISTR(...)}} 
expression, so the fallback path appends the original expression repeatedly. 
The downstream value can therefore be inflated into repeated {{UNISTR(...)}} 
text and exceed the sink column length.
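
The failure mode described above can be reproduced with a small model. This is a minimal sketch under stated assumptions, not the Debezium source: the class name {{NaiveUnistrSplit}} and the method {{convert}} are hypothetical, and decoding of the \XXXX escapes is left out so that only the split-and-fallback behavior is visible.

```java
import java.util.regex.Pattern;

// Hypothetical model of the problematic behavior; NOT the actual UnistrHelper code.
final class NaiveUnistrSplit {

    // Splits on every "||", including occurrences inside the quoted payload.
    static String convert(String expr) {
        StringBuilder out = new StringBuilder();
        for (String part : expr.split(Pattern.quote("||"))) {
            String p = part.trim();
            if (p.startsWith("UNISTR('") && p.endsWith("')")) {
                // Complete UNISTR('...') fragment: keep the payload
                // (escape decoding is omitted in this sketch).
                out.append(p, 8, p.length() - 2);
            } else {
                // Fallback: the fragment is not a complete expression,
                // so the whole original expression is appended instead.
                out.append(expr);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String expr = "UNISTR('a||b||c')";
        // Three invalid fragments, so the original expression is appended three times.
        System.out.println(convert(expr));
    }
}
```

With two embedded {{||}} sequences the input is cut into three fragments, none of which is a complete {{UNISTR(...)}} expression, so the result in this model is three copies of the original text.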

h3. Expected behavior

Only {{||}} outside a {{UNISTR('...')}} quoted payload should be treated as the 
SQL string concatenation operator. A {{||}} embedded inside the UNISTR payload 
should be kept as literal data.

h3. Actual behavior

A single UNISTR expression that contains embedded {{||}} may be split into 
invalid fragments, causing the original expression to be appended multiple 
times.

h3. Proposed fix

Backport the Debezium DBZ-9132-style tokenizer into Flink CDC's Oracle 
connector without changing the Debezium dependency version. The tokenizer 
splits only on SQL concatenation operators outside UNISTR quoted data. The 
Oracle pipeline connector should also exclude Debezium's original 
{{io/debezium/connector/oracle/logminer/UnistrHelper.class}} from the shaded 
Debezium Oracle artifact so Flink CDC's patched helper is packaged.
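
The splitting rule can be sketched as follows. This is an illustration only and not the DBZ-9132 implementation: the class name {{QuoteAwareSplitter}} and the method {{splitOnConcat}} are hypothetical, and the sketch assumes that a single quote inside a payload is escaped by doubling it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical quote-aware splitter; names are illustrative, not from Debezium or Flink CDC.
final class QuoteAwareSplitter {

    // Splits on "||" only when the scanner is outside a single-quoted literal.
    static List<String> splitOnConcat(String expr) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuote = false;
        for (int i = 0; i < expr.length(); i++) {
            char c = expr.charAt(i);
            boolean hasNext = i + 1 < expr.length();
            if (c == '\'') {
                if (inQuote && hasNext && expr.charAt(i + 1) == '\'') {
                    // Assumption: a doubled quote inside a literal is an escaped quote.
                    current.append("''");
                    i++;
                    continue;
                }
                inQuote = !inQuote;
                current.append(c);
            } else if (!inQuote && c == '|' && hasNext && expr.charAt(i + 1) == '|') {
                // Concatenation operator outside quoted data: close the current part.
                parts.add(current.toString().trim());
                current.setLength(0);
                i++; // skip the second '|'
            } else {
                // Ordinary character, including '|' inside a quoted payload.
                current.append(c);
            }
        }
        parts.add(current.toString().trim());
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitOnConcat("UNISTR('a||b') || UNISTR('c')"));
    }
}
```

In this sketch each returned part is a complete {{UNISTR('...')}} expression, so a decoder applied per part never needs the fallback path for an embedded {{||}}.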

A pull request will be provided.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)