paulo-t opened a new pull request, #4426:
URL: https://github.com/apache/flink-cdc/pull/4426

   ## What is the purpose of the change
   
   This change fixes Oracle CDC UNISTR decoding when the UNISTR quoted payload 
contains the character sequence `||`.
   
   Oracle LogMiner can emit NVARCHAR2 values as `UNISTR(...)`. The current 
Debezium 1.9.8.Final `UnistrHelper` splits directly on `||`, so a value like:
   
   ```text
   UNISTR('\\592A...4000||\\518D...||C440100VEH26071668')
   ```
   
   can be split inside the quoted payload into fragments that are not valid 
`UNISTR(...)` expressions. The fallback path then appends the original 
expression repeatedly, so downstream values contain duplicated `UNISTR(...)` 
text and can exceed sink column length limits.
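
The failure mode can be reproduced in isolation. The sketch below is not Debezium's actual code: `NaiveSplitDemo` and `naiveSplit` are hypothetical names, and the short input is a made-up stand-in for the elided value above. It only shows what a quote-unaware split on `||` does to a single UNISTR literal.

```java
import java.util.Arrays;
import java.util.List;

final class NaiveSplitDemo {

    // Hypothetical stand-in for a quote-unaware split on the SQL
    // concatenation operator. It treats every "||" as a separator.
    static List<String> naiveSplit(String expr) {
        return Arrays.asList(expr.split("\\|\\|"));
    }

    public static void main(String[] args) {
        // One UNISTR literal whose payload happens to contain "||".
        String expr = "UNISTR('\\0041||\\0042')";
        for (String part : naiveSplit(expr)) {
            // Neither fragment is a complete UNISTR('...') expression.
            System.out.println(part);
        }
    }
}
```

Each fragment is missing either the closing quote and parenthesis or the `UNISTR('` prefix, which is why a decoder that expects complete expressions would fall through to its fallback path.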
   
   ## Brief change log
   
   - Add a patched `io.debezium.connector.oracle.logminer.UnistrHelper` in the 
Oracle CDC connector to tokenize UNISTR expressions and split only on SQL 
concatenation operators outside quoted UNISTR data.
   - Add regression tests for normal UNISTR decoding, external UNISTR 
concatenation, and embedded `||` inside a UNISTR payload.
   - Exclude Debezium's original `UnistrHelper.class` from the shaded Oracle 
pipeline connector so the patched helper is packaged.
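
The splitting rule in the first item can be sketched roughly as follows. This is an illustration under assumptions, not the patched `UnistrHelper` itself: `UnistrSplitSketch` and `splitOutsideQuotes` are hypothetical names, and the sketch only tracks single-quoted regions (treating `''` inside a literal as an escaped quote) so that `||` is honored as a separator only outside them.

```java
import java.util.ArrayList;
import java.util.List;

final class UnistrSplitSketch {

    // Splits on "||" only when it appears outside single-quoted text.
    static List<String> splitOutsideQuotes(String expr) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuote = false;
        int i = 0;
        while (i < expr.length()) {
            char c = expr.charAt(i);
            boolean hasNext = i + 1 < expr.length();
            if (c == '\'') {
                if (inQuote && hasNext && expr.charAt(i + 1) == '\'') {
                    // A doubled quote inside a literal is an escaped quote.
                    current.append("''");
                    i += 2;
                    continue;
                }
                inQuote = !inQuote;
                current.append(c);
                i++;
            } else if (!inQuote && c == '|' && hasNext && expr.charAt(i + 1) == '|') {
                // Concatenation operator outside quotes: close the current part.
                parts.add(current.toString().trim());
                current.setLength(0);
                i += 2;
            } else {
                current.append(c);
                i++;
            }
        }
        parts.add(current.toString().trim());
        return parts;
    }

    public static void main(String[] args) {
        // Embedded "||" stays inside one part.
        System.out.println(splitOutsideQuotes("UNISTR('\\0041||\\0042')"));
        // External "||" separates two UNISTR expressions.
        System.out.println(splitOutsideQuotes("UNISTR('\\0041') || UNISTR('\\0042')"));
    }
}
```

With this rule, a payload containing `||` is kept as one part, while two UNISTR expressions joined by `||` still yield two parts that can be decoded separately.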
   
   ## Verifying this change
   
   This change added tests and can be verified as follows:
   
   ```bash
   mvn -pl flink-cdc-connect/flink-cdc-source-connectors/flink-connector-oracle-cdc \
     -DskipITs -DskipE2eTests -Dcheckstyle.skip -Dspotless.check.skip=true \
     -Dtest=io.debezium.connector.oracle.logminer.UnistrHelperTest test
   ```
   
   Tests run: 4, Failures: 0, Errors: 0, Skipped: 0.
   
   Apache Jira: https://issues.apache.org/jira/browse/FLINK-39834

