MOBIN created FLINK-36580:
-----------------------------
Summary: Using debezium.column.exclude.list causes job failure
Key: FLINK-36580
URL: https://issues.apache.org/jira/browse/FLINK-36580
Project: Flink
Issue Type: Bug
Components: Flink CDC
Affects Versions: cdc-3.2.0
Reporter: MOBIN
When synchronizing data with a wildcard projection (\*) because there are too many columns to list individually, a user may want to exclude some privacy-sensitive columns. The pipeline yml is as follows:
{code:yaml}
source:
  type: mysql
  ....
  debezium.column.exclude.list: test_db.test_table.name

transform:
  - source-table: test_db.test_table
    projection: \*
    description: project fields and filter
{code}
The job then fails with the following error:
{code:java}
Caused by: java.lang.ArrayIndexOutOfBoundsException: 33370909
	at org.apache.flink.cdc.common.data.binary.BinarySegmentUtils.getLongMultiSegments(BinarySegmentUtils.java:731)
	at org.apache.flink.cdc.common.data.binary.BinarySegmentUtils.getLong(BinarySegmentUtils.java:721)
	at org.apache.flink.cdc.common.data.binary.BinarySegmentUtils.readTimestampData(BinarySegmentUtils.java:1017)
	at org.apache.flink.cdc.common.data.binary.BinaryRecordData.getTimestamp(BinaryRecordData.java:173)
	at org.apache.flink.cdc.common.data.RecordData.lambda$createFieldGetter$f6fd429a$1(RecordData.java:214)
	at org.apache.flink.cdc.common.data.RecordData.lambda$createFieldGetter$79d0aaaa$1(RecordData.java:245)
	at org.apache.flink.cdc.runtime.operators.transform.PreTransformProcessor.getValueFromBinaryRecordData(PreTransformProcessor.java:111)
	at org.apache.flink.cdc.runtime.operators.transform.PreTransformProcessor.processFillDataField(PreTransformProcessor.java:90)
	at org.apache.flink.cdc.runtime.operators.transform.PreTransformOperator.processProjection(PreTransformOperator.java:427)
	at org.apache.flink.cdc.runtime.operators.transform.PreTransformOperator.processDataChangeEvent(PreTransformOperator.java:399)
	at org.apache.flink.cdc.runtime.operators.transform.PreTransformOperator.processElement(PreTransformOperator.java:251)
	at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:75)
	at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:50)
	at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:29)
	at org.apache.flink.streaming.runtime.tasks.SourceOperatorStreamTask$AsyncDataOutputToOutput.emitRecord(SourceOperatorStreamTask.java:310)
	at org.apache.flink.streaming.api.operators.source.SourceOutputWithWatermarks.collect(SourceOutputWithWatermarks.java:110)
{code}
Cause of error: Debezium drops the columns listed in debezium.column.exclude.list when emitting change records, but Flink CDC does not take the exclude list into account when resolving the table schema. As a result, the column count in tableChangeInfo.getSourceSchema() no longer matches the column count in the emitted BinaryRecordData, and the field getters read out-of-bounds offsets.
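The mismatch can be illustrated with a minimal, self-contained Java sketch (the array stands in for the serialized record; SchemaMismatchSketch, readField, and the field values are hypothetical names for illustration, not Flink CDC API):

{code:java}
public class SchemaMismatchSketch {
    // Simulates reading a field whose index comes from the (stale) source
    // schema against a record that Debezium actually emitted without that column.
    static long readField(long[] emittedRecord, int schemaFieldIndex) {
        // Throws ArrayIndexOutOfBoundsException when the schema claims
        // more columns than the record contains.
        return emittedRecord[schemaFieldIndex];
    }

    public static void main(String[] args) {
        long[] emitted = {1L, 42L};   // only 2 fields survived column.exclude.list
        int excludedColumnIndex = 2;  // the schema still claims 3 columns
        try {
            readField(emitted, excludedColumnIndex);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("schema/record mismatch: " + e.getMessage());
        }
    }
}
{code}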
Temporary workaround: projection: \*, cast(null as STRING) as name
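With the workaround applied, the transform section of the pipeline yml would look roughly like this (a sketch based on the snippet above; the cast(null as STRING) re-adds the excluded name column as a NULL literal so the projected column count matches the schema again):

{code:yaml}
transform:
  - source-table: test_db.test_table
    projection: \*, cast(null as STRING) as name
    description: project fields and filter
{code}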
--
This message was sent by Atlassian Jira
(v8.20.10#820010)