JNSimba commented on PR #3767: URL: https://github.com/apache/flink-cdc/pull/3767#issuecomment-2795676279
> @JNSimba In the backfill task, there may be one problem for projection. See `RecordUtils.upsertBinlog` in `SnapshotSplitReader#pollSplitRecords`.
>
> For tables with a PK, we use all PK columns to deduplicate the records. For tables with no PK, we use all columns to deduplicate the records.
>
> So the limit for the projection is as follows: for tables with a PK, we need to keep all PK columns; for tables with no PK, we need to keep all columns.

@ruanhang1993 Thanks for your suggestions. I implemented a version along those lines, but found some new problems. Take this case: **MySqlTimezoneITCase.testMySqlServerInBerlin**. Here `id` is the primary key, so the mismatch between **the fields actually projected (date_c, time_c, datetime3_c, datetime6_c, timestamp_c, id)** and the fields requested in the **SELECT (date_c, time_c, datetime3_c, datetime6_c, timestamp_c)** may cause problems. In the non-incremental snapshot path, **RowDataSerializer.copy** checks that the row arity matches the serializer's field types, so an error is reported:

https://github.com/apache/flink/blob/release-1.20.1/flink-table/flink-table-runtime/src/main/java/org/apache/flink/table/runtime/typeutils/RowDataSerializer.java#L122-L125

```java
// from.getArity() = 6 but types.length = 5
if (from.getArity() != types.length) {
    throw new IllegalArgumentException(
            "Row arity: " + from.getArity() + ", but serializer arity: " + types.length);
}
```

For this case, the pipeline works when incrementalSnapshot=true, but fails when it is false.
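To make the failure mode concrete, here is a minimal self-contained sketch, with no Flink dependency, that mimics the arity guard in `RowDataSerializer.copy` (class and method names here are hypothetical, not Flink APIs): a 6-field row, produced because the projection had to retain the PK column `id`, is copied through a serializer configured for only the 5 SELECT columns, and the guard throws before any field is copied.

```java
// Hypothetical stand-in for the guard at RowDataSerializer.java#L122-L125;
// not Flink code, just the same arity comparison in isolation.
public class ArityCheckSketch {

    // rowArity: number of fields in the row being copied (e.g. 5 SELECT
    // columns + the retained PK column `id` = 6).
    // serializerArity: number of types the serializer was built with
    // (only the 5 SELECT columns).
    static void checkArity(int rowArity, int serializerArity) {
        if (rowArity != serializerArity) {
            throw new IllegalArgumentException(
                    "Row arity: " + rowArity
                            + ", but serializer arity: " + serializerArity);
        }
    }

    public static void main(String[] args) {
        // Matching arities (row carries exactly the SELECT columns): passes.
        checkArity(5, 5);

        // Projection kept the PK column as well: 6 fields vs 5 types -> throws,
        // which is the error reported in the non-incremental snapshot path.
        try {
            checkArity(6, 5);
            throw new AssertionError("expected IllegalArgumentException");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This is why keeping the PK columns in the backfill projection is not enough on its own: downstream, the serializer is still sized to the user's SELECT list, so the extra PK field has to be dropped again before rows reach `RowDataSerializer.copy`.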