wayneli-vt opened a new pull request, #6727:
URL: https://github.com/apache/paimon/pull/6727
### Purpose
When joining on `_ROW_ID` in a `MERGE INTO` operation (e.g., `ON target._ROW_ID = source._ROW_ID`), a duplicate column error occurs with the following exception:
```text
Field names must be unique. Found duplicates: [_ROW_ID]
java.lang.IllegalArgumentException: Field names must be unique. Found duplicates: [_ROW_ID]
    at org.apache.paimon.types.RowType.validateFields(RowType.java:268)
    at org.apache.paimon.types.RowType.<init>(RowType.java:79)
    at org.apache.paimon.types.RowType.copy(RowType.java:88)
    at org.apache.paimon.spark.SparkTypeUtils.prunePaimonType(SparkTypeUtils.java:119)
    at org.apache.paimon.spark.SparkTypeUtils.prunePaimonRowType(SparkTypeUtils.java:106)
    at org.apache.paimon.spark.ColumnPruningAndPushDown.$init$(ColumnPruningAndPushDown.scala:64)
    at org.apache.paimon.spark.PaimonBaseScan.<init>(PaimonBaseScan.scala:48)
    at org.apache.paimon.spark.PaimonScan.<init>(PaimonScan.scala:41)
    at org.apache.paimon.spark.PaimonScanBuilder.build(PaimonScanBuilder.scala:37)
    ...
```
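For illustration, a minimal reproduction of the failing statement might look like the sketch below. It is written against the Spark SQL API; the table names `t` and `s`, their columns, and the prior setup (a Spark session with a Paimon catalog and row tracking enabled on the target table) are assumptions for the example, not part of this PR.
```scala
import org.apache.spark.sql.SparkSession

object MergeOnRowIdRepro {
  def main(args: Array[String]): Unit = {
    // Assumes a Spark session configured with a Paimon catalog, and that the
    // target table `t` has row tracking enabled so _ROW_ID is exposed as a
    // metadata column, while the source `s` carries matching _ROW_ID values.
    val spark = SparkSession.builder().getOrCreate()

    // Joining the two sides on the _ROW_ID metadata column is what previously
    // triggered "Field names must be unique. Found duplicates: [_ROW_ID]".
    spark.sql(
      """
        |MERGE INTO t
        |USING s
        |ON t._ROW_ID = s._ROW_ID
        |WHEN MATCHED THEN UPDATE SET t.v = s.v
        |""".stripMargin)
  }
}
```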
This PR fixes the issue by using the table's metadata columns to avoid the duplicate `_ROW_ID` field.
### Tests
A new test case has been added to `RowTrackingTestBase` to cover and verify this fix (a rough sketch of the scenario follows the list):
* `org.apache.paimon.spark.sql.RowTrackingTestBase# Data Evolution: merge into table with data-evolution on _ROW_ID`
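For reference, the scenario exercised by that test roughly follows the sketch below. It assumes a ScalaTest-style suite with a `spark` session available (as in Paimon's Spark test bases); the table names, the `'row-tracking.enabled'` table option, and the assertion are illustrative guesses rather than the exact contents of the new test.
```scala
// Rough sketch only: names, table options, and assertions are illustrative,
// not the exact contents of the new RowTrackingTestBase case.
test("Data Evolution: merge into table with data-evolution on _ROW_ID") {
  // Target table with row tracking enabled ('row-tracking.enabled' is an assumed option key).
  spark.sql("CREATE TABLE target (id INT, v STRING) TBLPROPERTIES ('row-tracking.enabled' = 'true')")
  spark.sql("INSERT INTO target VALUES (1, 'a'), (2, 'b')")

  // Source table carrying the _ROW_ID values to match on.
  spark.sql("CREATE TABLE source (id INT, v STRING, _ROW_ID BIGINT)")
  spark.sql("INSERT INTO source VALUES (1, 'a2', 0), (2, 'b2', 1)")

  // Before this fix, planning this statement failed with
  // "Field names must be unique. Found duplicates: [_ROW_ID]".
  spark.sql(
    """
      |MERGE INTO target t
      |USING source s
      |ON t._ROW_ID = s._ROW_ID
      |WHEN MATCHED THEN UPDATE SET t.v = s.v
      |""".stripMargin)

  // The merge should now succeed and update the matched rows.
  val updated = spark.sql("SELECT v FROM target ORDER BY id").collect().map(_.getString(0)).toSeq
  assert(updated == Seq("a2", "b2"))
}
```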
### API and Format
No.
### Documentation
No.