wayneli-vt opened a new pull request, #6727:
URL: https://github.com/apache/paimon/pull/6727
### Purpose
When joining on `_ROW_ID` in a `MERGE INTO` operation (e.g., `ON target._ROW_ID = source._ROW_ID`), a duplicate column error occurs with the following exception:
```text
Field names must be unique. Found duplicates: [_ROW_ID]
java.lang.IllegalArgumentException: Field names must be unique. Found duplicates: [_ROW_ID]
    at org.apache.paimon.types.RowType.validateFields(RowType.java:268)
    at org.apache.paimon.types.RowType.<init>(RowType.java:79)
    at org.apache.paimon.types.RowType.copy(RowType.java:88)
    at org.apache.paimon.spark.SparkTypeUtils.prunePaimonType(SparkTypeUtils.java:119)
    at org.apache.paimon.spark.SparkTypeUtils.prunePaimonRowType(SparkTypeUtils.java:106)
    at org.apache.paimon.spark.ColumnPruningAndPushDown.$init$(ColumnPruningAndPushDown.scala:64)
    at org.apache.paimon.spark.PaimonBaseScan.<init>(PaimonBaseScan.scala:48)
    at org.apache.paimon.spark.PaimonScan.<init>(PaimonScan.scala:41)
    at org.apache.paimon.spark.PaimonScanBuilder.build(PaimonScanBuilder.scala:37)
    ...
```
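For illustration, a minimal reproduction of the failing statement might look like the sketch below. It is written against the Spark SQL API; the table names `t` and `s`, their columns, and the prior setup (a Spark session with a Paimon catalog and row tracking enabled on the target table) are assumptions for the example, not part of this PR.
```scala
import org.apache.spark.sql.SparkSession

object MergeOnRowIdRepro {
  def main(args: Array[String]): Unit = {
    // Assumes a Spark session configured with a Paimon catalog, and that the
    // target table `t` has row tracking enabled so _ROW_ID is exposed as a
    // metadata column, while the source `s` carries matching _ROW_ID values.
    val spark = SparkSession.builder().getOrCreate()

    // Joining the two sides on the _ROW_ID metadata column is what previously
    // triggered "Field names must be unique. Found duplicates: [_ROW_ID]".
    spark.sql(
      """
        |MERGE INTO t
        |USING s
        |ON t._ROW_ID = s._ROW_ID
        |WHEN MATCHED THEN UPDATE SET t.v = s.v
        |""".stripMargin)
  }
}
```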
This PR fixes the issue by using the table's metadata columns to avoid the duplicate `_ROW_ID` field.
### Tests
A new test case has been added to `RowTrackingTestBase` to cover and verify this fix (a rough sketch of the scenario follows the list):
* `org.apache.paimon.spark.sql.RowTrackingTestBase# Data Evolution: merge into table with data-evolution on _ROW_ID`
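For reference, the scenario exercised by that test roughly follows the sketch below. It assumes a ScalaTest-style suite with a `spark` session available (as in Paimon's Spark test bases); the table names, the `'row-tracking.enabled'` table option, and the assertion are illustrative guesses rather than the exact contents of the new test.
```scala
// Rough sketch only: names, table options, and assertions are illustrative,
// not the exact contents of the new RowTrackingTestBase case.
test("Data Evolution: merge into table with data-evolution on _ROW_ID") {
  // Target table with row tracking enabled ('row-tracking.enabled' is an assumed option key).
  spark.sql("CREATE TABLE target (id INT, v STRING) TBLPROPERTIES ('row-tracking.enabled' = 'true')")
  spark.sql("INSERT INTO target VALUES (1, 'a'), (2, 'b')")

  // Source table carrying the _ROW_ID values to match on.
  spark.sql("CREATE TABLE source (id INT, v STRING, _ROW_ID BIGINT)")
  spark.sql("INSERT INTO source VALUES (1, 'a2', 0), (2, 'b2', 1)")

  // Before this fix, planning this statement failed with
  // "Field names must be unique. Found duplicates: [_ROW_ID]".
  spark.sql(
    """
      |MERGE INTO target t
      |USING source s
      |ON t._ROW_ID = s._ROW_ID
      |WHEN MATCHED THEN UPDATE SET t.v = s.v
      |""".stripMargin)

  // The merge should now succeed and update the matched rows.
  val updated = spark.sql("SELECT v FROM target ORDER BY id").collect().map(_.getString(0)).toSeq
  assert(updated == Seq("a2", "b2"))
}
```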
### API and Format
No.
### Documentation
No.