Aireed opened a new issue, #3713:
URL: https://github.com/apache/amoro/issues/3713

   ### What happened?
   
   Table definitions:
   ```sql
   CREATE TABLE merge_target (
       user_id INT,
       id BIGINT,
       salary DOUBLE,
       money FLOAT,
       ts TIMESTAMP,
       tails DECIMAL(30, 18),
       is_true BOOLEAN,
       username STRING,
       category STRING,
       PRIMARY KEY (id))
   USING mixed_hive
   PARTITIONED BY (category)
   TBLPROPERTIES (
       "write.upsert.enabled" = "true",
       "self-optimizing.full.trigger.interval" = "180000");

   CREATE TABLE merge_source (
       user_id INT,
       id BIGINT,
       salary DOUBLE,
       money FLOAT,
       ts TIMESTAMP,
       tails DECIMAL(30, 18),
       is_true BOOLEAN,
       username STRING,
       category STRING,
       PRIMARY KEY (id))
   USING mixed_hive
   PARTITIONED BY (category)
   TBLPROPERTIES (
       "write.upsert.enabled" = "true",
       "self-optimizing.full.trigger.interval" = "180000");
   ```
   
   
   **Spark SQL**
   ```sql
   MERGE INTO merge_target t
   USING (
       SELECT id, money, username
       FROM merge_source) s
   ON t.id = s.id
   WHEN MATCHED AND t.id = 111111 THEN DELETE
   WHEN MATCHED THEN UPDATE SET t.salary = 1000, t.money = s.money + 100
   WHEN NOT MATCHED THEN INSERT
       (t.id, t.user_id, t.salary, t.money, t.ts, t.tails, t.is_true, t.username, t.category)
   VALUES
       (s.id, 9999, 888.88, s.money, current_timestamp(), 666.66, true, s.username, 'cf01');
   ```
   
   **Exception**
   
   ```
   5/08/06 10:27:27 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) (amoro-dev3.gy.ntes executor 1): java.lang.IndexOutOfBoundsException: Index: 8, Size: 3
        at java.util.ArrayList.rangeCheck(ArrayList.java:657)
        at java.util.ArrayList.get(ArrayList.java:433)
        at org.apache.amoro.spark.SparkInternalRowCastWrapper.isNullAt(SparkInternalRowCastWrapper.java:91)
        at org.apache.amoro.spark.SparkInternalRowWrapper.get(SparkInternalRowWrapper.java:55)
        at org.apache.amoro.shade.org.apache.iceberg.Accessors$PositionAccessor.get(Accessors.java:71)
        at org.apache.amoro.shade.org.apache.iceberg.Accessors$PositionAccessor.get(Accessors.java:58)
        at org.apache.amoro.shade.org.apache.iceberg.StructTransform.wrap(StructTransform.java:78)
        at org.apache.amoro.shade.org.apache.iceberg.PartitionKey.wrap(PartitionKey.java:30)
        at org.apache.amoro.shade.org.apache.iceberg.PartitionKey.partition(PartitionKey.java:64)
        at org.apache.amoro.io.writer.BaseTaskWriter.buildWriterKey(BaseTaskWriter.java:99)
        at org.apache.amoro.io.writer.ChangeTaskWriter.buildWriterKey(ChangeTaskWriter.java:68)
        at org.apache.amoro.io.writer.BaseTaskWriter.write(BaseTaskWriter.java:88)
        at org.apache.amoro.spark.writer.SimpleRowLevelDataWriter.update(SimpleRowLevelDataWriter.java:68)
        at org.apache.amoro.spark.writer.SimpleRowLevelDataWriter.update(SimpleRowLevelDataWriter.java:31)
        at org.apache.amoro.spark.sql.execution.DeltaWithMetadataWritingSparkTask.writeFunc(MixedFormatRowLevelWriteExec.scala:80)
        at org.apache.amoro.spark.sql.execution.DeltaWithMetadataWritingSparkTask.writeFunc(MixedFormatRowLevelWriteExec.scala:63)
        at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.$anonfun$run$1(WriteDeltaExec.scala:165)
        at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1538)
        at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run(WriteDeltaExec.scala:203)
        at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run$(WriteDeltaExec.scala:142)
        at org.apache.amoro.spark.sql.execution.DeltaWithMetadataWritingSparkTask.run(MixedFormatRowLevelWriteExec.scala:63)
        at org.apache.spark.sql.execution.datasources.v2.ExtendedV2ExistingTableWriteExec.$anonfun$writeWithV2$2(WriteDeltaExec.scala:101)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:136)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
   ```
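
   The `Index: 8, Size: 3` in the exception matches the shape of the query: the target schema has 9 columns (partition column `category` at position 8), while the source subquery projects only 3 (`id`, `money`, `username`), so the partition-key accessor appears to read a position that the wrapped source row does not carry. As an untested workaround sketch (assuming the position mismatch is indeed the trigger), projecting the full source schema keeps positions aligned:
   ```sql
   -- Hypothetical workaround (untested): project all source columns so the
   -- wrapped row has the same 9 positions as the target schema.
   MERGE INTO merge_target t
   USING (
       SELECT user_id, id, salary, money, ts, tails, is_true, username, category
       FROM merge_source) s
   ON t.id = s.id
   WHEN MATCHED AND t.id = 111111 THEN DELETE
   WHEN MATCHED THEN UPDATE SET t.salary = 1000, t.money = s.money + 100
   WHEN NOT MATCHED THEN INSERT
       (t.id, t.user_id, t.salary, t.money, t.ts, t.tails, t.is_true, t.username, t.category)
   VALUES
       (s.id, 9999, 888.88, s.money, current_timestamp(), 666.66, true, s.username, 'cf01');
   ```
   Even if this avoids the crash, the original 3-column projection should still work and the IndexOutOfBoundsException points at a genuine bug in the row wrapper.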
   
   ### Affects Versions
   
   master
   
   ### What table formats are you seeing the problem on?
   
   Mixed-Iceberg
   
   ### What engines are you seeing the problem on?
   
   Spark
   
   ### How to reproduce
   
   _No response_
   
   ### Relevant log output
   
   ```shell
   
   ```
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's Code of Conduct
