[I] [spark][bug] Batch read PK table failed when use random column as primary key [fluss]

via GitHub Thu, 02 Apr 2026 00:40:11 -0700


Yohahaha opened a new issue, #2986:
URL: https://github.com/apache/fluss/issues/2986


   I found a bug while testing batch reads of PK tables: the read fails when 
the last columns are used as the primary key. The current cases in 
SparkPrimaryKeyTableReadTest all use the first column and the partition column 
as the primary key.
   
   ```scala
   test("Spark Read: primary key table with last pk") {
     withTable("t") {
       sql("CREATE TABLE t (id int, name string, pk int, pk2 string) TBLPROPERTIES('primary.key'='pk,pk2')")
       checkAnswer(sql("SELECT * FROM t"), Nil)
       sql("INSERT INTO t VALUES (1, 'a', 10, 'x'), (2, 'b', 20, 'y')")
       checkAnswer(
         sql("SELECT * FROM t ORDER BY id"),
         Row(1, "a", 10, "x") :: Row(2, "b", 20, "y") :: Nil)
     }
   }
   ```
   
   The above case fails with:
   ```
   Job aborted due to stage failure: Task 0 in stage 3.0 failed 1 times, most recent failure: Lost task 0.0 in stage 3.0 (TID 4) (192.168.0.116 executor driver): java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 2
        at org.apache.fluss.row.ProjectedRow.getInt(ProjectedRow.java:90)
        at org.apache.fluss.row.InternalRow.lambda$createFieldGetter$ff31e09f$6(InternalRow.java:198)
        at org.apache.fluss.row.encode.CompactedKeyEncoder.encodeKey(CompactedKeyEncoder.java:83)
        at org.apache.fluss.spark.read.FlussUpsertPartitionReader$$anon$1.compare(FlussUpsertPartitionReader.scala:113)
        at org.apache.fluss.spark.read.FlussUpsertPartitionReader$$anon$1.compare(FlussUpsertPartitionReader.scala:111)
        at org.apache.fluss.spark.utils.LogChangesIterator.hasSamePrimaryKey(LogChangesIterator.scala:117)
        at org.apache.fluss.spark.utils.LogChangesIterator.hasNext(LogChangesIterator.scala:85)
        at org.apache.fluss.client.table.scanner.SortMergeReader.readBatch(SortMergeReader.java:90)
        at org.apache.fluss.spark.read.FlussUpsertPartitionReader.initialize(FlussUpsertPartitionReader.scala:217)
        at org.apache.fluss.spark.read.FlussUpsertPartitionReader.<init>(FlussUpsertPartitionReader.scala:86)
        at org.apache.fluss.spark.read.FlussUpsertPartitionReaderFactory.createReader(FlussPartitionReaderFactory.scala:61)
   ```
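
   One possible reading of the trace, not confirmed anywhere in this report: 
the key encoder reads a field at the original schema position (2, where `pk` 
sits) from a row that has already been projected down to the two key columns. 
The sketch below reproduces only that index mismatch. `SimpleRow`, 
`ProjectedView` and `PkIndexSketch` are hypothetical stand-ins written for 
illustration; they are not the Fluss classes and do not reflect their actual 
code.

   ```scala
   // Hypothetical stand-ins for illustration only; these are NOT the Fluss classes.
   final case class SimpleRow(values: Array[Any]) {
     def getInt(pos: Int): Int = values(pos).asInstanceOf[Int]
   }

   // A view that exposes only the selected columns of an underlying row.
   final class ProjectedView(indexMapping: Array[Int]) {
     private var row: SimpleRow = null
     def replaceRow(r: SimpleRow): ProjectedView = { row = r; this }
     // pos is a position in the projection, translated to the underlying row.
     def getInt(pos: Int): Int = row.getInt(indexMapping(pos))
   }

   object PkIndexSketch {
     // Schema from the test: (id, name, pk, pk2); the key columns are at 2 and 3.
     val pkIndexes: Array[Int] = Array(2, 3)
     val fullRow: SimpleRow = SimpleRow(Array[Any](1, "a", 10, "x"))

     // Reading pk through the projection with the projected position 0 works.
     def readPkCorrectly(): Int =
       new ProjectedView(pkIndexes).replaceRow(fullRow).getInt(0)

     // Reusing the schema position 2 against the two-field projection
     // overruns the index mapping and throws.
     def readPkWithSchemaIndex(): Either[String, Int] =
       try Right(new ProjectedView(pkIndexes).replaceRow(fullRow).getInt(2))
       catch { case e: ArrayIndexOutOfBoundsException => Left(e.getMessage) }
   }
   ```

   If this reading is right, it would also explain why tests whose key starts 
at the first column pass: there the schema position and the projected position 
of the leading key column are both 0.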
   
   _Originally posted by @Yohahaha in https://github.com/apache/fluss/issues/2523#issuecomment-3835480927_
               

