Re: [PR] [spark] support merge-read between kv snapshot and log for primary-key table [fluss]

via GitHub Wed, 04 Feb 2026 08:30:50 -0800


wuchong commented on PR #2523:
URL: https://github.com/apache/fluss/pull/2523#issuecomment-3848464687


   
   > @YannByron I found a bug while testing reads of PK tables: the read fails
   > when the last columns are used as the primary key. The current cases in
   > SparkPrimaryKeyTableReadTest all use the first column and the partition
   > column as the primary key.
   > 
   > ```scala
   > test("Spark Read: primary key table with last pk") {
   >     withTable("t") {
   >       sql("CREATE TABLE t (id int, name string, pk int, pk2 string) TBLPROPERTIES('primary.key'='pk,pk2')")
   >       checkAnswer(sql("SELECT * FROM t"), Nil)
   >       sql("INSERT INTO t VALUES (1, 'a', 10, 'x'), (2, 'b', 20, 'y')")
   >       checkAnswer(sql("SELECT * FROM t ORDER BY id"), Row(1, "a", 10, "x") :: Row(2, "b", 20, "y") :: Nil)
   >     }
   >   }
   > ```
   > 
   > The above case fails with:
   > 
   > ```
   > Job aborted due to stage failure: Task 0 in stage 3.0 failed 1 times, most recent failure: Lost task 0.0 in stage 3.0 (TID 4) (192.168.0.116 executor driver): java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 2
   >    at org.apache.fluss.row.ProjectedRow.getInt(ProjectedRow.java:90)
   >    at org.apache.fluss.row.InternalRow.lambda$createFieldGetter$ff31e09f$6(InternalRow.java:198)
   >    at org.apache.fluss.row.encode.CompactedKeyEncoder.encodeKey(CompactedKeyEncoder.java:83)
   >    at org.apache.fluss.spark.read.FlussUpsertPartitionReader$$anon$1.compare(FlussUpsertPartitionReader.scala:113)
   >    at org.apache.fluss.spark.read.FlussUpsertPartitionReader$$anon$1.compare(FlussUpsertPartitionReader.scala:111)
   >    at org.apache.fluss.spark.utils.LogChangesIterator.hasSamePrimaryKey(LogChangesIterator.scala:117)
   >    at org.apache.fluss.spark.utils.LogChangesIterator.hasNext(LogChangesIterator.scala:85)
   >    at org.apache.fluss.client.table.scanner.SortMergeReader.readBatch(SortMergeReader.java:90)
   >    at org.apache.fluss.spark.read.FlussUpsertPartitionReader.initialize(FlussUpsertPartitionReader.scala:217)
   >    at org.apache.fluss.spark.read.FlussUpsertPartitionReader.<init>(FlussUpsertPartitionReader.scala:86)
   >    at org.apache.fluss.spark.read.FlussUpsertPartitionReaderFactory.createReader(FlussPartitionReaderFactory.scala:61)
   > ```
   
   Thank you, @Yohahaha. Could you open a pull request that adds this test
   case and fixes the failure?
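
   One plausible reading of the trace, offered as an assumption rather than a
   confirmed diagnosis: `ProjectedRow.getInt` was asked for position 2 on a row
   that exposes only two fields. That is what would happen if the key encoder's
   field getters were built from the primary key's indexes in the full schema
   (2 and 3 in this test) while the row passed to them had already been
   projected down to the key columns (positions 0 and 1). The sketch below
   reproduces that mismatch in plain Scala. `ProjectedRowSketch` and
   `LastColumnPkSketch` are hypothetical stand-ins, not Fluss classes.

```scala
import scala.util.Try

// Hypothetical stand-in for a projected row: it exposes only the columns
// listed in indexMapping, addressed by their position in the projection.
final class ProjectedRowSketch(underlying: Array[Any], indexMapping: Array[Int]) {
  // pos is a position in the projected row, not an index in the full schema.
  def getInt(pos: Int): Int = underlying(indexMapping(pos)).asInstanceOf[Int]
}

object LastColumnPkSketch {
  // Full schema from the test: (id, name, pk, pk2).
  val fullRow: Array[Any] = Array(1, "a", 10, "x")

  // The primary key columns pk and pk2 sit at schema indexes 2 and 3.
  val pkIndexesInSchema: Array[Int] = Array(2, 3)

  // Row projected down to the key columns; it has only positions 0 and 1.
  val pkRow = new ProjectedRowSketch(fullRow, pkIndexesInSchema)

  // Assumed bug shape: the getter uses the schema index (2) on the projected
  // row, so the lookup into the two-element mapping goes out of bounds.
  def readWithSchemaIndex(): Try[Int] = Try(pkRow.getInt(pkIndexesInSchema(0)))

  // Assumed fix shape: address the projected row by its own position (0).
  def readWithProjectedPosition(): Int = pkRow.getInt(0)
}
```

   Under the same assumption, a table whose key columns sit at the front of
   the schema would not hit the error, because schema index and projected
   position then coincide. That would explain why the existing cases pass.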


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
