[PR] [flink][spark] fix incorrect first_row_id range in DataEvolution MergeInto [paimon]

via GitHub Fri, 08 May 2026 21:14:58 -0700


steFaiz opened a new pull request, #7790:
URL: https://github.com/apache/paimon/pull/7790


   ### Purpose
   The current first row ID check in DataEvolutionPartialWriter may be incorrect 
because of special files, i.e. blob files and vector files, which can cause:
   ```text
   java.lang.AssertionError: assertion failed: Number of written records 2419 
does not match expected number 244 for first row ID 19352.
   ```
   
   This is because the blob file's record count overrides the normal file's 
record count:
   <img width="2244" height="280" alt="image" 
src="https://github.com/user-attachments/assets/1338caea-46ee-4310-96e5-57c8336dc6c6"
 />
   
   We should filter out special files when calculating the first_row_id to 
record_count mapping.
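
   As a rough illustration of the idea (not the code in this PR), the sketch 
below builds a first_row_id to record_count map while skipping special files. 
`FileMeta`, `isSpecialFile`, `expectedCounts` and the `.blob` / `.vector` 
suffix check are hypothetical stand-ins, not Paimon APIs. The numbers are 
borrowed from the error message above purely as sample values.

   ```java
   import java.util.LinkedHashMap;
   import java.util.List;
   import java.util.Map;

   class RowIdRanges {

       // Hypothetical stand-in for a file's metadata; not Paimon's actual class.
       static final class FileMeta {
           final String fileName;
           final long firstRowId;
           final long rowCount;

           FileMeta(String fileName, long firstRowId, long rowCount) {
               this.fileName = fileName;
               this.firstRowId = firstRowId;
               this.rowCount = rowCount;
           }
       }

       // Assumption: special files can be recognized by their suffix.
       static boolean isSpecialFile(FileMeta file) {
           return file.fileName.endsWith(".blob") || file.fileName.endsWith(".vector");
       }

       // Maps each first_row_id to the record count of the normal file starting there.
       static Map<Long, Long> expectedCounts(List<FileMeta> files) {
           Map<Long, Long> counts = new LinkedHashMap<>();
           for (FileMeta file : files) {
               if (isSpecialFile(file)) {
                   // A special file may share a first_row_id with a normal file;
                   // skipping it keeps it from overwriting the normal file's count.
                   continue;
               }
               counts.put(file.firstRowId, file.rowCount);
           }
           return counts;
       }
   }
   ```

   Without the `isSpecialFile` check, a blob file listed after a normal file 
with the same first_row_id would replace the normal file's entry in the map, 
which is the kind of mismatch the assertion reports.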
   ### Tests
   See:
   - `org.apache.paimon.flink.action.DataEvolutionMergeIntoActionITCase` for 
the Flink test
   - `org.apache.paimon.spark.sql.BlobTestBase` for the Spark test
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
