[PR] [spark] Use Fluss kv snapshot in lake-batch fallback [fluss]

via GitHub Thu, 14 May 2026 09:00:33 -0700


fresh-borzoni opened a new pull request, #3317:
URL: https://github.com/apache/fluss/pull/3317


   ## Summary
   
   When a lake-enabled primary-key table has no lake snapshot yet (e.g. it has 
not been tiered), batch reads fell back to scanning the entire log from 
EARLIEST for every bucket and ignored any Fluss kv snapshots that already 
existed. For tables that have kv snapshots but no lake snapshot, this meant 
re-reading all historical changes.
   
   Plumb the same per-bucket dispatch that FlussUpsertBatch already uses into 
FlussLakeUpsertBatch.planFallbackPartitions. Where a kv snapshot exists for a 
bucket, plan a hybrid read (kv snapshot + log tail bounded by stoppingOffset); 
otherwise, plan a log-only read from EARLIEST bounded by stoppingOffset. The 
reader side (FlussUpsertPartitionReader) already supports both shapes.
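
   The per-bucket dispatch described above can be pictured with a minimal 
sketch. This is not the code in this PR: FallbackPlanSketch, KvSnapshot, 
BucketPlan, planFallback and the EARLIEST sentinel value are hypothetical names 
chosen for illustration. The sketch also assumes that the log tail of a hybrid 
read starts at the log offset recorded with the kv snapshot, and that every 
bucket has a stopping offset.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class FallbackPlanSketch {

    // Hypothetical sentinel for "start of the log"; not taken from the Fluss code base.
    static final long EARLIEST = -2L;

    // Hypothetical stand-in for the latest kv snapshot of one bucket.
    static final class KvSnapshot {
        final long snapshotId;
        final long logOffset; // assumed: log offset the snapshot is consistent up to

        KvSnapshot(long snapshotId, long logOffset) {
            this.snapshotId = snapshotId;
            this.logOffset = logOffset;
        }
    }

    // Hypothetical plan entry for one bucket.
    static final class BucketPlan {
        final int bucket;
        final Long snapshotId;     // null means a log-only read
        final long startOffset;    // where the log read begins
        final long stoppingOffset; // where the log read ends

        BucketPlan(int bucket, Long snapshotId, long startOffset, long stoppingOffset) {
            this.bucket = bucket;
            this.snapshotId = snapshotId;
            this.startOffset = startOffset;
            this.stoppingOffset = stoppingOffset;
        }

        boolean isHybrid() {
            return snapshotId != null;
        }
    }

    // Chooses a read shape per bucket: hybrid if a kv snapshot exists, log-only otherwise.
    static List<BucketPlan> planFallback(
            int numBuckets,
            Map<Integer, KvSnapshot> latestSnapshots,
            Map<Integer, Long> stoppingOffsets) {
        List<BucketPlan> plans = new ArrayList<>();
        for (int bucket = 0; bucket < numBuckets; bucket++) {
            // Assumes an entry exists for every bucket; a missing one would throw here.
            long stoppingOffset = stoppingOffsets.get(bucket);
            KvSnapshot snapshot = latestSnapshots.get(bucket);
            if (snapshot != null) {
                // Hybrid: read the kv snapshot, then the log tail up to stoppingOffset.
                plans.add(new BucketPlan(
                        bucket, snapshot.snapshotId, snapshot.logOffset, stoppingOffset));
            } else {
                // Log-only: replay the log from EARLIEST up to stoppingOffset.
                plans.add(new BucketPlan(bucket, null, EARLIEST, stoppingOffset));
            }
        }
        return plans;
    }
}
```

   With two buckets where only bucket 0 has a kv snapshot, planFallback would 
return a hybrid entry for bucket 0 and a log-only entry starting at EARLIEST 
for bucket 1. Both entries carry their own stoppingOffset, so the batch read 
stays bounded in either shape.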


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
