Yohahaha opened a new pull request, #3042:
URL: https://github.com/apache/fluss/pull/3042

   <!--
   Generated-by: Qwen Code following the 
guidelines (https://github.com/apache/fluss/blob/main/AGENTS.md)
   -->
   
   ### Purpose
   
   Linked issue: close #2984
   
   This PR adds support for reading lake-enabled primary key tables in the 
Spark connector. Previously, Spark could only read lake-enabled append-only 
tables. With this change, Spark can also read primary key tables that have data 
lake tiering enabled, by merging the lake snapshot data with the Fluss kv tail.
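
The merge described above can be illustrated with a minimal, self-contained sketch. It is not the code in this PR: `LakeLogMergeSketch`, `Change`, `latestPerKey`, and `merge` are hypothetical names, keys are plain integers, and a `null` value is assumed to stand for a delete. The sketch assumes the lake snapshot is already sorted by primary key, keeps only the latest log change per key, and then walks both sorted inputs once so that the log entry wins on a key collision.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class LakeLogMergeSketch {

    /** Hypothetical change record from the log tail; a null value stands for a delete. */
    static final class Change {
        final int key;
        final String value;

        Change(int key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Keeps only the latest change per key, ordered by key. */
    static TreeMap<Integer, Change> latestPerKey(List<Change> logTail) {
        TreeMap<Integer, Change> latest = new TreeMap<>();
        for (Change c : logTail) {
            latest.put(c.key, c); // later log records overwrite earlier ones
        }
        return latest;
    }

    /** Sort-merges a key-sorted snapshot with the log tail; the log wins on equal keys. */
    static LinkedHashMap<Integer, String> merge(
            List<Map.Entry<Integer, String>> sortedSnapshot, List<Change> logTail) {
        Iterator<Map.Entry<Integer, String>> snap = sortedSnapshot.iterator();
        Iterator<Change> log = latestPerKey(logTail).values().iterator();
        Map.Entry<Integer, String> s = snap.hasNext() ? snap.next() : null;
        Change c = log.hasNext() ? log.next() : null;
        LinkedHashMap<Integer, String> out = new LinkedHashMap<>();
        while (s != null || c != null) {
            if (c == null || (s != null && s.getKey() < c.key)) {
                // Key exists only in the snapshot: emit it unchanged.
                out.put(s.getKey(), s.getValue());
                s = snap.hasNext() ? snap.next() : null;
            } else {
                // Key exists in the log: emit the new value, or nothing for a delete.
                if (c.value != null) {
                    out.put(c.key, c.value);
                }
                // Skip the snapshot row that this change supersedes, if any.
                if (s != null && s.getKey() == c.key) {
                    s = snap.hasNext() ? snap.next() : null;
                }
                c = log.hasNext() ? log.next() : null;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, String>> snapshot =
                List.of(Map.entry(1, "a"), Map.entry(2, "b"), Map.entry(3, "c"));
        List<Change> logTail = List.of(
                new Change(2, "b2"), new Change(4, "d"),
                new Change(3, null), new Change(4, "d2"));
        System.out.println(merge(snapshot, logTail)); // prints {1=a, 2=b2, 4=d2}
    }
}
```

Buffering the latest change per key before merging keeps the merge itself a single pass over two sorted inputs, at the cost of holding the log tail in memory.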
   
   ### Brief change log
   
   - Move `LakeSnapshotAndLogSplitScanner` from `fluss-flink` to the 
`fluss-client` module for reuse across connectors
   - Refactor `LakeSnapshotAndLogSplitScanner` to be more generic (decouple it 
from the Flink-specific `LakeSnapshotAndFlussLogSplit` class)
   - Add Spark lake upsert read support:
     - `FlussLakeUpsertScanBuilder`: Scan builder for lake-enabled pk tables
     - `FlussLakeUpsertBatch`: Batch implementation that plans partitions 
combining lake splits with log offsets
     - `FlussLakeUpsertPartitionReader`: Reader that merges the lake snapshot 
with the Fluss log using a sort-merge algorithm
     - `FlussLakeUpsertInputPartition`: Input partition containing lake splits 
and log offsets
   - Reorganize the Spark lake read classes into a `lake` subpackage for better 
code organization
   - Add a fallback mechanism for when no lake snapshot exists (falls back to 
pure Fluss kv reading)
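
The partition planning and fallback listed above might be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation in this PR: `LakeUpsertPlanSketch`, `LakeSnapshot`, `Partition`, `Plan`, and `Mode` are hypothetical names, lake splits are represented as plain strings, and the sketch assumes each bucket's log is read from the offset recorded in the lake snapshot (0 if none is recorded) up to the latest offset known at planning time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

final class LakeUpsertPlanSketch {

    enum Mode { FLUSS_KV_ONLY, LAKE_SNAPSHOT_PLUS_LOG }

    /** Hypothetical view of a lake snapshot: splits and tiered log offset per bucket. */
    static final class LakeSnapshot {
        final Map<Integer, List<String>> splitsByBucket;
        final Map<Integer, Long> tieredOffsetByBucket;

        LakeSnapshot(Map<Integer, List<String>> splits, Map<Integer, Long> offsets) {
            this.splitsByBucket = splits;
            this.tieredOffsetByBucket = offsets;
        }
    }

    /** Hypothetical input partition: lake splits plus the log range still to be merged. */
    static final class Partition {
        final int bucket;
        final List<String> lakeSplits;
        final long startOffset;
        final long stopOffset;

        Partition(int bucket, List<String> lakeSplits, long startOffset, long stopOffset) {
            this.bucket = bucket;
            this.lakeSplits = lakeSplits;
            this.startOffset = startOffset;
            this.stopOffset = stopOffset;
        }
    }

    static final class Plan {
        final Mode mode;
        final List<Partition> partitions;

        Plan(Mode mode, List<Partition> partitions) {
            this.mode = mode;
            this.partitions = partitions;
        }
    }

    static Plan plan(Optional<LakeSnapshot> snapshot, Map<Integer, Long> latestOffsets) {
        if (!snapshot.isPresent()) {
            // No lake snapshot yet: leave the read to the plain Fluss kv path.
            return new Plan(Mode.FLUSS_KV_ONLY, List.of());
        }
        LakeSnapshot snap = snapshot.get();
        List<Partition> partitions = new ArrayList<>();
        // Iterate buckets in ascending order for a deterministic plan.
        for (Map.Entry<Integer, Long> e : new TreeMap<>(latestOffsets).entrySet()) {
            int bucket = e.getKey();
            partitions.add(new Partition(
                    bucket,
                    snap.splitsByBucket.getOrDefault(bucket, List.of()),
                    snap.tieredOffsetByBucket.getOrDefault(bucket, 0L),
                    e.getValue()));
        }
        return new Plan(Mode.LAKE_SNAPSHOT_PLUS_LOG, partitions);
    }

    public static void main(String[] args) {
        Map<Integer, Long> latest = Map.of(0, 120L, 1, 80L);
        System.out.println(plan(Optional.empty(), latest).mode); // prints FLUSS_KV_ONLY
        LakeSnapshot snap = new LakeSnapshot(
                Map.of(0, List.of("bucket-0/file-a")), Map.of(0, 100L, 1, 80L));
        System.out.println(plan(Optional.of(snap), latest).partitions.size()); // prints 2
    }
}
```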
   
   ### Tests
   
   - Added `SparkLakePrimaryKeyTableReadTestBase` with comprehensive tests:
     - Test fallback when no lake snapshot exists (both partitioned and 
non-partitioned tables)
     - Test lake snapshot + log merge reading for primary key tables
     - Tests verified with the Paimon lake format
   
   ### API and Format
   
   No API or format changes.
   
   ### Documentation
   
   No new feature documentation required (extends existing lake reading 
capability).

