Yohahaha opened a new pull request, #3042: URL: https://github.com/apache/fluss/pull/3042
<!-- Generated-by: Qwen Code following the guidelines(https://github.com/apache/fluss/blob/main/AGENTS.md) -->

### Purpose

Linked issue: close #2984

This PR adds support for reading lake-enabled primary key tables in the Spark connector. Previously, Spark could only read lake-enabled append-only tables. With this change, Spark can also read primary key tables that have data lake tiering enabled, by merging lake snapshot data with the Fluss kv tail.

### Brief change log

- Move `LakeSnapshotAndLogSplitScanner` from `fluss-flink` to the `fluss-client` module for reuse across connectors
- Refactor `LakeSnapshotAndLogSplitScanner` to be more generic (decouple it from the Flink-specific `LakeSnapshotAndFlussLogSplit` class)
- Add Spark lake upsert read support:
  - `FlussLakeUpsertScanBuilder`: Scan builder for lake-enabled pk tables
  - `FlussLakeUpsertBatch`: Batch implementation that plans partitions combining lake splits with log offsets
  - `FlussLakeUpsertPartitionReader`: Reader that merges the lake snapshot with the Fluss log using a sort-merge algorithm
  - `FlussLakeUpsertInputPartition`: Input partition containing lake splits and log offsets
- Reorganize Spark lake read classes into a `lake` subpackage for better code organization
- Add a fallback mechanism for when no lake snapshot exists (falls back to pure Fluss kv reading)

### Tests

- Added `SparkLakePrimaryKeyTableReadTestBase` with comprehensive tests:
  - Test fallback when no lake snapshot exists (both partitioned and non-partitioned tables)
  - Test lake snapshot + log merge reading for primary key tables
- Tests verified with Paimon lake format

### API and Format

No API or format changes.

### Documentation

No new feature documentation required (extends existing lake reading capability).
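
### Illustrative sketch of the snapshot + log merge

To make the sort-merge idea from the change log concrete, here is a minimal, language-neutral sketch in Python. It is not the code in this PR: the function names, the `(op, key, value)` change format, and the assumption that the snapshot is already sorted by primary key are all hypothetical and chosen only for illustration. The sketch collapses the log tail to the latest change per key, then walks it alongside the snapshot with two pointers.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Row = Tuple[int, str]                    # (primary key, value)
Change = Tuple[str, int, Optional[str]]  # (op, key, value); op is "upsert" or "delete" (hypothetical format)


def collapse_log(changes: Iterable[Change]) -> List[Tuple[int, Tuple[str, Optional[str]]]]:
    """Keep only the last change per key and return the result sorted by key."""
    latest: Dict[int, Tuple[str, Optional[str]]] = {}
    for op, key, value in changes:       # assumed to arrive in log-offset order
        latest[key] = (op, value)
    return sorted(latest.items(), key=lambda kv: kv[0])


def merge_snapshot_with_log(snapshot: Iterable[Row], changes: Iterable[Change]) -> Iterator[Row]:
    """Two-pointer merge of a key-sorted snapshot with the collapsed log tail."""
    log = collapse_log(changes)
    i = 0

    def emit(entry):
        key, (op, value) = entry
        if op == "upsert":               # deletes produce no output row
            yield (key, value)

    for key, value in snapshot:
        # Log-only keys that sort before the current snapshot key.
        while i < len(log) and log[i][0] < key:
            yield from emit(log[i])
            i += 1
        if i < len(log) and log[i][0] == key:
            # The log entry overrides (or deletes) the snapshot row.
            yield from emit(log[i])
            i += 1
        else:
            yield (key, value)
    # Log-only keys that sort after every snapshot key.
    while i < len(log):
        yield from emit(log[i])
        i += 1
```

With an empty snapshot the same function simply replays the collapsed log, which loosely mirrors the fallback case described above where no lake snapshot exists.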
