yunfengzhou-hub opened a new pull request, #8262: URL: https://github.com/apache/paimon/pull/8262
### Purpose

Chain Table (`chain-table.enabled=true`) separates data into a `snapshot` branch (batch-imported full partitions) and a `delta` branch (incremental updates). Prior to this change, streaming read was not supported because the standard `DataTableStreamScan` is unaware of the two-branch architecture.

This PR introduces `ChainTableFileStoreTable` (a wrapper over `FallbackReadFileStoreTable`) and `ChainTableStreamScan`, which implements a two-phase streaming scan:

- Phase 1 does a full load by reading delta data pinned to the current snapshot and merging snapshot files for overlapping partitions.
- Phase 2 incrementally monitors the delta branch only, returning `DataSplit(isStreaming=true)` for changelog passthrough.

The snapshot-pinning strategy makes the Phase 1 / Phase 2 boundary deterministic: there is no overlap or data loss regardless of concurrent commits.

### Tests

Added `FlinkChainTableITCase` with 16 tests (all passing, ~75s):

- Full load with snapshot+delta overlap, empty delta, empty snapshot
- Changelog passthrough (-U/+U) with `changelog-producer=input`
- Snapshot OVERWRITE does not trigger streaming output
- Stateless restart and stateful restart (MiniCluster checkpoint/restore)
- `WHERE` predicate forwarding, `withShard` forwarding
- `scan.mode=latest` bypass, `changelog-producer=none` rejection
- `restore(id, scanAll=true)` and `restore(null, scanAll=true)` state reset
- `chain-partition-keys` group partition streaming
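
### Illustrative sketch

To make the Phase 1 / Phase 2 boundary concrete, here is a minimal toy model of a snapshot-pinned two-phase scan. It is not the code in this PR: `TwoPhaseScanSketch`, `DeltaCommit`, `Scan`, and `plan()` are hypothetical names, snapshot ids are assumed to increase monotonically, and the merge with `snapshot`-branch files is left out entirely. The only thing it shows is why pinning one id per planning round yields no overlap and no gap.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a two-phase streaming scan. None of these names are Paimon APIs. */
public class TwoPhaseScanSketch {

    /** Hypothetical stand-in for one commit on the delta branch. */
    static final class DeltaCommit {
        final long snapshotId;
        final String payload;

        DeltaCommit(long snapshotId, String payload) {
            this.snapshotId = snapshotId;
            this.payload = payload;
        }
    }

    /** Hypothetical scan: the first plan() is the full load, later calls are incremental. */
    static final class Scan {
        private final List<DeltaCommit> deltaBranch; // shared list that writers keep appending to
        private long nextSnapshotId = -1;            // -1 means Phase 1 has not run yet

        Scan(List<DeltaCommit> deltaBranch) {
            this.deltaBranch = deltaBranch;
        }

        List<String> plan() {
            boolean fullLoad = nextSnapshotId < 0;
            // Pin the latest delta snapshot id once; everything in this round is bounded by it.
            long pinned = deltaBranch.isEmpty()
                    ? 0L
                    : deltaBranch.get(deltaBranch.size() - 1).snapshotId;
            long from = fullLoad ? Long.MIN_VALUE : nextSnapshotId;

            List<String> out = new ArrayList<>();
            for (DeltaCommit c : deltaBranch) {
                if (c.snapshotId >= from && c.snapshotId <= pinned) {
                    out.add((fullLoad ? "full:" : "incr:") + c.payload);
                }
            }
            // The next round starts exactly one past the pin, so rounds never overlap or skip.
            nextSnapshotId = pinned + 1;
            return out;
        }
    }

    public static void main(String[] args) {
        List<DeltaCommit> delta = new ArrayList<>();
        delta.add(new DeltaCommit(1, "a"));
        delta.add(new DeltaCommit(2, "b"));

        Scan scan = new Scan(delta);
        System.out.println(scan.plan()); // [full:a, full:b]

        delta.add(new DeltaCommit(3, "c")); // commit that lands after the full load
        System.out.println(scan.plan()); // [incr:c]
        System.out.println(scan.plan()); // []
    }
}
```

In this model a commit that lands after the pin is taken is simply deferred to the next `plan()` call, which is the property the PR description attributes to snapshot pinning.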
