yunfengzhou-hub opened a new pull request, #8262:
URL: https://github.com/apache/paimon/pull/8262

   
   ### Purpose
   
   Chain Table (`chain-table.enabled=true`) separates data into a `snapshot` branch (batch-imported full partitions) and a `delta` branch (incremental updates). Before this change, streaming reads were not supported because the standard `DataTableStreamScan` is unaware of this two-branch architecture.
   
   This PR introduces `ChainTableFileStoreTable` (a wrapper over `FallbackReadFileStoreTable`) and `ChainTableStreamScan`, which implements a two-phase streaming scan:

   - Phase 1 (full load): reads delta data pinned to the current snapshot and merges in files from the `snapshot` branch for overlapping partitions.
   - Phase 2 (incremental): monitors only the `delta` branch and returns `DataSplit(isStreaming=true)` so that changelog is passed through.

   Pinning Phase 1 to a single snapshot makes the Phase 1 / Phase 2 boundary deterministic: regardless of concurrent commits, no data is read twice or lost.
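
The snapshot-pinning idea can be illustrated with a small in-memory sketch. Everything here is hypothetical (`TwoPhaseScanSketch`, `DeltaBranch`, `Scan`, and the string "rows" are illustrative names, not Paimon APIs). Phase 1 is simplified to snapshot-branch rows followed by pinned delta rows, with no real partition merge, and rows stand in for `DataSplit`s.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, not the Paimon implementation: a two-phase
// streaming scan that pins the delta branch's latest snapshot id.
public class TwoPhaseScanSketch {

    // One delta-branch commit: its snapshot id and the changelog rows it wrote.
    record Commit(long snapshotId, List<String> rows) {}

    // In-memory stand-in for the delta branch. Snapshot ids start at 1.
    static final class DeltaBranch {
        private final List<Commit> commits = new ArrayList<>();

        synchronized void commit(List<String> rows) {
            commits.add(new Commit(commits.size() + 1L, rows));
        }

        synchronized long latestSnapshotId() {
            return commits.size(); // 0 means the branch is empty
        }

        // Rows of all commits with lo < snapshotId <= hi, in commit order.
        synchronized List<String> rowsBetween(long lo, long hi) {
            List<String> out = new ArrayList<>();
            for (Commit c : commits) {
                if (c.snapshotId() > lo && c.snapshotId() <= hi) {
                    out.addAll(c.rows());
                }
            }
            return out;
        }
    }

    static final class Scan {
        private final DeltaBranch delta;
        private final List<String> snapshotBranchRows;
        private long lastConsumed = -1; // -1: phase 1 has not run yet

        Scan(DeltaBranch delta, List<String> snapshotBranchRows) {
            this.delta = delta;
            this.snapshotBranchRows = snapshotBranchRows;
        }

        List<String> plan() {
            // Pin the latest snapshot id once per call; commits landing after
            // this read get higher ids and are left for the next call.
            long pinned = delta.latestSnapshotId();
            List<String> out = new ArrayList<>();
            if (lastConsumed < 0) {
                // Phase 1: full load, simplified to snapshot-branch rows
                // followed by delta rows up to the pin (no real merge).
                out.addAll(snapshotBranchRows);
                out.addAll(delta.rowsBetween(0, pinned));
            } else {
                // Phase 2: delta branch only, strictly after the last pin.
                out.addAll(delta.rowsBetween(lastConsumed, pinned));
            }
            lastConsumed = pinned;
            return out;
        }
    }

    public static void main(String[] args) {
        DeltaBranch delta = new DeltaBranch();
        delta.commit(List.of("+I k=1"));
        Scan scan = new Scan(delta, List.of("base k=0"));
        System.out.println(scan.plan()); // phase 1: [base k=0, +I k=1]
        delta.commit(List.of("-U k=1", "+U k=2")); // commit after the pin
        System.out.println(scan.plan()); // phase 2: [-U k=1, +U k=2]
        System.out.println(scan.plan()); // nothing new: []
    }
}
```

Because the boundary is a single integer snapshot id, a commit that lands while Phase 1 is reading receives a larger id. It is then emitted exactly once by the first Phase 2 call, which is the "no overlap, no loss" property described above.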
   
   ### Tests
   
   Added `FlinkChainTableITCase` with 16 tests (all passing, ~75s):
   
   - Full load with snapshot+delta overlap, empty delta, empty snapshot
   - Changelog passthrough (-U/+U) with `changelog-producer=input`
   - Snapshot OVERWRITE does not trigger streaming output
   - Stateless restart and stateful restart (MiniCluster checkpoint/restore)
   - `WHERE` predicate forwarding, `withShard` forwarding
   - `scan.mode=latest` bypass, `changelog-producer=none` rejection
   - `restore(id, scanAll=true)` and `restore(null, scanAll=true)` state reset
   - `chain-partition-keys` group partition streaming
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
