tub opened a new pull request, #7418:
URL: https://github.com/apache/paimon/pull/7418

   ## Summary
   
   - Adds `SnapshotManager.get_snapshots_batch(snapshot_ids, max_workers=4)`: batch-checks the existence of a list of snapshot files via `file_io.exists_batch()`, then fetches the existing ones in parallel using a `ThreadPoolExecutor`, returning `{id: Snapshot|None}`
   - Adds `SnapshotManager.find_next_scannable(start_id, should_scan, lookahead_size=10, max_workers=4)`: looks ahead `lookahead_size` snapshot IDs, fetches them in one batch, and returns `(snapshot, next_id, skipped_count)` for the first snapshot that passes `should_scan`
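The batched fetch in the first bullet can be sketched as two steps: one existence check for all paths, then parallel reads of only the files that exist. This is a minimal sketch, not pypaimon's code. It assumes a hypothetical `InMemoryFileIO` whose `exists_batch(paths)` returns a `{path: bool}` dict (the real `file_io.exists_batch()` signature is not shown in this PR). `Snapshot`, `read_snapshot`, `SnapshotManagerSketch` and the `snapshot/snapshot-<id>` path layout are also illustrative stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Snapshot:
    # Hypothetical simplified snapshot; only the fields this sketch needs.
    id: int
    commit_kind: str


class InMemoryFileIO:
    """Hypothetical FileIO that maps paths to already-parsed snapshots."""

    def __init__(self, files: Dict[str, Snapshot]):
        self.files = files

    def exists_batch(self, paths: List[str]) -> Dict[str, bool]:
        # Assumed contract: one call answers existence for every path.
        return {p: p in self.files for p in paths}

    def read_snapshot(self, path: str) -> Snapshot:
        # Hypothetical reader; a real one would fetch and parse a JSON file.
        return self.files[path]


class SnapshotManagerSketch:
    def __init__(self, file_io: InMemoryFileIO, snapshot_dir: str = "snapshot"):
        self.file_io = file_io
        self.snapshot_dir = snapshot_dir

    def snapshot_path(self, snapshot_id: int) -> str:
        return f"{self.snapshot_dir}/snapshot-{snapshot_id}"

    def get_snapshots_batch(
        self, snapshot_ids: List[int], max_workers: int = 4
    ) -> Dict[int, Optional[Snapshot]]:
        paths = {sid: self.snapshot_path(sid) for sid in snapshot_ids}
        # Step 1: one batched existence check instead of N serial exists() calls.
        exists = self.file_io.exists_batch(list(paths.values()))
        result: Dict[int, Optional[Snapshot]] = {sid: None for sid in snapshot_ids}
        existing = [sid for sid in snapshot_ids if exists[paths[sid]]]
        if not existing:
            return result
        # Step 2: read only the files that exist, in parallel.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            loaded = pool.map(lambda sid: self.file_io.read_snapshot(paths[sid]), existing)
            for sid, snap in zip(existing, loaded):
                result[sid] = snap
        return result


files = {f"snapshot/snapshot-{i}": Snapshot(i, "APPEND") for i in (1, 2, 4)}
manager = SnapshotManagerSketch(InMemoryFileIO(files))
batch = manager.get_snapshots_batch([1, 2, 3, 4])
print({sid: s is not None for sid, s in batch.items()})  # {1: True, 2: True, 3: False, 4: True}
```

The single `exists_batch()` call filters out missing IDs up front, so the thread pool only spends workers on files that are known to exist.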
   
   These two methods are the performance foundation for `AsyncStreamingTableScan` (coming in PR `2c` of this stack):
   - `exists_batch` replaces N serial file existence checks with a single round trip. This matters on object stores, where per-call latency is significant
   - Lookahead lets the scan loop skip non-scannable commits (e.g. `COMPACT`, `OVERWRITE`) without a separate round trip per snapshot
   - `skipped_count` in the return value lets the streaming scan's prefetch path submit the next lookahead fetch to a background thread while the consumer processes the current plan
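The lookahead and prefetch ideas above can fit together as in the following sketch. It assumes contiguous snapshot IDs and a return contract that the PR does not spell out:

- `next_id` is where the following search should start.
- `snapshot` is `None` when the window reaches an uncommitted ID or contains only non-scannable commits.

`get_snapshots_batch` here is a hypothetical dict-backed stub. `scan_with_prefetch` is an illustrative consumer loop, not the upcoming `AsyncStreamingTableScan`.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, Optional, Tuple


@dataclass
class Snapshot:
    # Hypothetical simplified snapshot; only the fields this sketch needs.
    id: int
    commit_kind: str


# Hypothetical in-memory stand-in for the table's snapshot directory.
STORE: Dict[int, Snapshot] = {}


def get_snapshots_batch(ids: Iterable[int]) -> Dict[int, Optional[Snapshot]]:
    # Stub for SnapshotManager.get_snapshots_batch(): one call, one result dict.
    return {i: STORE.get(i) for i in ids}


def find_next_scannable(
    start_id: int,
    should_scan: Callable[[Snapshot], bool],
    lookahead_size: int = 10,
) -> Tuple[Optional[Snapshot], int, int]:
    window = range(start_id, start_id + lookahead_size)
    batch = get_snapshots_batch(window)  # one batched fetch for the whole window
    skipped = 0
    for sid in window:
        snap = batch[sid]
        if snap is None:
            # Not committed yet (assumes contiguous IDs): resume here next time.
            return None, sid, skipped
        if should_scan(snap):
            return snap, sid + 1, skipped
        skipped += 1  # e.g. a COMPACT or OVERWRITE commit
    # Every snapshot in the window was skipped: continue right after it.
    return None, start_id + lookahead_size, skipped


def scan_with_prefetch(
    start_id: int,
    should_scan: Callable[[Snapshot], bool],
    lookahead_size: int = 10,
) -> Iterator[Snapshot]:
    with ThreadPoolExecutor(max_workers=1) as prefetcher:
        future = prefetcher.submit(find_next_scannable, start_id, should_scan, lookahead_size)
        while True:
            snap, next_id, skipped = future.result()
            if snap is None and skipped == 0:
                return  # caught up; a real streaming scan would wait and poll again
            # Submit the next lookahead before the consumer handles this snapshot.
            future = prefetcher.submit(find_next_scannable, next_id, should_scan, lookahead_size)
            if snap is not None:
                yield snap


def is_append(s: Snapshot) -> bool:
    return s.commit_kind == "APPEND"


kinds = ["APPEND", "COMPACT", "COMPACT", "APPEND", "OVERWRITE", "APPEND"]
for i, kind in enumerate(kinds, start=1):
    STORE[i] = Snapshot(i, kind)

print([s.id for s in scan_with_prefetch(1, is_append, lookahead_size=2)])  # [1, 4, 6]
```

A single-worker executor keeps at most one lookahead in flight. The background fetch therefore starts from the `next_id` of the snapshot the consumer has just received and never runs further ahead.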
   
   ## Stack context
   
   This is part of a stack of PRs adding streaming read support to `paimon-python`. Each PR is independently reviewable with a narrow scope:
   
   | PR | Branch | Content |
   |:---|:-------|:--------|
   | #7417 | `python-streaming-2a-changelog-producer` | `ChangelogProducer` enum + config option |
   | **This PR** | `python-streaming-2b-snapshot-lookahead` | `SnapshotManager.get_snapshots_batch()` + `find_next_scannable()` |
   | Next | `python-streaming-2c-scan-and-builder` | `AsyncStreamingTableScan`, `StreamReadBuilder`, `Table.new_stream_read_builder()` |
   | Next | `python-streaming-2d-consumer` | Consumer ID integration into scan/builder |
   | Next | `python-streaming-2e-acceptance-docs` | `IncrementalDiffScanner` acceptance tests + streaming docs |
   
   Tracking issue: #7152
   
   ## Test plan
   
   - [ ] `cd paimon-python && python -m pytest pypaimon/tests/snapshot_manager_test.py -v`
   - [ ] `cd paimon-python && flake8 pypaimon/snapshot/snapshot_manager.py pypaimon/tests/snapshot_manager_test.py`
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.