huan233usc opened a new pull request, #2656:
URL: https://github.com/apache/iceberg-rust/pull/2656
## Which issue does this PR close?
Part of #1690 — client support for REST server-side scan planning.
## What changes are included in this PR?
Adds a client implementation of the REST scan-planning protocol
(`planTableScan` / `fetchPlanningResult` / `fetchScanTasks`). When a catalog
advertises the planning endpoints, a table scan delegates planning to the
server and consumes the returned `FileScanTask`s instead of reading manifests
locally; otherwise it transparently falls back to native client-side
planning.
## Design
The whole feature hangs off a single seam — `TableScan::plan_files()` — so
execution (`to_arrow`, the Arrow reader) is untouched and DataFusion needs no
changes.
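To make the seam concrete, here is a minimal, self-contained sketch of the delegate-or-fall-back logic. It is not the code in this PR: `ErrorKind`, `FileScanTask`, and `TableScan` are simplified stand-ins, `plan_table_scan` takes a table name instead of a `ScanPlanRequest`, and the `Unsupported` and `Remote` planners are hypothetical.

```rust
use std::sync::Arc;

/// Simplified stand-in for the crate's error kinds (assumption).
#[derive(Debug, Clone, PartialEq)]
pub enum ErrorKind {
    FeatureUnsupported,
    Unexpected,
}

/// Simplified stand-in for a planned file task (assumption).
#[derive(Debug, Clone, PartialEq)]
pub struct FileScanTask {
    pub data_file_path: String,
}

/// Narrow capability trait: the only thing a scan needs from the server.
pub trait ScanPlanner: Send + Sync {
    fn plan_table_scan(&self, table: &str) -> Result<Vec<FileScanTask>, ErrorKind>;
}

pub struct TableScan {
    pub table: String,
    /// `None` means no planner was injected.
    pub planner: Option<Arc<dyn ScanPlanner>>,
}

impl TableScan {
    /// Placeholder for native manifest planning.
    fn plan_native(&self) -> Vec<FileScanTask> {
        vec![FileScanTask { data_file_path: format!("{}/native.parquet", self.table) }]
    }

    /// Delegate to the injected planner; fall back to native planning only
    /// when the planner reports `FeatureUnsupported`. Other errors propagate.
    pub fn plan_files(&self) -> Result<Vec<FileScanTask>, ErrorKind> {
        match &self.planner {
            None => Ok(self.plan_native()),
            Some(planner) => match planner.plan_table_scan(&self.table) {
                Err(ErrorKind::FeatureUnsupported) => Ok(self.plan_native()),
                other => other,
            },
        }
    }
}

/// Hypothetical planner for a catalog without the planning endpoints.
pub struct Unsupported;
impl ScanPlanner for Unsupported {
    fn plan_table_scan(&self, _table: &str) -> Result<Vec<FileScanTask>, ErrorKind> {
        Err(ErrorKind::FeatureUnsupported)
    }
}

/// Hypothetical planner that "plans" on the server.
pub struct Remote;
impl ScanPlanner for Remote {
    fn plan_table_scan(&self, table: &str) -> Result<Vec<FileScanTask>, ErrorKind> {
        Ok(vec![FileScanTask { data_file_path: format!("{table}/server.parquet") }])
    }
}
```

Because the fallback lives inside `plan_files()`, callers downstream of planning see the same `Vec<FileScanTask>` either way, which is why execution does not need to change.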
```
Table::scan() ─► TableScan::plan_files()
                   │  injected ScanPlanner present?
                   ├─ no  ─► native manifest planning (unchanged)
                   └─ yes ─► ScanPlanner::plan_table_scan(ScanPlanRequest)
                                 │
                         ┌───────┴─ endpoint negotiation: gate on the endpoints advertised
                         │          by GET /v1/config (else FeatureUnsupported → native)
                         ▼
                    POST .../plan ──► COMPLETED ───────────────────────────────┐
                         └─► SUBMITTED ─► poll GET .../plan/{id}               │
                                          (exp backoff 1s→60s) ────────────────┤ COMPLETED
                    plan-tasks? ─► POST .../tasks (tokens may recurse) ────────┤
                                                                               ▼
                    convert wire content-files ─► FileScanTask (public builders)
                    + build a plan-scoped FileIO from vended `storage-credentials`
                      ▼
                    ServerScanPlan { tasks, file_io } ─► to_arrow() reads tasks through file_io
                    (on Drop before completion: best-effort DELETE .../plan/{id})
```
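The submit, poll, and fan-out part of the diagram could look roughly like the following. This is a sketch under assumptions: `PlanningApi` is a hypothetical transport trait standing in for the real HTTP calls, files are plain strings, and the backoff doubles on each attempt (the PR states only the 1s and 60s bounds, so the growth factor is a guess). Failure and cancellation states are omitted.

```rust
use std::collections::VecDeque;
use std::time::Duration;

/// Hypothetical wire-level status of a planning request.
pub enum PlanStatus {
    Completed { files: Vec<String>, plan_tasks: Vec<String> },
    Submitted { plan_id: String },
}

/// Hypothetical transport abstraction so the sketch runs without HTTP.
pub trait PlanningApi {
    fn submit(&mut self) -> PlanStatus; // POST .../plan
    fn poll(&mut self, plan_id: &str) -> PlanStatus; // GET .../plan/{id}
    /// POST .../tasks: returns (files, further plan-task tokens).
    fn fetch_tasks(&mut self, token: &str) -> (Vec<String>, Vec<String>);
}

/// Delay before poll number `attempt` (0-based): 1s, 2s, 4s, ... capped at 60s.
pub fn backoff(attempt: u32) -> Duration {
    let secs = 1u64.checked_shl(attempt).unwrap_or(u64::MAX).min(60);
    Duration::from_secs(secs)
}

/// Drive the plan to completion, then drain plan-task tokens with a work queue.
/// `sleep` is injected so tests can record delays instead of blocking.
pub fn plan<A: PlanningApi>(api: &mut A, mut sleep: impl FnMut(Duration)) -> Vec<String> {
    let mut status = api.submit();
    let mut attempt = 0;
    let (mut files, tokens) = loop {
        match status {
            PlanStatus::Completed { files, plan_tasks } => break (files, plan_tasks),
            PlanStatus::Submitted { plan_id } => {
                sleep(backoff(attempt));
                attempt += 1;
                status = api.poll(&plan_id);
            }
        }
    };
    // A token may yield more tokens, so keep going until the queue is empty.
    let mut queue: VecDeque<String> = tokens.into();
    while let Some(token) = queue.pop_front() {
        let (more_files, more_tokens) = api.fetch_tasks(&token);
        files.extend(more_files);
        queue.extend(more_tokens);
    }
    files
}

/// Scripted mock: answers `Submitted` to `polls_left` polls, then `Completed`.
pub struct MockApi {
    pub polls_left: u32,
}

impl PlanningApi for MockApi {
    fn submit(&mut self) -> PlanStatus {
        PlanStatus::Submitted { plan_id: "p1".into() }
    }
    fn poll(&mut self, _plan_id: &str) -> PlanStatus {
        if self.polls_left > 0 {
            self.polls_left -= 1;
            PlanStatus::Submitted { plan_id: "p1".into() }
        } else {
            PlanStatus::Completed {
                files: vec!["a.parquet".into()],
                plan_tasks: vec!["t1".into()],
            }
        }
    }
    fn fetch_tasks(&mut self, token: &str) -> (Vec<String>, Vec<String>) {
        match token {
            "t1" => (vec!["b.parquet".into()], vec!["t2".into()]),
            _ => (vec!["c.parquet".into()], vec![]),
        }
    }
}
```

A queue is used instead of recursion so that a long chain of plan-task tokens cannot grow the call stack; an async implementation would likely fetch tokens concurrently instead of one at a time.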
### Components / flow
1. **Injection seam**: a narrow `ScanPlanner` capability trait
   (`crates/iceberg/src/scan/planner.rs`). `Table`/`TableScanBuilder` carry an
   optional `Arc<dyn ScanPlanner>`; `plan_files()` delegates to it and falls
   back to native planning on `ErrorKind::FeatureUnsupported`. The core
   `Catalog` trait is untouched.
2. **Endpoint negotiation**: `CatalogConfig` now parses the `endpoints` field
   of `GET /v1/config`; the scan-plan calls are gated by an `Endpoint::check`.
3. **Wire DTOs**: request/response types for plan / fetch-planning-result /
   fetch-scan-tasks, plus a lean content-file shape (only the fields a
   `FileScanTask` needs).
4. **State machine**: submit → poll-with-backoff → fan-out `fetchScanTasks`
   (plan-task tokens may produce more tasks), with a best-effort
   `DELETE .../plan/{id}` if the scan is dropped mid-flight.
5. **Conversion**: wire content-files → `FileScanTask` via the public
   builders (no `DataFile` internals); the scan's own bound filter is used as
   the per-task row predicate, and pushed down as Iceberg expression JSON when
   losslessly encodable.
6. **Credential vending**: `ScanPlanner::plan_table_scan` returns
   `ServerScanPlan { tasks, file_io }`; the planner builds a plan-scoped
   `FileIO` from the `storage-credentials` the server returns, and `to_arrow`
   reads data files through it.
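The endpoint gating in step 2 can be illustrated with a small sketch. It assumes advertised endpoints are strings of the form `<METHOD> <path template>`; the `Endpoint`, `Endpoints`, and `CheckError` types and the path templates below are illustrative and do not mirror the PR's actual `Endpoint::check` signature.

```rust
use std::collections::HashSet;
use std::str::FromStr;

/// One advertised endpoint: an HTTP method plus a path template.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Endpoint {
    pub method: String,
    pub path: String,
}

impl FromStr for Endpoint {
    type Err = String;

    /// Parse "<METHOD> <path>"; anything other than exactly two tokens is rejected.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let mut parts = s.split_whitespace();
        match (parts.next(), parts.next(), parts.next()) {
            (Some(method), Some(path), None) => Ok(Endpoint {
                method: method.to_string(),
                path: path.to_string(),
            }),
            _ => Err(format!("malformed endpoint: {s:?}")),
        }
    }
}

/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq)]
pub enum CheckError {
    FeatureUnsupported(String),
}

/// The set of endpoints a catalog advertised in its config response.
pub struct Endpoints(HashSet<Endpoint>);

impl Endpoints {
    /// Parse every advertised string; fail on the first malformed entry.
    pub fn parse(advertised: &[&str]) -> Result<Self, String> {
        advertised
            .iter()
            .map(|s| s.parse::<Endpoint>())
            .collect::<Result<HashSet<Endpoint>, String>>()
            .map(Endpoints)
    }

    /// Gate a call: succeed only if the server advertised `required`.
    pub fn check(&self, required: &Endpoint) -> Result<(), CheckError> {
        if self.0.contains(required) {
            Ok(())
        } else {
            Err(CheckError::FeatureUnsupported(format!(
                "{} {}",
                required.method, required.path
            )))
        }
    }
}
```

Returning a `FeatureUnsupported`-style error from the check is what lets the caller fall back to native planning without issuing a request the server would reject.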
### Alternative injection design
For comparison, https://github.com/huan233usc/iceberg-rust/pull/2 implements
the same feature with the planning capability placed on the core `Catalog`
trait (`Catalog::plan_table_scan`, with `Table` holding an `Arc<dyn Catalog>`)
instead of on a narrow `ScanPlanner` trait. This PR uses the narrow-trait
design because it keeps the central `Catalog` trait minimal and avoids giving
every `Table` a back-reference to the full catalog.
## Are these changes tested?
Yes — unit tests for the wire DTOs, endpoint codec, and expression-JSON
serialization, conversion tests, and end-to-end `mockito` tests covering the
completed-inline, submitted-then-polled, and recursive plan-task fan-out
paths.