Revanth14 opened a new pull request, #1213:
URL: https://github.com/apache/iceberg-go/pull/1213

   # REST scan planning: proposed public API surface (#1178)
   
   > This design is under active development and will keep changing as it is discussed. Feedback on the open decisions is especially welcome. Thank you.
   
   This is a design proposal, not an implementation. Every function body is `panic("unimplemented")`, so there is no behavior change.
   
   Opening this as a "PR with interfaces" so the seam can be reviewed in Go 
before implementation. Part of #1178.
   
   ## Why
   
   Planning large-table scans entirely in a short-lived Go client (Lambda, 
Cloud Run, small k8s jobs) can be slow or infeasible. The REST spec's 
scan-planning endpoints let the catalog plan near its metadata and return only 
the tasks the client needs, with optional plan-scoped credentials and 
catalog-side governance. **Local planning stays the default; remote is opt-in.**
   
   ## Public API surface
   
   **`table` (the seam)**
   - `ScanPlanningMode` (`local` / `remote` / `auto`) + 
`WithScanPlanningMode(...) ScanOption`
   - `ScanPlanner`: `SupportsRemoteScanPlanning() bool`, `PlanFiles(ctx, 
ScanPlanningRequest) (ScanPlanningResult, error)`
   - `ScanPlanningRequest` / `ScanPlanningResult`
   - Proposed `FileScanTask.Residual iceberg.BooleanExpression`: not added here 
(would trip the `codec/file_scan_task.go` drift guard), lands with the 
scan-task decoder PR.
   
   **root `iceberg` (expression codec)**
   - `MarshalExpressionJSON` / `UnmarshalExpressionJSON`: Java 
`ExpressionParser`-compatible. Correctness-critical; every encoding is pinned 
to checked-in Java golden fixtures (e.g. `AlwaysTrue`/`False` serialize as the 
bare booleans `true`/`false`).
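
As an illustration of the bare-boolean rule, the following toy codec shows one way the constants could round-trip. `Expr`, `AlwaysTrue`, `AlwaysFalse`, and `Predicate` are hypothetical stand-ins for the real expression types, and the `type`/`term`/`value` field names for predicates are an assumption that the golden fixtures would have to confirm; only the `true`/`false` encoding is taken from the text above.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// Expr is a toy stand-in for iceberg.BooleanExpression (hypothetical).
type Expr interface{ isExpr() }

type AlwaysTrue struct{}
type AlwaysFalse struct{}

// Predicate is a hypothetical unbound comparison; its JSON field names are assumed.
type Predicate struct {
	Type  string `json:"type"`
	Term  string `json:"term"`
	Value any    `json:"value"`
}

func (AlwaysTrue) isExpr()  {}
func (AlwaysFalse) isExpr() {}
func (Predicate) isExpr()   {}

// MarshalExpressionJSON writes the constants as bare JSON booleans, not objects.
func MarshalExpressionJSON(e Expr) ([]byte, error) {
	switch v := e.(type) {
	case AlwaysTrue:
		return []byte("true"), nil
	case AlwaysFalse:
		return []byte("false"), nil
	case Predicate:
		return json.Marshal(v)
	default:
		return nil, fmt.Errorf("unsupported expression %T", e)
	}
}

// UnmarshalExpressionJSON accepts a bare boolean or a predicate object.
func UnmarshalExpressionJSON(data []byte) (Expr, error) {
	trimmed := bytes.TrimSpace(data)
	switch {
	case bytes.Equal(trimmed, []byte("true")):
		return AlwaysTrue{}, nil
	case bytes.Equal(trimmed, []byte("false")):
		return AlwaysFalse{}, nil
	}
	var p Predicate
	if err := json.Unmarshal(trimmed, &p); err != nil {
		return nil, err
	}
	if p.Type == "" {
		return nil, errors.New(`expression object is missing "type"`)
	}
	return p, nil
}

func main() {
	for _, e := range []Expr{AlwaysTrue{}, AlwaysFalse{}, Predicate{Type: "eq", Term: "id", Value: 5}} {
		out, err := MarshalExpressionJSON(e)
		fmt.Println(string(out), err)
	}
}
```

A real codec would compare its output byte for byte against the checked-in Java fixtures rather than against hand-written strings.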
   
   **`catalog/rest` (client)**
   - Capability discovery `Endpoint` + `SupportsEndpoint`: **Phase 0, already 
implemented in a separate PR** (held pending).
   - Split capability: `SupportsPlanTableScan` / 
`SupportsFullRemoteScanPlanning`
   - `PlanTableScan` / `FetchPlanningResult` / `CancelPlanning` / 
`FetchScanTasks` / `WaitForPlan`
   - `ErrPlanExpired` and wire-type envelopes
   - `var _ table.ScanPlanner = (*Catalog)(nil)`: compile-time proof the seam 
fits.
   
   `table` does not import `catalog/rest`; `StorageCredential` stays out of 
`table`.
   
   ## Decisions pending (before this lands)
   
   1. **Plan-scoped FileIO carrier (OQ1).** How does plan-scoped IO reach `ReadTasks` across the `PlanFiles` -> `ReadTasks` boundary? Options: a richer planned-result object, an internal plan context on `Scan`, or a serializable credential handle on `FileScanTask`. A *live* FileIO should not sit on `FileScanTask`, because `FileScanTask` has a transport codec and a live FileIO cannot travel through it. No firm recommendation yet.
   2. **Capability gating (OQ2).** A single gate is too coarse. Requiring all four endpoints forces a fallback to local planning on sync-only servers, while requiring only the plan endpoint can give a false positive when the server answers with a `submitted` status or `plan-tasks` and the follow-up endpoints are missing. Proposed split: `SupportsPlanTableScan` vs `SupportsFullRemoteScanPlanning`. I'll confirm what the Java `iceberg-rest-fixture` advertises.
   3. **Schema binding across the boundary (OQ4).** A returned residual must 
bind to the snapshot schema (via schema-id); the ordered-partition decode uses 
each file's partition spec-id, kept separate. `UseSnapshotSchema` needs a 
concrete contract.
   4. **`scan-planning-mode=server`.** Proposed fail-fast contract: config 
`server` requires remote planning; an explicit `ScanPlanningLocal` against a 
`server` table is an error, not a silent local plan.
   
   ## Phasing
   
   0. Capability discovery (`endpoints` decode + `SupportsEndpoint`) - separate 
PR, no behavior change.
   1. Expression JSON codec - highest risk; Java golden tests.
   2. REST content-file + scan-task JSON; add `FileScanTask.Residual`.
   3. Wire types + validation.
   4. REST client methods + poller/backoff + cancel-on-context.
   5. Table planner seam.
   6. Scanner delegation (`WithScanPlanningMode`, local/remote/auto).
   7. Fake server + parity tests; gated Java-fixture integration.
   8. Docs + hardening.
   
   ## Correctness anchor
   
   The acceptance bar is a local-vs-remote parity test. A fake server that itself plans locally proves *plumbing only*; the real oracle is golden request/response fixtures captured from the Java `iceberg-rest-fixture`.
   Expression-JSON mismatch and incorrect server residuals are the 
silent-corruption risks this design guards against.
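
The core comparison in a parity test could be an order-insensitive diff of the planned tasks. The sketch below assumes a simplified `plannedTask` with a file path and a residual rendered as a string; both the type and `diffPlans` are hypothetical, and a real test would compare richer task metadata.

```go
package main

import (
	"fmt"
	"sort"
)

// plannedTask is a simplified, hypothetical view of a scan task.
type plannedTask struct {
	Path     string
	Residual string // residual expression rendered as text
}

func (t plannedTask) key() string { return t.Path + " | " + t.Residual }

// diffPlans returns tasks the remote plan is missing and tasks it has in
// excess, treating both plans as multisets so ordering does not matter.
func diffPlans(local, remote []plannedTask) (missing, extra []string) {
	want := map[string]int{}
	for _, t := range local {
		want[t.key()]++
	}
	for _, t := range remote {
		k := t.key()
		if want[k] > 0 {
			want[k]--
		} else {
			extra = append(extra, k)
		}
	}
	for k, n := range want {
		for ; n > 0; n-- {
			missing = append(missing, k)
		}
	}
	sort.Strings(missing)
	sort.Strings(extra)
	return missing, extra
}

func main() {
	local := []plannedTask{{"a.parquet", "true"}, {"b.parquet", "id > 5"}}
	remote := []plannedTask{{"b.parquet", "id > 5"}, {"a.parquet", "id > 5"}}
	missing, extra := diffPlans(local, remote)
	fmt.Println(missing, extra)
}
```

Including the residual in the comparison key matters: a remote plan that returns the right files with the wrong residual would otherwise pass unnoticed.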
   
   ## Not in this PR
   
   Implementation (bodies panic), a REST server, a distributed worker runtime, 
incremental/CDC scans (deferred; point-in-time first), and any change to the 
local planning path.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

