laskoviymishka commented on code in PR #1213: URL: https://github.com/apache/iceberg-go/pull/1213#discussion_r3442535806
########## expression_json.go: ########## @@ -0,0 +1,48 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +// This file is a PROPOSED public API surface for REST scan-planning +// expression JSON (apache/iceberg-go#1178). The bodies are intentionally +// unimplemented; the file exists so the API shape can be reviewed. +// +// The codec lives in the root iceberg package because expression internals +// are defined here and are not exported. Compatibility with Java's +// ExpressionParser is correctness-critical and every encoding must be +// confirmed against checked-in Java golden fixtures. +// +// Design decision: MarshalExpressionJSON emits Java ExpressionParser wire +// format, including bare JSON booleans for AlwaysTrue/AlwaysFalse (`true` and +// `false`). That intentionally differs from the REST OpenAPI Expression schema, +// where true/false are represented as objects (`{"type":"true"}` and +// `{"type":"false"}`). UnmarshalExpressionJSON must accept both forms so +// clients can read Java-compatible responses and strictly spec-shaped payloads. + +package iceberg + +// MarshalExpressionJSON serializes a boolean expression to the JSON format +// produced by Java's ExpressionParser, for use as a REST scan-planning filter. +func MarshalExpressionJSON(expr BooleanExpression) ([]byte, error) { Review Comment: This is the one I'd most want to nail down before we commit the signature. The doc has `Marshal` emit Java ExpressionParser format — bare `true`/`false` — but the REST OpenAPI `Expression` schema models `TrueExpression`/`FalseExpression` as objects (`{"type":"true"}`), and a server that validates the `Expression` oneOf could reject a bare boolean in the request `filter`. The decode side is fine — accepting both forms is clearly right. My question is purely outbound: does the Java REST reference actually accept a bare boolean on the request `filter`, or only the object form? If you've confirmed against the Java fixture that bare booleans round-trip through `planTableScan`, a one-line note saying so settles it. If not, I'd lean toward `Marshal` emitting object form for the REST direction even though ExpressionParser (the metadata-file codec) produces bare booleans. wdyt? ########## catalog/rest/scan_planning.go: ########## @@ -0,0 +1,211 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +// This file is a PROPOSED public API surface for REST server-side scan +// planning (apache/iceberg-go#1178). The bodies are intentionally +// unimplemented; the file exists so the REST surface can be reviewed as Go. +// Endpoint capability discovery (Endpoint, SupportsEndpoint) lands separately +// in the Phase 0 PR and is intentionally not redeclared here. + +package rest + +import ( + "context" + "encoding/json" + "fmt" + "time" + + "github.com/apache/iceberg-go/table" +) + +// Compile-time proof that the REST catalog satisfies the table planner seam. +var _ table.ScanPlanner = (*Catalog)(nil) Review Comment: This proves `Catalog` satisfies `ScanPlanner`, but not how a `Table` actually gets one — the `Catalog -> Table -> Scan -> planner` path is only sketched as prose at the bottom of `table/scan_planning.go`, and it's the one piece everything downstream needs. Could we settle one concrete wiring option in this PR, even just as a committed comment? The two I can see are a `planner` field on `Table` set in `table.New`, or extending the catalog/IO seam with a `ScanPlanner()` accessor. Either is fine, but since the delegation PR builds directly on whichever we pick, I'd rather choose now than discover the seam doesn't fit later. ########## catalog/rest/scan_planning.go: ########## @@ -0,0 +1,211 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +// This file is a PROPOSED public API surface for REST server-side scan +// planning (apache/iceberg-go#1178). The bodies are intentionally +// unimplemented; the file exists so the REST surface can be reviewed as Go. +// Endpoint capability discovery (Endpoint, SupportsEndpoint) lands separately +// in the Phase 0 PR and is intentionally not redeclared here. + +package rest + +import ( + "context" + "encoding/json" + "fmt" + "time" + + "github.com/apache/iceberg-go/table" +) + +// Compile-time proof that the REST catalog satisfies the table planner seam. +var _ table.ScanPlanner = (*Catalog)(nil) + +// ErrPlanExpired is returned when polling a plan-id the server no longer knows +// about (HTTP 404 while polling), distinct from a table-not-found 404. +var ErrPlanExpired = fmt.Errorf("%w: scan plan expired", ErrRESTError) + +// --- Capability gating (Open Question 2) ------------------------------------ +// +// A single capability check is too coarse: requiring all four endpoints falls +// back to local against sync-only servers, while requiring only the plan +// endpoint false-positives, because planTableScan can return `submitted` or +// `plan-tasks` that need the poll/fetch endpoints. The split below lets `auto` +// use a sync-only server while reserving the async/fanout path for servers +// that advertise everything. + +// SupportsPlanTableScan reports whether the server advertised the synchronous +// plan endpoint. +func (r *Catalog) SupportsPlanTableScan() bool { + panic("unimplemented: proposed API for #1178") +} + +// SupportsFullRemoteScanPlanning reports whether the server advertised all four +// scan-planning endpoints (plan, fetch-result, cancel, fetch-tasks). +func (r *Catalog) SupportsFullRemoteScanPlanning() bool { + panic("unimplemented: proposed API for #1178") +} + +// --- table.ScanPlanner implementation --------------------------------------- + +// SupportsRemoteScanPlanning reports whether this catalog can complete a remote +// plan end-to-end; backed by the split capability checks above. +func (r *Catalog) SupportsRemoteScanPlanning() bool { + panic("unimplemented: proposed API for #1178") +} + +// PlanFiles plans a scan server-side and returns tasks (and, optionally, a +// plan-scoped FileIO) for the table to read. +func (r *Catalog) PlanFiles(ctx context.Context, req table.ScanPlanningRequest) (table.ScanPlanningResult, error) { + panic("unimplemented: proposed API for #1178") +} + +// --- Low-level client methods ----------------------------------------------- + +// PlanTableScan submits a scan plan. The result is either completed inline, +// submitted (returns a plan-id to poll), or failed. +func (r *Catalog) PlanTableScan(ctx context.Context, ident table.Identifier, req PlanTableScanRequest) (PlanTableScanResponse, error) { Review Comment: The bodies panic, which is the premise of the PR — but these are exported symbols with no `//go:build` tag, no `internal/` placement, and no `_` prefix, so if `main` cuts a release before implementation lands, they ship as crash-on-call public API. I'd have the bodies return `fmt.Errorf("not yet implemented: %w", ErrNotImplemented)` instead of `panic`, which keeps the surface reviewable without the release hazard. Putting the files under `internal/` until impl lands or gating them behind a build constraint also works. Do you have a release-timing guarantee in mind, or should we pick one of these? Same applies to `MarshalExpressionJSON`/`UnmarshalExpressionJSON` and `WithScanPlanningMode`. ########## catalog/rest/scan_planning.go: ########## @@ -0,0 +1,211 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +// This file is a PROPOSED public API surface for REST server-side scan +// planning (apache/iceberg-go#1178). The bodies are intentionally +// unimplemented; the file exists so the REST surface can be reviewed as Go. +// Endpoint capability discovery (Endpoint, SupportsEndpoint) lands separately +// in the Phase 0 PR and is intentionally not redeclared here. + +package rest + +import ( + "context" + "encoding/json" + "fmt" + "time" + + "github.com/apache/iceberg-go/table" +) + +// Compile-time proof that the REST catalog satisfies the table planner seam. +var _ table.ScanPlanner = (*Catalog)(nil) + +// ErrPlanExpired is returned when polling a plan-id the server no longer knows +// about (HTTP 404 while polling), distinct from a table-not-found 404. +var ErrPlanExpired = fmt.Errorf("%w: scan plan expired", ErrRESTError) + +// --- Capability gating (Open Question 2) ------------------------------------ +// +// A single capability check is too coarse: requiring all four endpoints falls +// back to local against sync-only servers, while requiring only the plan +// endpoint false-positives, because planTableScan can return `submitted` or +// `plan-tasks` that need the poll/fetch endpoints. The split below lets `auto` +// use a sync-only server while reserving the async/fanout path for servers +// that advertise everything. + +// SupportsPlanTableScan reports whether the server advertised the synchronous +// plan endpoint. +func (r *Catalog) SupportsPlanTableScan() bool { + panic("unimplemented: proposed API for #1178") +} + +// SupportsFullRemoteScanPlanning reports whether the server advertised all four +// scan-planning endpoints (plan, fetch-result, cancel, fetch-tasks). +func (r *Catalog) SupportsFullRemoteScanPlanning() bool { + panic("unimplemented: proposed API for #1178") +} + +// --- table.ScanPlanner implementation --------------------------------------- + +// SupportsRemoteScanPlanning reports whether this catalog can complete a remote +// plan end-to-end; backed by the split capability checks above. +func (r *Catalog) SupportsRemoteScanPlanning() bool { + panic("unimplemented: proposed API for #1178") +} + +// PlanFiles plans a scan server-side and returns tasks (and, optionally, a +// plan-scoped FileIO) for the table to read. +func (r *Catalog) PlanFiles(ctx context.Context, req table.ScanPlanningRequest) (table.ScanPlanningResult, error) { + panic("unimplemented: proposed API for #1178") +} + +// --- Low-level client methods ----------------------------------------------- + +// PlanTableScan submits a scan plan. The result is either completed inline, +// submitted (returns a plan-id to poll), or failed. +func (r *Catalog) PlanTableScan(ctx context.Context, ident table.Identifier, req PlanTableScanRequest) (PlanTableScanResponse, error) { + panic("unimplemented: proposed API for #1178") +} + +// FetchPlanningResult polls a previously submitted plan. +func (r *Catalog) FetchPlanningResult(ctx context.Context, ident table.Identifier, planID string) (FetchPlanningResultResponse, error) { + panic("unimplemented: proposed API for #1178") +} + +// CancelPlanning cancels a server-side plan. Callers should cancel on context +// cancellation using a detached context with a short timeout. +func (r *Catalog) CancelPlanning(ctx context.Context, ident table.Identifier, planID string) error { + panic("unimplemented: proposed API for #1178") +} + +// FetchScanTasks fetches the scan tasks for a plan-task handle returned by a +// completed plan. +func (r *Catalog) FetchScanTasks(ctx context.Context, ident table.Identifier, req FetchScanTasksRequest) (FetchScanTasksResponse, error) { + panic("unimplemented: proposed API for #1178") +} + +// WaitForPlan polls a submitted plan to completion using jittered backoff, +// cancelling the server-side plan if the context is cancelled. It returns an +// error if the plan is still submitted after the wait, cancelled, failed, or +// expired. +func (r *Catalog) WaitForPlan(ctx context.Context, ident table.Identifier, planID string, opts WaitForPlanOptions) (CompletedPlanningResult, error) { + panic("unimplemented: proposed API for #1178") +} + +// --- Wire types (sketch) ---------------------------------------------------- +// +// Content-file, delete-file, and residual decoding lands with the scan-task +// decoder PR; these sketch the request/response envelopes so the client +// surface compiles and reads. + +// PlanStatus is the status of a server-side plan. +type PlanStatus string + +const ( + PlanStatusCompleted PlanStatus = "completed" + PlanStatusSubmitted PlanStatus = "submitted" + // PlanStatusCancelled is valid when polling a submitted plan, but invalid + // as a planTableScan response. PlanTableScan and WaitForPlan should treat a + // cancelled initial planning response as an error. + PlanStatusCancelled PlanStatus = "cancelled" + PlanStatusFailed PlanStatus = "failed" +) + +// PlanningError is the ErrorModel payload carried by a failed planning result. +type PlanningError struct { + Message string `json:"message"` + Type string `json:"type"` + Code int `json:"code"` + Stack []string `json:"stack,omitempty"` +} + +// ScanTasks carries the task payload shared by completed planning responses and +// fetchScanTasks responses. Task/delete payload decoding lands with the +// scan-task decoder PR. +type ScanTasks struct { + PlanTasks []string `json:"plan-tasks,omitempty"` + FileScanTasks json.RawMessage `json:"file-scan-tasks,omitempty"` + DeleteFiles json.RawMessage `json:"delete-files,omitempty"` +} + +// CompletedPlanningResult is the completed arm of the planning-result union. +// PlanID is required by the initial planTableScan response's +// CompletedPlanningWithIDResult arm and omitted by fetchPlanningResult. +type CompletedPlanningResult struct { + Status PlanStatus `json:"status"` + PlanID *string `json:"plan-id,omitempty"` + ScanTasks + StorageCredentials []StorageCredential `json:"storage-credentials,omitempty"` +} + +// PlanTableScanRequest is the POST .../plan request body. Filter is the +// ExpressionParser-format JSON produced by iceberg.MarshalExpressionJSON. +type PlanTableScanRequest struct { + SnapshotID *int64 `json:"snapshot-id,omitempty"` + StartSnapshotID *int64 `json:"start-snapshot-id,omitempty"` + EndSnapshotID *int64 `json:"end-snapshot-id,omitempty"` + Select []string `json:"select,omitempty"` + Filter json.RawMessage `json:"filter,omitempty"` + MinRowsRequested *int64 `json:"min-rows-requested,omitempty"` + CaseSensitive *bool `json:"case-sensitive,omitempty"` + UseSnapshotSchema *bool `json:"use-snapshot-schema,omitempty"` + StatsFields []string `json:"stats-fields,omitempty"` +} + +// PlanTableScanResponse is the POST .../plan response. The spec models this as +// a `status`-discriminated union; the flat struct carries every arm's fields +// with omitempty so none are discarded. Task/delete RawMessage payloads are +// decoded by the scan-task decoder PR. PlanID is required for submitted and +// completed responses from planTableScan; implementations must reject a missing +// PlanID for either status. A cancelled status is invalid here and must be +// treated as an error. +type PlanTableScanResponse struct { + Status PlanStatus `json:"status"` + PlanID *string `json:"plan-id,omitempty"` + Error *PlanningError `json:"error,omitempty"` + ScanTasks + StorageCredentials []StorageCredential `json:"storage-credentials,omitempty"` +} + +// FetchPlanningResultResponse is the GET .../plan/{plan-id} poll response. Same +// `status`-discriminated union (completed / submitted / cancelled / failed). +type FetchPlanningResultResponse struct { + Status PlanStatus `json:"status"` + Error *PlanningError `json:"error,omitempty"` + ScanTasks + StorageCredentials []StorageCredential `json:"storage-credentials,omitempty"` +} + +// FetchScanTasksRequest is the POST .../tasks request body. +type FetchScanTasksRequest struct { + PlanTask string `json:"plan-task"` +} + +// FetchScanTasksResponse is the POST .../tasks response. May itself return more +// plan-tasks for further fanout. Task/delete payloads decoded by the +// scan-task decoder PR. +type FetchScanTasksResponse struct { + ScanTasks +} + +// WaitForPlanOptions tunes the polling loop. Defaults should be conservative. +type WaitForPlanOptions struct { + MinDelay time.Duration + MaxDelay time.Duration + Timeout time.Duration Review Comment: `Timeout` here duplicates what the `ctx` already carries — `WaitForPlan` takes a `context.Context`, so the deadline is the idiomatic way to bound the wait, and a second `Timeout` callers set separately is a foot-gun on the zero value. I'd drop `Timeout` and keep just `MinDelay`/`MaxDelay` for the jitter, letting callers pass `context.WithTimeout`. wdyt? ########## table/scan_planning.go: ########## @@ -0,0 +1,133 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +// This file is a PROPOSED public API surface for REST server-side scan +// planning (apache/iceberg-go#1178). The bodies are intentionally +// unimplemented; the file exists so the seam can be reviewed as Go rather +// than prose. Nothing here changes existing behavior. + +package table + +import ( + "context" + + "github.com/apache/iceberg-go" + icebergio "github.com/apache/iceberg-go/io" +) + +// ScanPlanningMode is the user-facing scan option: three values +// (local/remote/auto) selecting how (*Scan).PlanFiles plans a scan. Local +// planning remains the default; remote is opt-in via WithScanPlanningMode. +// +// This is deliberately distinct from the REST table-config key +// `scan-planning-mode` (values `client`/`server`), which is a server directive +// resolved separately (OQ4): a `client` table forces local planning, a `server` +// table forces remote planning, and explicit conflicting scan options fail +// fast. There is intentionally no fourth `server` value here; the directive +// lives in the table config, not the user option. +type ScanPlanningMode string + +const ( + // ScanPlanningLocal always plans locally by reading manifests through the + // table's FileIO. This is the default and current behavior. + ScanPlanningLocal ScanPlanningMode = "local" + // ScanPlanningRemote requires a planner that advertises remote capability + // and fails loudly if remote planning is unavailable. + ScanPlanningRemote ScanPlanningMode = "remote" + // ScanPlanningAuto uses remote planning when available and allowed by the + // table config, otherwise falls back to local. + ScanPlanningAuto ScanPlanningMode = "auto" +) + +// WithScanPlanningMode sets the scan-planning mode for a scan. The default is +// ScanPlanningLocal unless the REST table config requires server planning. +func WithScanPlanningMode(mode ScanPlanningMode) ScanOption { + // The panic is deferred to option application, not construction: an + // unimplemented option must not blow up when an options slice is built, + // only if it is actually applied to a Scan. + return func(*Scan) { panic("unimplemented: proposed API for #1178") } +} + +// ScanPlanningRequest is the input a Scan hands to a ScanPlanner. It carries +// the resolved scan state a planner needs without depending on catalog/rest. +// +// Open question (epic OQ4): when the table has evolved, UseSnapshotSchema must +// pin which schema binds a returned residual and the partition decode: the +// snapshot's schema (via schema-id), kept separate from each file's partition +// spec-id. Incremental scans (start/end snapshot) are deferred to a later +// phase; point-in-time SnapshotID lands first. +type ScanPlanningRequest struct { + Identifier Identifier + // Metadata is the full table metadata. This likely over-specifies the + // contract: a planner needs only schema(s), partition specs, and snapshot + // resolution; narrowing to a smaller interface is an open refinement. + Metadata Metadata + MetadataLocation string + SnapshotID *int64 + SelectedFields []string + RowFilter iceberg.BooleanExpression + MinRowsRequested *int64 + StatsFields []string + // CaseSensitive must carry the Scan's value (which defaults to true), not + // Go's false zero value, or the wire request would flip the spec default. + CaseSensitive bool Review Comment: The comment names the exact risk, but `bool` doesn't structurally prevent it — a partially-initialized `ScanPlanningRequest` sends `case-sensitive: false` and silently runs a case-insensitive scan, which is the spec's non-default. The wire `PlanTableScanRequest.CaseSensitive` already uses `*bool` for this reason, so the two are inconsistent too. I'd either make this `*bool` (nil = use the scan default), or add a `NewScanPlanningRequest` constructor that seeds `CaseSensitive: true`. Same situation with `UseSnapshotSchema` if its default isn't false. ########## table/scan_planning.go: ########## @@ -0,0 +1,133 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +// This file is a PROPOSED public API surface for REST server-side scan +// planning (apache/iceberg-go#1178). The bodies are intentionally +// unimplemented; the file exists so the seam can be reviewed as Go rather +// than prose. Nothing here changes existing behavior. + +package table + +import ( + "context" + + "github.com/apache/iceberg-go" + icebergio "github.com/apache/iceberg-go/io" +) + +// ScanPlanningMode is the user-facing scan option: three values +// (local/remote/auto) selecting how (*Scan).PlanFiles plans a scan. Local +// planning remains the default; remote is opt-in via WithScanPlanningMode. +// +// This is deliberately distinct from the REST table-config key +// `scan-planning-mode` (values `client`/`server`), which is a server directive +// resolved separately (OQ4): a `client` table forces local planning, a `server` +// table forces remote planning, and explicit conflicting scan options fail +// fast. There is intentionally no fourth `server` value here; the directive +// lives in the table config, not the user option. +type ScanPlanningMode string + +const ( + // ScanPlanningLocal always plans locally by reading manifests through the + // table's FileIO. This is the default and current behavior. + ScanPlanningLocal ScanPlanningMode = "local" + // ScanPlanningRemote requires a planner that advertises remote capability + // and fails loudly if remote planning is unavailable. + ScanPlanningRemote ScanPlanningMode = "remote" + // ScanPlanningAuto uses remote planning when available and allowed by the + // table config, otherwise falls back to local. + ScanPlanningAuto ScanPlanningMode = "auto" +) + +// WithScanPlanningMode sets the scan-planning mode for a scan. The default is +// ScanPlanningLocal unless the REST table config requires server planning. +func WithScanPlanningMode(mode ScanPlanningMode) ScanOption { + // The panic is deferred to option application, not construction: an + // unimplemented option must not blow up when an options slice is built, + // only if it is actually applied to a Scan. + return func(*Scan) { panic("unimplemented: proposed API for #1178") } Review Comment: The "deferred panic" comment is a little misleading — `Table.Scan` applies options eagerly, so `Scan(WithScanPlanningMode(...))` panics on the first call, not just on some later apply. I'd either reword the comment to "panics when passed to `Table.Scan`", or have it return the existing no-op option (storing the mode) for the proposal phase so it's harmless until delegation lands. Minor, but the current wording suggests a safety that isn't there. ########## catalog/rest/scan_planning.go: ########## @@ -0,0 +1,211 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +// This file is a PROPOSED public API surface for REST server-side scan +// planning (apache/iceberg-go#1178). The bodies are intentionally +// unimplemented; the file exists so the REST surface can be reviewed as Go. +// Endpoint capability discovery (Endpoint, SupportsEndpoint) lands separately +// in the Phase 0 PR and is intentionally not redeclared here. + +package rest + +import ( + "context" + "encoding/json" + "fmt" + "time" + + "github.com/apache/iceberg-go/table" +) + +// Compile-time proof that the REST catalog satisfies the table planner seam. +var _ table.ScanPlanner = (*Catalog)(nil) + +// ErrPlanExpired is returned when polling a plan-id the server no longer knows +// about (HTTP 404 while polling), distinct from a table-not-found 404. +var ErrPlanExpired = fmt.Errorf("%w: scan plan expired", ErrRESTError) + +// --- Capability gating (Open Question 2) ------------------------------------ +// +// A single capability check is too coarse: requiring all four endpoints falls +// back to local against sync-only servers, while requiring only the plan +// endpoint false-positives, because planTableScan can return `submitted` or +// `plan-tasks` that need the poll/fetch endpoints. The split below lets `auto` +// use a sync-only server while reserving the async/fanout path for servers +// that advertise everything. + +// SupportsPlanTableScan reports whether the server advertised the synchronous +// plan endpoint. +func (r *Catalog) SupportsPlanTableScan() bool { + panic("unimplemented: proposed API for #1178") +} + +// SupportsFullRemoteScanPlanning reports whether the server advertised all four +// scan-planning endpoints (plan, fetch-result, cancel, fetch-tasks). +func (r *Catalog) SupportsFullRemoteScanPlanning() bool { + panic("unimplemented: proposed API for #1178") +} + +// --- table.ScanPlanner implementation --------------------------------------- + +// SupportsRemoteScanPlanning reports whether this catalog can complete a remote +// plan end-to-end; backed by the split capability checks above. +func (r *Catalog) SupportsRemoteScanPlanning() bool { + panic("unimplemented: proposed API for #1178") +} + +// PlanFiles plans a scan server-side and returns tasks (and, optionally, a +// plan-scoped FileIO) for the table to read. +func (r *Catalog) PlanFiles(ctx context.Context, req table.ScanPlanningRequest) (table.ScanPlanningResult, error) { + panic("unimplemented: proposed API for #1178") +} + +// --- Low-level client methods ----------------------------------------------- + +// PlanTableScan submits a scan plan. The result is either completed inline, +// submitted (returns a plan-id to poll), or failed. +func (r *Catalog) PlanTableScan(ctx context.Context, ident table.Identifier, req PlanTableScanRequest) (PlanTableScanResponse, error) { + panic("unimplemented: proposed API for #1178") +} + +// FetchPlanningResult polls a previously submitted plan. +func (r *Catalog) FetchPlanningResult(ctx context.Context, ident table.Identifier, planID string) (FetchPlanningResultResponse, error) { + panic("unimplemented: proposed API for #1178") +} + +// CancelPlanning cancels a server-side plan. Callers should cancel on context +// cancellation using a detached context with a short timeout. +func (r *Catalog) CancelPlanning(ctx context.Context, ident table.Identifier, planID string) error { + panic("unimplemented: proposed API for #1178") +} + +// FetchScanTasks fetches the scan tasks for a plan-task handle returned by a +// completed plan. +func (r *Catalog) FetchScanTasks(ctx context.Context, ident table.Identifier, req FetchScanTasksRequest) (FetchScanTasksResponse, error) { + panic("unimplemented: proposed API for #1178") +} + +// WaitForPlan polls a submitted plan to completion using jittered backoff, +// cancelling the server-side plan if the context is cancelled. It returns an +// error if the plan is still submitted after the wait, cancelled, failed, or +// expired. +func (r *Catalog) WaitForPlan(ctx context.Context, ident table.Identifier, planID string, opts WaitForPlanOptions) (CompletedPlanningResult, error) { + panic("unimplemented: proposed API for #1178") +} + +// --- Wire types (sketch) ---------------------------------------------------- +// +// Content-file, delete-file, and residual decoding lands with the scan-task +// decoder PR; these sketch the request/response envelopes so the client +// surface compiles and reads. + +// PlanStatus is the status of a server-side plan. +type PlanStatus string + +const ( + PlanStatusCompleted PlanStatus = "completed" + PlanStatusSubmitted PlanStatus = "submitted" + // PlanStatusCancelled is valid when polling a submitted plan, but invalid + // as a planTableScan response. PlanTableScan and WaitForPlan should treat a + // cancelled initial planning response as an error. + PlanStatusCancelled PlanStatus = "cancelled" + PlanStatusFailed PlanStatus = "failed" +) + +// PlanningError is the ErrorModel payload carried by a failed planning result. +type PlanningError struct { + Message string `json:"message"` + Type string `json:"type"` + Code int `json:"code"` + Stack []string `json:"stack,omitempty"` +} + +// ScanTasks carries the task payload shared by completed planning responses and +// fetchScanTasks responses. Task/delete payload decoding lands with the +// scan-task decoder PR. +type ScanTasks struct { + PlanTasks []string `json:"plan-tasks,omitempty"` + FileScanTasks json.RawMessage `json:"file-scan-tasks,omitempty"` Review Comment: `PlanTasks` is typed as `[]string` which is great, but `FileScanTasks`/`DeleteFiles` as `json.RawMessage` on an exported type means moving to `[]FileScanTask`/`[]DeleteFile` later is a breaking change — and the CDC layer in transferia/iceberg could take a dependency on the `RawMessage` shape in the meantime. I'd define skeleton `FileScanTask`/`DeleteFile` structs now (even mostly-empty, marked incomplete) so the slice element types are committed up front and the decoder PR just fills them in. That keeps the breaking surface inside this proposal rather than across the follow-ups. ########## catalog/rest/scan_planning.go: ########## @@ -0,0 +1,211 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +// This file is a PROPOSED public API surface for REST server-side scan +// planning (apache/iceberg-go#1178). The bodies are intentionally +// unimplemented; the file exists so the REST surface can be reviewed as Go. +// Endpoint capability discovery (Endpoint, SupportsEndpoint) lands separately +// in the Phase 0 PR and is intentionally not redeclared here. + +package rest + +import ( + "context" + "encoding/json" + "fmt" + "time" + + "github.com/apache/iceberg-go/table" +) + +// Compile-time proof that the REST catalog satisfies the table planner seam. +var _ table.ScanPlanner = (*Catalog)(nil) + +// ErrPlanExpired is returned when polling a plan-id the server no longer knows +// about (HTTP 404 while polling), distinct from a table-not-found 404. +var ErrPlanExpired = fmt.Errorf("%w: scan plan expired", ErrRESTError) + +// --- Capability gating (Open Question 2) ------------------------------------ +// +// A single capability check is too coarse: requiring all four endpoints falls +// back to local against sync-only servers, while requiring only the plan +// endpoint false-positives, because planTableScan can return `submitted` or +// `plan-tasks` that need the poll/fetch endpoints. The split below lets `auto` +// use a sync-only server while reserving the async/fanout path for servers +// that advertise everything. + +// SupportsPlanTableScan reports whether the server advertised the synchronous +// plan endpoint. +func (r *Catalog) SupportsPlanTableScan() bool { + panic("unimplemented: proposed API for #1178") +} + +// SupportsFullRemoteScanPlanning reports whether the server advertised all four +// scan-planning endpoints (plan, fetch-result, cancel, fetch-tasks). +func (r *Catalog) SupportsFullRemoteScanPlanning() bool { + panic("unimplemented: proposed API for #1178") +} + +// --- table.ScanPlanner implementation --------------------------------------- + +// SupportsRemoteScanPlanning reports whether this catalog can complete a remote +// plan end-to-end; backed by the split capability checks above. +func (r *Catalog) SupportsRemoteScanPlanning() bool { + panic("unimplemented: proposed API for #1178") +} + +// PlanFiles plans a scan server-side and returns tasks (and, optionally, a +// plan-scoped FileIO) for the table to read. +func (r *Catalog) PlanFiles(ctx context.Context, req table.ScanPlanningRequest) (table.ScanPlanningResult, error) { + panic("unimplemented: proposed API for #1178") +} + +// --- Low-level client methods ----------------------------------------------- + +// PlanTableScan submits a scan plan. The result is either completed inline, +// submitted (returns a plan-id to poll), or failed. +func (r *Catalog) PlanTableScan(ctx context.Context, ident table.Identifier, req PlanTableScanRequest) (PlanTableScanResponse, error) { + panic("unimplemented: proposed API for #1178") +} + +// FetchPlanningResult polls a previously submitted plan. +func (r *Catalog) FetchPlanningResult(ctx context.Context, ident table.Identifier, planID string) (FetchPlanningResultResponse, error) { + panic("unimplemented: proposed API for #1178") +} + +// CancelPlanning cancels a server-side plan. Callers should cancel on context +// cancellation using a detached context with a short timeout. +func (r *Catalog) CancelPlanning(ctx context.Context, ident table.Identifier, planID string) error { + panic("unimplemented: proposed API for #1178") +} + +// FetchScanTasks fetches the scan tasks for a plan-task handle returned by a +// completed plan. +func (r *Catalog) FetchScanTasks(ctx context.Context, ident table.Identifier, req FetchScanTasksRequest) (FetchScanTasksResponse, error) { + panic("unimplemented: proposed API for #1178") +} + +// WaitForPlan polls a submitted plan to completion using jittered backoff, +// cancelling the server-side plan if the context is cancelled. It returns an +// error if the plan is still submitted after the wait, cancelled, failed, or +// expired. +func (r *Catalog) WaitForPlan(ctx context.Context, ident table.Identifier, planID string, opts WaitForPlanOptions) (CompletedPlanningResult, error) { + panic("unimplemented: proposed API for #1178") +} + +// --- Wire types (sketch) ---------------------------------------------------- +// +// Content-file, delete-file, and residual decoding lands with the scan-task +// decoder PR; these sketch the request/response envelopes so the client +// surface compiles and reads. + +// PlanStatus is the status of a server-side plan. +type PlanStatus string + +const ( + PlanStatusCompleted PlanStatus = "completed" + PlanStatusSubmitted PlanStatus = "submitted" + // PlanStatusCancelled is valid when polling a submitted plan, but invalid + // as a planTableScan response. PlanTableScan and WaitForPlan should treat a + // cancelled initial planning response as an error. + PlanStatusCancelled PlanStatus = "cancelled" + PlanStatusFailed PlanStatus = "failed" +) + +// PlanningError is the ErrorModel payload carried by a failed planning result. +type PlanningError struct { + Message string `json:"message"` + Type string `json:"type"` + Code int `json:"code"` + Stack []string `json:"stack,omitempty"` +} + +// ScanTasks carries the task payload shared by completed planning responses and +// fetchScanTasks responses. Task/delete payload decoding lands with the +// scan-task decoder PR. +type ScanTasks struct { + PlanTasks []string `json:"plan-tasks,omitempty"` + FileScanTasks json.RawMessage `json:"file-scan-tasks,omitempty"` + DeleteFiles json.RawMessage `json:"delete-files,omitempty"` +} + +// CompletedPlanningResult is the completed arm of the planning-result union. +// PlanID is required by the initial planTableScan response's +// CompletedPlanningWithIDResult arm and omitted by fetchPlanningResult. +type CompletedPlanningResult struct { + Status PlanStatus `json:"status"` + PlanID *string `json:"plan-id,omitempty"` + ScanTasks + StorageCredentials []StorageCredential `json:"storage-credentials,omitempty"` +} + +// PlanTableScanRequest is the POST .../plan request body. Filter is the +// ExpressionParser-format JSON produced by iceberg.MarshalExpressionJSON. +type PlanTableScanRequest struct { + SnapshotID *int64 `json:"snapshot-id,omitempty"` + StartSnapshotID *int64 `json:"start-snapshot-id,omitempty"` + EndSnapshotID *int64 `json:"end-snapshot-id,omitempty"` + Select []string `json:"select,omitempty"` + Filter json.RawMessage `json:"filter,omitempty"` + MinRowsRequested *int64 `json:"min-rows-requested,omitempty"` + CaseSensitive *bool `json:"case-sensitive,omitempty"` + UseSnapshotSchema *bool `json:"use-snapshot-schema,omitempty"` + StatsFields []string `json:"stats-fields,omitempty"` +} + +// PlanTableScanResponse is the POST .../plan response. The spec models this as +// a `status`-discriminated union; the flat struct carries every arm's fields +// with omitempty so none are discarded. Task/delete RawMessage payloads are +// decoded by the scan-task decoder PR. PlanID is required for submitted and +// completed responses from planTableScan; implementations must reject a missing +// PlanID for either status. A cancelled status is invalid here and must be +// treated as an error. +type PlanTableScanResponse struct { + Status PlanStatus `json:"status"` + PlanID *string `json:"plan-id,omitempty"` Review Comment: For the `completed` arm of `planTableScan`, the spec's `CompletedPlanningWithIDResult` requires `plan-id` — but modeling it as `*string` omitempty means a malformed server response silently yields `nil`, and the caller needs that id to `CancelPlanning` and release server resources. Since it's the flat-union approach, I'd validate `PlanID != nil` during unmarshal when `status == completed` rather than relying on the pointer. Worth a doc line too that the same response admits `cancelled`, which the spec calls invalid for this endpoint — callers have no way to tell an invalid status from a real cancellation otherwise. ########## catalog/rest/scan_planning.go: ########## @@ -0,0 +1,211 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +// This file is a PROPOSED public API surface for REST server-side scan +// planning (apache/iceberg-go#1178). The bodies are intentionally +// unimplemented; the file exists so the REST surface can be reviewed as Go. +// Endpoint capability discovery (Endpoint, SupportsEndpoint) lands separately +// in the Phase 0 PR and is intentionally not redeclared here. + +package rest + +import ( + "context" + "encoding/json" + "fmt" + "time" + + "github.com/apache/iceberg-go/table" +) + +// Compile-time proof that the REST catalog satisfies the table planner seam. +var _ table.ScanPlanner = (*Catalog)(nil) + +// ErrPlanExpired is returned when polling a plan-id the server no longer knows +// about (HTTP 404 while polling), distinct from a table-not-found 404. +var ErrPlanExpired = fmt.Errorf("%w: scan plan expired", ErrRESTError) + +// --- Capability gating (Open Question 2) ------------------------------------ +// +// A single capability check is too coarse: requiring all four endpoints falls +// back to local against sync-only servers, while requiring only the plan +// endpoint false-positives, because planTableScan can return `submitted` or +// `plan-tasks` that need the poll/fetch endpoints. The split below lets `auto` +// use a sync-only server while reserving the async/fanout path for servers +// that advertise everything. + +// SupportsPlanTableScan reports whether the server advertised the synchronous +// plan endpoint. +func (r *Catalog) SupportsPlanTableScan() bool { + panic("unimplemented: proposed API for #1178") +} + +// SupportsFullRemoteScanPlanning reports whether the server advertised all four +// scan-planning endpoints (plan, fetch-result, cancel, fetch-tasks). +func (r *Catalog) SupportsFullRemoteScanPlanning() bool { + panic("unimplemented: proposed API for #1178") +} + +// --- table.ScanPlanner implementation --------------------------------------- + +// SupportsRemoteScanPlanning reports whether this catalog can complete a remote +// plan end-to-end; backed by the split capability checks above. +func (r *Catalog) SupportsRemoteScanPlanning() bool { + panic("unimplemented: proposed API for #1178") +} + +// PlanFiles plans a scan server-side and returns tasks (and, optionally, a +// plan-scoped FileIO) for the table to read. +func (r *Catalog) PlanFiles(ctx context.Context, req table.ScanPlanningRequest) (table.ScanPlanningResult, error) { + panic("unimplemented: proposed API for #1178") +} + +// --- Low-level client methods ----------------------------------------------- + +// PlanTableScan submits a scan plan. The result is either completed inline, +// submitted (returns a plan-id to poll), or failed. +func (r *Catalog) PlanTableScan(ctx context.Context, ident table.Identifier, req PlanTableScanRequest) (PlanTableScanResponse, error) { + panic("unimplemented: proposed API for #1178") +} + +// FetchPlanningResult polls a previously submitted plan. +func (r *Catalog) FetchPlanningResult(ctx context.Context, ident table.Identifier, planID string) (FetchPlanningResultResponse, error) { + panic("unimplemented: proposed API for #1178") +} + +// CancelPlanning cancels a server-side plan. Callers should cancel on context +// cancellation using a detached context with a short timeout. +func (r *Catalog) CancelPlanning(ctx context.Context, ident table.Identifier, planID string) error { + panic("unimplemented: proposed API for #1178") +} + +// FetchScanTasks fetches the scan tasks for a plan-task handle returned by a +// completed plan. +func (r *Catalog) FetchScanTasks(ctx context.Context, ident table.Identifier, req FetchScanTasksRequest) (FetchScanTasksResponse, error) { + panic("unimplemented: proposed API for #1178") +} + +// WaitForPlan polls a submitted plan to completion using jittered backoff, +// cancelling the server-side plan if the context is cancelled. It returns an +// error if the plan is still submitted after the wait, cancelled, failed, or +// expired. +func (r *Catalog) WaitForPlan(ctx context.Context, ident table.Identifier, planID string, opts WaitForPlanOptions) (CompletedPlanningResult, error) { + panic("unimplemented: proposed API for #1178") +} + +// --- Wire types (sketch) ---------------------------------------------------- +// +// Content-file, delete-file, and residual decoding lands with the scan-task +// decoder PR; these sketch the request/response envelopes so the client +// surface compiles and reads. + +// PlanStatus is the status of a server-side plan. +type PlanStatus string + +const ( + PlanStatusCompleted PlanStatus = "completed" + PlanStatusSubmitted PlanStatus = "submitted" + // PlanStatusCancelled is valid when polling a submitted plan, but invalid + // as a planTableScan response. PlanTableScan and WaitForPlan should treat a + // cancelled initial planning response as an error. + PlanStatusCancelled PlanStatus = "cancelled" + PlanStatusFailed PlanStatus = "failed" +) + +// PlanningError is the ErrorModel payload carried by a failed planning result. +type PlanningError struct { Review Comment: The wire nesting looks right — the `error` key matches `IcebergErrorResponse` — but `PlanningError` is a fresh ad-hoc type when the codebase already has `ErrorModel`/`errorResponse` infra for exactly this `{message, type, code}` shape. I'd reuse the existing error type, or if it can't be reused cleanly, add a comment noting `PlanningError` corresponds 1:1 to `ErrorModel` so the next person doesn't wonder why there are two. ########## catalog/rest/scan_planning.go: ########## @@ -0,0 +1,211 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +// This file is a PROPOSED public API surface for REST server-side scan +// planning (apache/iceberg-go#1178). The bodies are intentionally +// unimplemented; the file exists so the REST surface can be reviewed as Go. +// Endpoint capability discovery (Endpoint, SupportsEndpoint) lands separately +// in the Phase 0 PR and is intentionally not redeclared here. + +package rest + +import ( + "context" + "encoding/json" + "fmt" + "time" + + "github.com/apache/iceberg-go/table" +) + +// Compile-time proof that the REST catalog satisfies the table planner seam. +var _ table.ScanPlanner = (*Catalog)(nil) + +// ErrPlanExpired is returned when polling a plan-id the server no longer knows +// about (HTTP 404 while polling), distinct from a table-not-found 404. +var ErrPlanExpired = fmt.Errorf("%w: scan plan expired", ErrRESTError) + +// --- Capability gating (Open Question 2) ------------------------------------ +// +// A single capability check is too coarse: requiring all four endpoints falls +// back to local against sync-only servers, while requiring only the plan +// endpoint false-positives, because planTableScan can return `submitted` or +// `plan-tasks` that need the poll/fetch endpoints. The split below lets `auto` +// use a sync-only server while reserving the async/fanout path for servers +// that advertise everything. + +// SupportsPlanTableScan reports whether the server advertised the synchronous +// plan endpoint. +func (r *Catalog) SupportsPlanTableScan() bool { + panic("unimplemented: proposed API for #1178") +} + +// SupportsFullRemoteScanPlanning reports whether the server advertised all four +// scan-planning endpoints (plan, fetch-result, cancel, fetch-tasks). +func (r *Catalog) SupportsFullRemoteScanPlanning() bool { + panic("unimplemented: proposed API for #1178") +} + +// --- table.ScanPlanner implementation --------------------------------------- + +// SupportsRemoteScanPlanning reports whether this catalog can complete a remote +// plan end-to-end; backed by the split capability checks above. +func (r *Catalog) SupportsRemoteScanPlanning() bool { + panic("unimplemented: proposed API for #1178") +} + +// PlanFiles plans a scan server-side and returns tasks (and, optionally, a +// plan-scoped FileIO) for the table to read. +func (r *Catalog) PlanFiles(ctx context.Context, req table.ScanPlanningRequest) (table.ScanPlanningResult, error) { + panic("unimplemented: proposed API for #1178") +} + +// --- Low-level client methods ----------------------------------------------- + +// PlanTableScan submits a scan plan. The result is either completed inline, +// submitted (returns a plan-id to poll), or failed. +func (r *Catalog) PlanTableScan(ctx context.Context, ident table.Identifier, req PlanTableScanRequest) (PlanTableScanResponse, error) { + panic("unimplemented: proposed API for #1178") +} + +// FetchPlanningResult polls a previously submitted plan. +func (r *Catalog) FetchPlanningResult(ctx context.Context, ident table.Identifier, planID string) (FetchPlanningResultResponse, error) { + panic("unimplemented: proposed API for #1178") +} + +// CancelPlanning cancels a server-side plan. Callers should cancel on context +// cancellation using a detached context with a short timeout. +func (r *Catalog) CancelPlanning(ctx context.Context, ident table.Identifier, planID string) error { + panic("unimplemented: proposed API for #1178") +} + +// FetchScanTasks fetches the scan tasks for a plan-task handle returned by a +// completed plan. +func (r *Catalog) FetchScanTasks(ctx context.Context, ident table.Identifier, req FetchScanTasksRequest) (FetchScanTasksResponse, error) { + panic("unimplemented: proposed API for #1178") +} + +// WaitForPlan polls a submitted plan to completion using jittered backoff, +// cancelling the server-side plan if the context is cancelled. It returns an +// error if the plan is still submitted after the wait, cancelled, failed, or +// expired. +func (r *Catalog) WaitForPlan(ctx context.Context, ident table.Identifier, planID string, opts WaitForPlanOptions) (CompletedPlanningResult, error) { + panic("unimplemented: proposed API for #1178") +} + +// --- Wire types (sketch) ---------------------------------------------------- +// +// Content-file, delete-file, and residual decoding lands with the scan-task +// decoder PR; these sketch the request/response envelopes so the client +// surface compiles and reads. + +// PlanStatus is the status of a server-side plan. +type PlanStatus string + +const ( + PlanStatusCompleted PlanStatus = "completed" + PlanStatusSubmitted PlanStatus = "submitted" + // PlanStatusCancelled is valid when polling a submitted plan, but invalid + // as a planTableScan response. PlanTableScan and WaitForPlan should treat a + // cancelled initial planning response as an error. + PlanStatusCancelled PlanStatus = "cancelled" + PlanStatusFailed PlanStatus = "failed" +) + +// PlanningError is the ErrorModel payload carried by a failed planning result. +type PlanningError struct { + Message string `json:"message"` + Type string `json:"type"` + Code int `json:"code"` + Stack []string `json:"stack,omitempty"` +} + +// ScanTasks carries the task payload shared by completed planning responses and +// fetchScanTasks responses. Task/delete payload decoding lands with the +// scan-task decoder PR. +type ScanTasks struct { + PlanTasks []string `json:"plan-tasks,omitempty"` + FileScanTasks json.RawMessage `json:"file-scan-tasks,omitempty"` + DeleteFiles json.RawMessage `json:"delete-files,omitempty"` +} + +// CompletedPlanningResult is the completed arm of the planning-result union. +// PlanID is required by the initial planTableScan response's +// CompletedPlanningWithIDResult arm and omitted by fetchPlanningResult. +type CompletedPlanningResult struct { + Status PlanStatus `json:"status"` + PlanID *string `json:"plan-id,omitempty"` + ScanTasks + StorageCredentials []StorageCredential `json:"storage-credentials,omitempty"` +} + +// PlanTableScanRequest is the POST .../plan request body. Filter is the +// ExpressionParser-format JSON produced by iceberg.MarshalExpressionJSON. +type PlanTableScanRequest struct { Review Comment: Two `planTableScan` headers don't have a home in the proposed surface: `Idempotency-Key` (optional UUID for safe retry — without it a `WaitForPlan` retry can kick off two planning jobs) and `X-Iceberg-Access-Delegation` (the `data-access` param that gates whether vended credentials come back, so `StorageCredentials` stays empty without it). Both feel like they belong on the request type now rather than threaded in later. I'd add `IdempotencyKey *string` to `PlanTableScanRequest` and some way to pass the access-delegation value (request field, option, or context). Do you have a preferred place to put the delegation header? ########## catalog/rest/scan_planning.go: ########## @@ -0,0 +1,211 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +// This file is a PROPOSED public API surface for REST server-side scan +// planning (apache/iceberg-go#1178). The bodies are intentionally +// unimplemented; the file exists so the REST surface can be reviewed as Go. +// Endpoint capability discovery (Endpoint, SupportsEndpoint) lands separately +// in the Phase 0 PR and is intentionally not redeclared here. + +package rest + +import ( + "context" + "encoding/json" + "fmt" + "time" + + "github.com/apache/iceberg-go/table" +) + +// Compile-time proof that the REST catalog satisfies the table planner seam. +var _ table.ScanPlanner = (*Catalog)(nil) + +// ErrPlanExpired is returned when polling a plan-id the server no longer knows +// about (HTTP 404 while polling), distinct from a table-not-found 404. +var ErrPlanExpired = fmt.Errorf("%w: scan plan expired", ErrRESTError) + +// --- Capability gating (Open Question 2) ------------------------------------ +// +// A single capability check is too coarse: requiring all four endpoints falls +// back to local against sync-only servers, while requiring only the plan +// endpoint false-positives, because planTableScan can return `submitted` or +// `plan-tasks` that need the poll/fetch endpoints. The split below lets `auto` +// use a sync-only server while reserving the async/fanout path for servers +// that advertise everything. + +// SupportsPlanTableScan reports whether the server advertised the synchronous +// plan endpoint. +func (r *Catalog) SupportsPlanTableScan() bool { + panic("unimplemented: proposed API for #1178") +} + +// SupportsFullRemoteScanPlanning reports whether the server advertised all four +// scan-planning endpoints (plan, fetch-result, cancel, fetch-tasks). +func (r *Catalog) SupportsFullRemoteScanPlanning() bool { + panic("unimplemented: proposed API for #1178") +} + +// --- table.ScanPlanner implementation --------------------------------------- + +// SupportsRemoteScanPlanning reports whether this catalog can complete a remote +// plan end-to-end; backed by the split capability checks above. +func (r *Catalog) SupportsRemoteScanPlanning() bool { + panic("unimplemented: proposed API for #1178") +} + +// PlanFiles plans a scan server-side and returns tasks (and, optionally, a +// plan-scoped FileIO) for the table to read. +func (r *Catalog) PlanFiles(ctx context.Context, req table.ScanPlanningRequest) (table.ScanPlanningResult, error) { + panic("unimplemented: proposed API for #1178") +} + +// --- Low-level client methods ----------------------------------------------- + +// PlanTableScan submits a scan plan. The result is either completed inline, +// submitted (returns a plan-id to poll), or failed. +func (r *Catalog) PlanTableScan(ctx context.Context, ident table.Identifier, req PlanTableScanRequest) (PlanTableScanResponse, error) { + panic("unimplemented: proposed API for #1178") +} + +// FetchPlanningResult polls a previously submitted plan. +func (r *Catalog) FetchPlanningResult(ctx context.Context, ident table.Identifier, planID string) (FetchPlanningResultResponse, error) { + panic("unimplemented: proposed API for #1178") +} + +// CancelPlanning cancels a server-side plan. Callers should cancel on context +// cancellation using a detached context with a short timeout. +func (r *Catalog) CancelPlanning(ctx context.Context, ident table.Identifier, planID string) error { + panic("unimplemented: proposed API for #1178") +} + +// FetchScanTasks fetches the scan tasks for a plan-task handle returned by a +// completed plan. +func (r *Catalog) FetchScanTasks(ctx context.Context, ident table.Identifier, req FetchScanTasksRequest) (FetchScanTasksResponse, error) { + panic("unimplemented: proposed API for #1178") +} + +// WaitForPlan polls a submitted plan to completion using jittered backoff, +// cancelling the server-side plan if the context is cancelled. It returns an +// error if the plan is still submitted after the wait, cancelled, failed, or +// expired. +func (r *Catalog) WaitForPlan(ctx context.Context, ident table.Identifier, planID string, opts WaitForPlanOptions) (CompletedPlanningResult, error) { + panic("unimplemented: proposed API for #1178") +} + +// --- Wire types (sketch) ---------------------------------------------------- +// +// Content-file, delete-file, and residual decoding lands with the scan-task +// decoder PR; these sketch the request/response envelopes so the client +// surface compiles and reads. + +// PlanStatus is the status of a server-side plan. +type PlanStatus string + +const ( + PlanStatusCompleted PlanStatus = "completed" + PlanStatusSubmitted PlanStatus = "submitted" + // PlanStatusCancelled is valid when polling a submitted plan, but invalid + // as a planTableScan response. PlanTableScan and WaitForPlan should treat a + // cancelled initial planning response as an error. + PlanStatusCancelled PlanStatus = "cancelled" + PlanStatusFailed PlanStatus = "failed" +) + +// PlanningError is the ErrorModel payload carried by a failed planning result. +type PlanningError struct { + Message string `json:"message"` + Type string `json:"type"` + Code int `json:"code"` + Stack []string `json:"stack,omitempty"` +} + +// ScanTasks carries the task payload shared by completed planning responses and +// fetchScanTasks responses. Task/delete payload decoding lands with the +// scan-task decoder PR. +type ScanTasks struct { + PlanTasks []string `json:"plan-tasks,omitempty"` + FileScanTasks json.RawMessage `json:"file-scan-tasks,omitempty"` + DeleteFiles json.RawMessage `json:"delete-files,omitempty"` +} + +// CompletedPlanningResult is the completed arm of the planning-result union. +// PlanID is required by the initial planTableScan response's +// CompletedPlanningWithIDResult arm and omitted by fetchPlanningResult. +type CompletedPlanningResult struct { + Status PlanStatus `json:"status"` + PlanID *string `json:"plan-id,omitempty"` + ScanTasks + StorageCredentials []StorageCredential `json:"storage-credentials,omitempty"` +} + +// PlanTableScanRequest is the POST .../plan request body. Filter is the +// ExpressionParser-format JSON produced by iceberg.MarshalExpressionJSON. +type PlanTableScanRequest struct { + SnapshotID *int64 `json:"snapshot-id,omitempty"` + StartSnapshotID *int64 `json:"start-snapshot-id,omitempty"` Review Comment: The wire request carries `StartSnapshotID`/`EndSnapshotID` for incremental scans, but the high-level `ScanPlanningRequest` only has `SnapshotID` — so the seam advertises a capability it can't actually invoke, which reads like a gap rather than a deliberate deferral. Either add `Start`/`End` to `ScanPlanningRequest` with a "not yet wired" comment, or drop them from the wire type with a "lands with the incremental phase" note. The doc already says incremental is deferred, so I'd lean toward making the wire type match that and adding them together later — but either is fine as long as the two sides agree. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
