geruh opened a new issue, #2775: URL: https://github.com/apache/iceberg-python/issues/2775
### Feature Request / Improvement

## Summary

Now that Java has added server-side scan planning support ([PR #14480](https://github.com/apache/iceberg/pull/14480)), I believe Python is a great place to integrate this functionality! We have all the building blocks, and they are almost complete. I'm creating this issue to track the tasks we need to drive it through.

## Context

We have some open PRs with needed model changes, but we have pivoted to using our existing models and ensuring they're properly serializable with pydantic. For example, initially we can work on:

1. Expression serialization, to ensure `BooleanExpression` and its subclasses serialize correctly for the REST API. Related to @Fokko's work on #2750 and #2435.
2. DataFile serialization, to ensure we can properly deserialize the data/delete files from the server response. The OpenAPI spec uses kebab-case (`file-format`, `file-path`), but our models expect snake_case (`file_format`, `file_path`).

## REST API Endpoints to Implement

Based on the [Iceberg REST spec](https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml):

1. `POST /v1/{prefix}/namespaces/{namespace}/tables/{table}/plan` - Submit scan for planning
2. `GET /v1/{prefix}/namespaces/{namespace}/tables/{table}/plan/{plan-id}` - Fetch planning result
3. `DELETE /v1/{prefix}/namespaces/{namespace}/tables/{table}/plan/{plan-id}` - Cancel planning
4. `POST /v1/{prefix}/namespaces/{namespace}/tables/{table}/tasks` - Fetch scan tasks for a plan task

## Tasks

We can start with core sync planning. As the server side adds the other capabilities and they become testable, we can support the remaining cases.

### Core Sync Planning

- [ ] Build on @Fokko's expression work to ensure `Expression` classes serialize properly with Pydantic.
- [ ] Construct only plan Request/Response models (`PlanTableScanRequest` &
- [ ] Add support for REST scan to fetch scan tasks, replicating `DataScan` behavior
- [ ] Parse server response to `FileScanTask` objects (handle DataFile construction)
- [ ] Add `plan_table_scan()` methods
- [ ] Add configuration support (`scan.planning.mode`)
- [ ] Add documentation

### Full Scan planning support (Follow-up)

Complete the full scan planning API with async operations and pagination.

- [ ] Add support for endpoints 2 & 3 to RESTScan
- [ ] Add endpoint 4 support to `RestCatalog`
- [ ] Complete documentation with all scenarios
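
To illustrate the DataFile serialization point under Context, here is a minimal sketch of one way pydantic (v2) aliases could bridge the kebab-case keys in the REST payload and snake_case attributes. `DataFileSketch` and `to_kebab` are hypothetical names and not PyIceberg's actual models. The `record_count` field is added for illustration only, and the real model has many more fields.

```python
from pydantic import BaseModel, ConfigDict


def to_kebab(field_name: str) -> str:
    """Map a snake_case attribute name to its kebab-case wire name."""
    return field_name.replace("_", "-")


class DataFileSketch(BaseModel):
    """Hypothetical stand-in for a data file model; not PyIceberg's DataFile."""

    # alias_generator derives "file-path" from file_path, and so on;
    # populate_by_name also allows construction with the snake_case names.
    model_config = ConfigDict(alias_generator=to_kebab, populate_by_name=True)

    file_path: str
    file_format: str
    record_count: int  # illustrative extra field (assumption)


# A made-up fragment shaped like a server response entry.
payload = {
    "file-path": "s3://bucket/data/0001.parquet",
    "file-format": "parquet",
    "record-count": 100,
}

# Deserialize from kebab-case keys into snake_case attributes.
data_file = DataFileSketch.model_validate(payload)

# Serialize back to kebab-case for the wire.
wire = data_file.model_dump(by_alias=True)
```

With this approach the Python side keeps snake_case attribute access (`data_file.file_path`), while `model_dump(by_alias=True)` round-trips to the kebab-case form. Whether an alias generator or explicit per-field aliases fits the existing models better is an open design choice.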
