geruh opened a new issue, #2775:
URL: https://github.com/apache/iceberg-python/issues/2775

   ### Feature Request / Improvement
   
   ## Summary
   
   Now that Java has added server-side scan planning support ([PR 
#14480](https://github.com/apache/iceberg/pull/14480)), I believe Python is a 
great place to integrate this functionality. We already have the building 
blocks, and they are nearly complete. I'm creating this issue to track all 
of the tasks we need to drive it through.
   
   ## Context
   
   There are open PRs with some of the needed model changes, but we have 
pivoted to using our existing models and ensuring they serialize properly with 
Pydantic.
   
   For example, initially we can work on:
   
   1. Expression serialization: ensure `BooleanExpression` and its subclasses 
serialize correctly for the REST API. Related to @Fokko's work on #2750 and #2435.
   
   2. DataFile serialization: ensure we can properly deserialize the 
data and delete files from the server response. The OpenAPI spec uses kebab-case 
(`file-format`, `file-path`), but our models expect snake_case (`file_format`, 
`file_path`).
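
To illustrate the naming mismatch in item 2, here is a minimal, dependency-free sketch of mapping a kebab-case payload onto a snake_case model and back. `DataFileSketch`, `from_rest_payload`, and `to_rest_payload` are hypothetical names used only for illustration; they are not PyIceberg's `DataFile` or any existing API, and the real fix would more likely be expressed through Pydantic field aliases on the existing models.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass(frozen=True)
class DataFileSketch:
    """Hypothetical stand-in for a data file model (not PyIceberg's DataFile)."""

    file_path: str
    file_format: str
    record_count: int
    file_size_in_bytes: int


def from_rest_payload(payload: Dict[str, Any]) -> DataFileSketch:
    """Build the model from a kebab-case payload, ignoring keys it does not know."""
    known = {f.name for f in fields(DataFileSketch)}
    snake = {key.replace("-", "_"): value for key, value in payload.items()}
    return DataFileSketch(**{k: v for k, v in snake.items() if k in known})


def to_rest_payload(data_file: DataFileSketch) -> Dict[str, Any]:
    """Serialize the model back to kebab-case keys."""
    return {
        f.name.replace("_", "-"): getattr(data_file, f.name)
        for f in fields(data_file)
    }
```

Keeping the conversion at the model boundary means the rest of the code base can keep using snake_case attribute names unchanged.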
   
   
   ## REST API Endpoints to Implement
   
   Based on the [Iceberg REST 
spec](https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml):
   
   1. `POST /v1/{prefix}/namespaces/{namespace}/tables/{table}/plan` - Submit 
scan for planning
   2. `GET /v1/{prefix}/namespaces/{namespace}/tables/{table}/plan/{plan-id}` - 
Fetch planning result
   3. `DELETE 
/v1/{prefix}/namespaces/{namespace}/tables/{table}/plan/{plan-id}` - Cancel 
planning
   4. `POST /v1/{prefix}/namespaces/{namespace}/tables/{table}/tasks` - Fetch 
scan tasks for a plan task
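
As a rough sketch of how the four endpoints above map to request paths, the helpers below build a `(method, path)` pair for each one. All function names are hypothetical. Joining multi-level namespaces with the unit separator (`0x1F`) is an assumption here and should be checked against the REST spec and the existing `RestCatalog` code.

```python
from typing import Tuple
from urllib.parse import quote

# Assumption: multi-level namespaces are joined with the unit separator.
NAMESPACE_SEPARATOR = "\x1f"


def table_base(prefix: str, namespace: Tuple[str, ...], table: str) -> str:
    """Return /v1/{prefix}/namespaces/{namespace}/tables/{table}, URL-encoded."""
    ns = quote(NAMESPACE_SEPARATOR.join(namespace), safe="")
    return f"/v1/{quote(prefix, safe='')}/namespaces/{ns}/tables/{quote(table, safe='')}"


def submit_plan(base: str) -> Tuple[str, str]:
    """Endpoint 1: submit a scan for planning."""
    return "POST", f"{base}/plan"


def fetch_plan(base: str, plan_id: str) -> Tuple[str, str]:
    """Endpoint 2: fetch the planning result."""
    return "GET", f"{base}/plan/{quote(plan_id, safe='')}"


def cancel_plan(base: str, plan_id: str) -> Tuple[str, str]:
    """Endpoint 3: cancel planning."""
    return "DELETE", f"{base}/plan/{quote(plan_id, safe='')}"


def fetch_tasks(base: str) -> Tuple[str, str]:
    """Endpoint 4: fetch scan tasks for a plan task."""
    return "POST", f"{base}/tasks"
```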
   
   ## Tasks
   
   We can start with core synchronous planning and add the remaining cases 
as server-side support for them becomes available for testing.
   
   ### Core Sync Planning 
   
   - [ ] Build on @Fokko's expression work to ensure `Expression` classes 
serialize properly with Pydantic.
   - [ ] Construct only plan Request/Response models (`PlanTableScanRequest` & 
   - [ ] Add support for REST scan to fetch scan tasks replicating `DataScan` 
behavior
   - [ ] Parse server response to `FileScanTask` objects (handle DataFile 
construction)
   - [ ] Add `plan_table_scan()` methods
   - [ ] Add configuration support (`scan.planning.mode`)
   - [ ] Add documentation
   
   ### Full Scan planning support (Follow-up)
   
   Complete the full scan planning API with async operations and pagination.
   
   - [ ] Add support for endpoints 2 & 3 to RESTScan
   - [ ] Add endpoint 4 support to `RestCatalog` 
   - [ ] Complete documentation with all scenarios
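
One possible shape for the combined flow (sync first, then async polling and plan-task pagination) is sketched below. It is not an implementation proposal: `plan_files` and the `Transport` callable are hypothetical, and the JSON field names (`status`, `plan-id`, `file-scan-tasks`, `plan-tasks`, `plan-task`) and status values (`submitted`, `completed`) are my reading of the REST spec and should be verified against it. The transport is injected so the sketch runs without a server.

```python
import time
from typing import Any, Callable, Dict, List, Optional

# Hypothetical transport: (method, path, json_body) -> decoded JSON response.
Transport = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def plan_files(
    send: Transport,
    table_path: str,
    request: Dict[str, Any],
    poll_interval: float = 0.5,
    max_polls: int = 20,
) -> List[Dict[str, Any]]:
    """Submit a scan, wait for planning to finish, and collect all file scan tasks."""
    # Endpoint 1: submit the scan; a sync server may answer "completed" right away.
    result = send("POST", f"{table_path}/plan", request)
    plan_id = result.get("plan-id")

    polls = 0
    while result["status"] == "submitted":
        if polls >= max_polls:
            # Endpoint 3: cancel the plan before giving up.
            send("DELETE", f"{table_path}/plan/{plan_id}", None)
            raise TimeoutError(f"planning did not complete after {max_polls} polls")
        time.sleep(poll_interval)
        # Endpoint 2: poll for the planning result.
        result = send("GET", f"{table_path}/plan/{plan_id}", None)
        polls += 1

    if result["status"] != "completed":
        raise RuntimeError(f"planning ended with status {result['status']!r}")

    tasks = list(result.get("file-scan-tasks", []))
    pending = list(result.get("plan-tasks", []))
    while pending:
        # Endpoint 4: each plan task may yield file scan tasks and more plan tasks.
        page = send("POST", f"{table_path}/tasks", {"plan-task": pending.pop()})
        tasks.extend(page.get("file-scan-tasks", []))
        pending.extend(page.get("plan-tasks", []))
    return tasks
```

With only the first `send` call in use, this reduces to the core sync case; the polling loop and the plan-task loop correspond to the follow-up items.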
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to