chenghuichen commented on PR #5974:
URL: https://github.com/apache/paimon/pull/5974#issuecomment-3131300512

   Hi team,
   
   This PR provides the initial implementation for schema handling in Python. 
In the process, I've identified two important follow-up items that require 
discussion and further work.
   
   **1. `Schema` vs. `TableSchema` in `rest_catalog`**
   
   There's a design conflict in how schemas are persisted. My understanding is 
that `paimon.schema.Schema` is the user-facing API, while 
`paimon.table.TableSchema` is the internal, persistable format.
   
   However, the `rest_catalog` currently persists the user-facing `Schema` 
object. This is problematic because `Schema` contains a non-serializable 
`pyarrow.Schema` attribute. To avoid breaking existing tests, I have omitted 
this attribute in my implementation, but this is only a temporary workaround.
   
   **Proposal:** The `rest_catalog` should be updated to use `TableSchema` for 
persistence to resolve this serialization issue and align with the intended 
design.
   
   **2. Incomplete PyArrow Schema Conversion**
   
   The conversion logic between Paimon's types and PyArrow's types is not yet 
complete. I have implemented support for atomic types, but **nested types 
(`ROW`, `MAP`, `ARRAY`) are not yet supported**.
   
   This means that code paths relying on 
`parse_data_fields_from_pyarrow_schema` will fail if they encounter a schema 
with nested structures.
   
   **Proposal:** We should create follow-up tasks to implement the missing 
nested type conversions. I'm happy to help, and we could divide this work among 
contributors.
   
   In summary, this PR is a foundational step. To make it complete, we need to 
coordinate on the `rest_catalog` refactoring and plan the completion of the 
type conversion logic.
   
   Thanks for your review and feedback.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@paimon.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
