[I] Schema Evolution with StructType via update_schema() Fails [iceberg-python]

via GitHub Tue, 23 Sep 2025 14:47:06 -0700


mukul-mpac opened a new issue, #2511:
URL: https://github.com/apache/iceberg-python/issues/2511


   ### Apache Iceberg version
   
   0.10.0 (latest release)
   
   ### Please describe the bug 🐞
   
   ### Environment / Setup Details
   - PyIceberg version: 0.10.0 (latest release)
   - Catalog: AWS Glue
   - Dependencies: includes `pyarrow`
   - Python version: (please fill this in, e.g. 3.12.10)
   
   ⸻
   
   ### The Problem
   
   Given an existing table `test` with the schema:
   ```python
   Schema(
       NestedField(1, "id", StringType(), required=True),
       NestedField(2, "name", StringType(), required=False),
       NestedField(3, "roll_number", IntegerType(), required=True),
   )
   ```
   
   I attempt to evolve the schema after table creation by adding a new column `address` of type `StructType`:
   ```python
   StructType(
       NestedField(4, "street", StringType(), required=False),
       NestedField(5, "city", StringType(), required=False),
       NestedField(6, "state", StringType(), required=False),
       NestedField(7, "zip", IntegerType(), required=False),
   )
   ```
   
   Using the `update_schema()` context manager and its `add_column(...)` method to add this `StructType` field results in a `BadRequestError`:
   ```text
   pyiceberg.exceptions.BadRequestError: InvalidInputException: Cannot parse to an integer value: id: 5.0
   ```
   
   What should happen:
   - The new `StructType` field should be added without errors.
   - It should be possible to evolve a schema to include nested/struct types via `update_schema()`, just as they can be defined at table creation.
   - I remember this working as recently as last Thursday (18 September 2025).
   
   
   What is actually happening:
   - Adding a `StructType` via `update_schema()` throws `InvalidInputException: Cannot parse to an integer value: id: 5.0`.
   - The error indicates that something is trying to parse “5.0” (a float) as an integer, presumably where a field ID (column ID) is expected to be an integer.
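To illustrate the reasoning above, here is a minimal, hypothetical sketch of how a float-typed ID can trip a strict integer parser. `parse_field_id` is an illustrative helper, not code from PyIceberg or AWS Glue; it only assumes that the catalog rejects any ID that is not a plain integer literal.

```python
import json
import re


def parse_field_id(raw: str) -> int:
    """Hypothetical strict parser: accept only plain integer literals such as "5".

    Mimics a catalog-side check that rejects "5.0"; this is an assumption,
    not the actual Glue or PyIceberg implementation.
    """
    text = raw.strip()
    if re.fullmatch(r"-?[0-9]+", text) is None:
        raise ValueError(f"Cannot parse to an integer value: id: {text}")
    return int(text)


# An int ID serializes as 5, but the same value held as a float serializes as 5.0.
int_payload = json.dumps({"id": 5})      # '{"id": 5}'
float_payload = json.dumps({"id": 5.0})  # '{"id": 5.0}'

print(parse_field_id(str(5)))  # 5
try:
    parse_field_id(str(5.0))
except ValueError as err:
    print(err)  # Cannot parse to an integer value: id: 5.0
```

Note that `5.0 == 5` is `True` in Python, so a float ID can pass equality-based checks unnoticed and only surface once it is serialized and parsed on the other side.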
   
   ⸻
   
   ### Steps to Reproduce
   1. Create a table with the original schema:
   ```python
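   from pyiceberg.schema import Schema
   from pyiceberg.types import IntegerType, NestedField, StringType

   # `catalog` (an AWS Glue catalog), `test_db_name` and `test_table_name` are assumed to be defined elsewhere.
   table_id = f"{test_db_name}.{test_table_name}"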
   schema = Schema(
       NestedField(1, "id", StringType(), required=True),
       NestedField(2, "name", StringType(), required=False),
       NestedField(3, "roll_number", IntegerType(), required=True),
   )
   
   table = catalog.create_table(
       identifier=table_id,
       schema=schema,
   )
   ```
   
   2. Load the table and attempt schema evolution:
   
   ```python
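   from pyiceberg.types import IntegerType, NestedField, StringType, StructType

   table = catalog.load_table(table_id)
   # Add an optional struct column `address` to the existing table.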
   with table.update_schema() as updater:
       updater.add_column(
           path="address",
           field_type=StructType(
               NestedField(4, "street", StringType(), required=False),
               NestedField(5, "city", StringType(), required=False),
               NestedField(6, "state", StringType(), required=False),
               NestedField(7, "zip", IntegerType(), required=False),
           ),
           required=False,
       )
   ```
   
   3. Observe the error above.
   
   ⸻
   
   ### Additional Observations
   - The error only occurs when using `update_schema()` (schema evolution) after the table has been created.
   - Creating the table with the `StructType` column already included does not cause this error.
   - Also, if the `StructType` field already exists (from table creation) and I then add a new integer column (a primitive type) using `update_schema()`, I encounter a similar error.
   
   ⸻
   
   ### Suggested Investigation / Possible Cause
   - The error message `Cannot parse to an integer value: id: 5.0` suggests that something is computing a field or column ID as a float (`5.0` instead of the integer `5`).
   - Perhaps the incremental assignment of new field IDs during schema evolution is mishandled when adding nested/struct types.
   - There may be a bug in the serialization or metadata packaging step, or in how nested field IDs are validated and sent to the catalog (Glue/REST).
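A minimal sketch of incremental field-ID assignment when a struct column is added, to make the second bullet concrete. Everything here is hypothetical: `Field`, `IdCounter`, and `assign_fresh_ids` are illustrative names, not PyIceberg APIs, and the assignment order (the new column first, then each struct level's fields before their nested children) is an assumption. The point is that IDs derived from an integer counter stay integers, and a type check can catch a float before anything is sent to the catalog.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Field:
    """Hypothetical stand-in for a nested field; children are set only for structs."""
    name: str
    type_name: str  # e.g. "string", "int", "struct"
    children: List["Field"] = field(default_factory=list)
    field_id: Optional[int] = None


class IdCounter:
    """Hands out new IDs after the table's last assigned column ID."""

    def __init__(self, last_column_id: int) -> None:
        # Reject floats (and bools) up front so they never reach serialization.
        if isinstance(last_column_id, bool) or not isinstance(last_column_id, int):
            raise TypeError(f"last_column_id must be an int, got {last_column_id!r}")
        self._last = last_column_id

    def next_id(self) -> int:
        self._last += 1
        return self._last


def assign_fresh_ids(new_field: Field, counter: IdCounter) -> Field:
    """Return a copy of `new_field` with fresh IDs, ignoring any IDs it already had."""
    result = Field(new_field.name, new_field.type_name, field_id=counter.next_id())
    result.children = _assign_children(new_field.children, counter)
    return result


def _assign_children(children: List[Field], counter: IdCounter) -> List[Field]:
    # Reserve IDs for every field at this level before descending into nested structs.
    assigned = [Field(c.name, c.type_name, field_id=counter.next_id()) for c in children]
    for original, new_child in zip(children, assigned):
        new_child.children = _assign_children(original.children, counter)
    return assigned


def all_ids(f: Field) -> List[int]:
    """Collect IDs depth-first, parent before children."""
    ids = [f.field_id]
    for child in f.children:
        ids.extend(all_ids(child))
    return ids


address = Field("address", "struct", children=[
    Field("street", "string"),
    Field("city", "string"),
    Field("state", "string"),
    Field("zip", "int"),
])
assigned = assign_fresh_ids(address, IdCounter(last_column_id=3))
print(all_ids(assigned))  # [4, 5, 6, 7, 8]
```

With the table's last column ID at 3, the struct column and its four children receive IDs 4 through 8 under these assumptions; in this sketch, any IDs supplied in the input would be ignored.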
   
   ⸻
   
   ### What I’d Need Help With
   - Is this behavior expected (i.e., a known limitation) or a bug?
   - If it is a limitation, can it be documented? If it is a bug, is there a workaround?
   - What would be needed to fix this properly (e.g., changing how field IDs are generated and validated to ensure they are integers and avoid float casts)?
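On the last point, one hedged way to "ensure integer types" is to validate the serialized schema before it is committed. The sketch below walks a dict shaped like the Iceberg table spec's JSON serialization of types (structs with `fields`, each carrying an `id`; lists with `element-id`; maps with `key-id` and `value-id`) and reports any ID that is not a genuine `int`. `find_non_integer_ids` is a hypothetical helper, and the sample payload is made up for illustration (a float planted in one nested ID); it is not captured from the failing request.

```python
from typing import Any, List


def _join(path: str, name: str) -> str:
    return f"{path}.{name}" if path else name


def _check(value: Any, path: str) -> List[str]:
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, int) and not isinstance(value, bool):
        return []
    return [f"{path}: {value!r}"]


def find_non_integer_ids(node: Any, path: str = "") -> List[str]:
    """Return a description of every field ID in `node` that is not a plain int."""
    problems: List[str] = []
    if not isinstance(node, dict):
        return problems  # primitive types are plain strings such as "int"
    kind = node.get("type")
    if kind == "struct":
        for fld in node.get("fields", []):
            field_path = _join(path, fld["name"])
            problems += _check(fld.get("id"), field_path)
            problems += find_non_integer_ids(fld.get("type"), field_path)
    elif kind == "list":
        element_path = _join(path, "element")
        problems += _check(node.get("element-id"), element_path)
        problems += find_non_integer_ids(node.get("element"), element_path)
    elif kind == "map":
        key_path, value_path = _join(path, "key"), _join(path, "value")
        problems += _check(node.get("key-id"), key_path)
        problems += _check(node.get("value-id"), value_path)
        problems += find_non_integer_ids(node.get("key"), key_path)
        problems += find_non_integer_ids(node.get("value"), value_path)
    return problems


# Made-up payload: one nested ID was accidentally stored as a float.
schema_json = {
    "type": "struct",
    "schema-id": 1,
    "fields": [
        {"id": 1, "name": "id", "required": True, "type": "string"},
        {"id": 4, "name": "address", "required": False, "type": {
            "type": "struct",
            "fields": [
                {"id": 5.0, "name": "street", "required": False, "type": "string"},
                {"id": 6, "name": "city", "required": False, "type": "string"},
            ],
        }},
    ],
}
print(find_non_integer_ids(schema_json))  # ['address.street: 5.0']
```

Where such a check would belong in PyIceberg, if anywhere, is a question for the maintainers; the sketch only shows that the condition is cheap to detect before the request reaches the catalog.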
   
   ⸻
   
   I can also provide the catalog configuration, the full traceback, or a minimal reproducible script. Would you like me to include those now?
   
   ### Willingness to contribute
   
   - [ ] I can contribute a fix for this bug independently
   - [x] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
   - [ ] I cannot contribute a fix for this bug at this time

