[I] Inconsistencies between `RecordBatch` and `DataFrame` schemas cause `to_arrow_table` to fail [datafusion-python]

via GitHub Mon, 01 Dec 2025 10:18:06 -0800


nuno-faria opened a new issue, #1314:
URL: https://github.com/apache/datafusion-python/issues/1314


   **Describe the bug**
   
   When the nullability of a `RecordBatch` column does not match the `DataFrame`'s schema, the conversion to a `pyarrow` table fails.
   
   **To Reproduce**
   
   ```py
   from datafusion import SessionContext
   
   ctx = SessionContext()
   ctx.sql("create table t_(a int not null)").collect()
   ctx.sql("insert into t_ values (1), (2), (3)").collect()
   path = "t.parquet"
   ctx.sql(f"copy (select * from t_) to '{path}'").collect()
   ctx.register_parquet("t", path)
   pyarrow_table = ctx.sql("select max(a) as m from t").to_arrow_table()
   ```
   
   ```
   ...
   pyarrow.lib.ArrowInvalid: Schema at index 0 was different: 
   m: int32
   vs
   m: int32 not null
   ```
   
   **Expected behavior**
   `to_arrow_table` should return a `pyarrow` table without raising an error.
   
   **Additional context**
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

