[GitHub] [arrow] wjones127 edited a comment on issue #12416: Parquet Partition issues with Int64 Null

GitBox Mon, 14 Feb 2022 13:22:42 -0800


wjones127 edited a comment on issue #12416:
URL: https://github.com/apache/arrow/issues/12416#issuecomment-1039292700



   Hi, the problem here is likely not the partitioned read but the conversion to pandas. From [the docs](https://arrow.apache.org/docs/python/pandas.html#nullable-types):
   
   > In Arrow all data types are nullable, meaning they support storing missing 
values. In pandas, however, not all data types have support for missing data. 
Most notably, the default integer data types do not, and will get casted to 
float when missing values are introduced. Therefore, when an Arrow array or 
table gets converted to pandas, integer columns will become float when missing 
values are present:
   
   There is a workaround for Int64 specifically, using the `types_mapper` argument of `to_pandas()`, in [that section of the docs](https://arrow.apache.org/docs/python/pandas.html#nullable-types), so it is probably worth reading.
   
   If you do find a problem with the inferred partitioning schema, you can pass the partitioning schema explicitly. This should be available in version 4.0.1: https://arrow.apache.org/docs/4.0/python/dataset.html#different-partitioning-schemes
   

