[GitHub] [arrow] NarayanB opened a new issue #12416: Parquet Partition issues with Int64 Null

GitBox Sun, 13 Feb 2022 15:04:28 -0800


NarayanB opened a new issue #12416:
URL: https://github.com/apache/arrow/issues/12416



   I'm not sure whether anyone else has observed this. Several libraries, 
such as pandas and vaex, can generate an Arrow table from their dataframes. 
From there you can write it out with something like 
pq.write_to_dataset(parquet_folder, partition_columns..). The issue is with 
Int64 columns: although the pyarrow schema is confirmed as 'int64' before the 
write, writing Parquet with partitioning converts the column to float64 if at 
least one row has a null value. This also corrupts the data, leaving 
inaccurate values. The latest pandas supports 'Int64', so I'm not sure whether 
there is a specific parameter in kwargs that I need to pass. Partitioning is 
very common, and I tried pyarrow versions 4, 5 and 6, although I have to use 
4.0.1. Please advise.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
