NarayanB opened a new issue #12416: URL: https://github.com/apache/arrow/issues/12416
I'm not sure whether anyone else has observed this. Multiple libraries, such as pandas and vaex, can generate an Arrow table from their dataframes. From there you can write it out with something like pq.write_to_dataset(parquet_folder, partition_columns..). The issue is with Int64 columns: although the pyarrow schema is confirmed as 'int64' before the write, the column is converted to float64 when writing Parquet with partitions if at least one row has a null value. This also corrupts the data, producing inaccurate values. The latest pandas supports 'Int64', so I'm not sure whether there is a specific parameter in kwargs that I need to pass. Partitioning is very common. I tried pyarrow versions 4, 5, and 6, though I have to use 4.0.1. Please advise.
