arjunsr1 opened a new issue, #13847: URL: https://github.com/apache/arrow/issues/13847
Hello @kou,

I previously wrote about timestamp [ms] turning into timestamp [s] when loading Parquet file data into an Arrow table. I have a similar question now, but this time the conversion has altered our data significantly, in a way that is hard to recover.

We run a query against a third party, which gives us a CSV file. We load the CSV file by calling `Arrow::Table.load(csv_file_path)`. The resulting schema has many fields. In `table.schema.to_s`, two of them are listed as `date64 [ms]`, which holds a timestamp that includes the day with millisecond precision, and one is listed as `uint16`.

However, when we call `Arrow::Table.save(new_file_path.parquet)`, the `date64 [ms]` columns seem to be changed to `date32 [day]`, and the `uint16` column to `uint8`. As you can imagine, this strips the time of day from the timestamps and leaves only the date.

Is there functionality in `Arrow::Table.save` that lets us specify the type each column should be saved as? Do you know whether Parquet supports the `date64 [ms]` type at all? If you could help with a workaround, that would be great.
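To illustrate why the change from `date64 [ms]` to `date32 [day]` cannot be undone afterwards, here is a minimal sketch in plain Ruby. It does not call Red Arrow. It assumes only that a `date64` value is a count of milliseconds since the Unix epoch, that a `date32` value is a count of whole days since the epoch, and that the conversion is a floor division by the number of milliseconds in a day. The helper names (`to_date64_ms`, `date64_to_date32`, `date32_to_date64`) are hypothetical and exist only for this example.

```ruby
# Hypothetical helpers for illustration; not part of the Red Arrow API.
MS_PER_DAY = 86_400_000

# date64-style value: milliseconds since the Unix epoch.
def to_date64_ms(time)
  time.to_i * 1000 + time.nsec / 1_000_000
end

# date32-style value: whole days since the Unix epoch.
# Assumption: the conversion floors to the start of the day.
def date64_to_date32(ms)
  ms.div(MS_PER_DAY)
end

# Going back can only recover midnight of the same day.
def date32_to_date64(days)
  days * MS_PER_DAY
end

original = Time.utc(2022, 8, 10, 13, 45, 30)
ms       = to_date64_ms(original)
days     = date64_to_date32(ms)
restored = Time.at(date32_to_date64(days) / 1000).utc

puts "original: #{original}"
puts "restored: #{restored}"
puts "lost milliseconds: #{ms - date32_to_date64(days)}"
```

Under these assumptions, the time of day (13:45:30 in the example) is gone after the round trip, which matches the behavior described above: only the calendar date survives once the column is stored as `date32 [day]`.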
