[GitHub] [arrow] arjunsr1 opened a new issue, #13847: Arrow::Table.save(path_to_file.parquet) changes data types within Schema.

GitBox Wed, 10 Aug 2022 14:20:36 -0700


arjunsr1 opened a new issue, #13847:
URL: https://github.com/apache/arrow/issues/13847


   Hello @kou - I previously wrote about an issue where timestamp [ms] turned 
into timestamp [s] when loading parquet file data into an arrow table. I have a 
similar question now, but this time the problem has led to significant data 
alteration that is hard to recover from.
   
   We run a query against a third party, which gives us a CSV file. The CSV 
file is loaded by calling Arrow::Table.load(csv_file_path), and the resulting 
schema contains many fields. Two of those fields are listed in 
table.schema.to_s as 'date64 [ms]', which, as I understand it, is a timestamp 
that includes the day and has millisecond precision, and another is listed as 
uint16. However, when we call Arrow::Table.save(new_file_path.parquet), the 
function seems to change the `date64 [ms]` types to `date32 [day]` and the 
`uint16` to `uint8`. As you can imagine, this strips the time-of-day 
information and leaves only a representation of the date. 
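
   To make the loss concrete, here is a minimal sketch in plain Ruby with no 
Arrow calls at all. It assumes that `date64 [ms]` counts milliseconds since the 
Unix epoch and `date32 [day]` counts whole days since the Unix epoch. The names 
`MS_PER_DAY`, `ms_to_days` and `days_to_ms` and the sample value are 
hypothetical, chosen only for illustration; this is not how the library 
performs the conversion internally.

```ruby
# Hypothetical illustration only: none of these names come from red-arrow.
MS_PER_DAY = 86_400_000

# Reduce a millisecond count since the Unix epoch to whole days since the
# epoch. Floor division discards the time-of-day part.
def ms_to_days(ms)
  ms.div(MS_PER_DAY)
end

# Expand a day count back to milliseconds (always lands on midnight UTC).
def days_to_ms(days)
  days * MS_PER_DAY
end

# Format a millisecond count as a UTC date and time, dropping the fraction.
def format_utc(ms)
  Time.at(ms / 1000).utc.strftime("%F %T")
end

original_ms = 1_660_166_436_123      # made-up sample: 2022-08-10 21:20:36.123 UTC
days        = ms_to_days(original_ms)
restored_ms = days_to_ms(days)

puts format_utc(original_ms)         # 2022-08-10 21:20:36
puts days                            # 19214
puts format_utc(restored_ms)         # 2022-08-10 00:00:00
puts original_ms - restored_ms       # 76836123 ms of time-of-day lost
```

Once a value has been reduced to a day count, the time of day cannot be 
reconstructed from the saved file, which is why the change is hard to recover 
from.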
   
   Is there functionality in the `Arrow::Table.save` method that would let us 
specify the type each column should be saved as? Do you know whether parquet 
supports this `date64 [ms]` type at all? If you could help with a workaround, 
that would be great.


