Re: [I] [Python] pyarrow 13.0.0 converted `datetime64[ns]` to `datetime64[us]` when using `pd.read_parquet` [arrow]

via GitHub Tue, 10 Oct 2023 05:59:03 -0700


seanslma commented on issue #38171:
URL: https://github.com/apache/arrow/issues/38171#issuecomment-1755379996


   Thanks. Apologies if I did not explain the issue clearly.
   
   I used "pandas_version": "2.1.0"; this can be found in the parquet bytes string.
   
   This is the code used to create the df and parquet bytes
   ```py
   import pandas as pd

   t1 = '2023-09-01'
   # start and end are the same, so this yields a single timestamp
   ds = pd.date_range(t1, t1, freq='30T')
   df = pd.DataFrame({
       'ds': ds,
   })

   # Each pair below was produced in an environment with the pyarrow version noted
   df_parquet_bytes_v12_ns = df.astype({'ds': 'datetime64[ns]'}).to_parquet()  # using pyarrow 12.0.0
   df_parquet_bytes_v12_us = df.astype({'ds': 'datetime64[us]'}).to_parquet()  # using pyarrow 12.0.0
   df_parquet_bytes_v13_ns = df.astype({'ds': 'datetime64[ns]'}).to_parquet()  # using pyarrow 13.0.0
   df_parquet_bytes_v13_us = df.astype({'ds': 'datetime64[us]'}).to_parquet()  # using pyarrow 13.0.0
   ```
   I created the parquet bytes in pyarrow 12.0.0 because our API uses pyarrow 12.0.0. On the client side we use pyarrow 13.0.0 to convert the parquet bytes back to a pandas df.
   
   For this one (**BUG**)
   ```
   input                     output_v12      output_v13      comment
   df_parquet_bytes_v12_ns   datetime64[ns]  datetime64[us]  v13 ns -> us, lost resolution
   ``` 
   The input parquet bytes were created using pyarrow 12.0.0 with `datetime64[ns]` in the input df.
   When the bytes are converted back to a pandas df using pyarrow 12.0.0, the unit is still `datetime64[ns]`.
   But the unit becomes `datetime64[us]` when using pyarrow 13.0.0.
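
As a possible client-side stopgap while the behaviour is being discussed, the datetime columns could be cast back to `datetime64[ns]` after reading. This is only a sketch: `coerce_datetimes_to_ns` is a hypothetical helper name, `df_us` is a stand-in for the frame returned by `pd.read_parquet` under pyarrow 13.0.0, and the cast changes the dtype only; it cannot bring back sub-microsecond digits if they were already dropped.

```python
import pandas as pd


def coerce_datetimes_to_ns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with every tz-naive datetime64 column cast to ns.

    Hypothetical helper for illustration; not part of pandas or pyarrow.
    """
    out = df.copy()
    for col in out.columns:
        # True for tz-naive datetime64 columns of any unit (s, ms, us, ns)
        if pd.api.types.is_datetime64_dtype(out[col].dtype):
            out[col] = out[col].astype('datetime64[ns]')
    return out


# Stand-in for the result of pd.read_parquet(...) on the client
df_us = pd.DataFrame(
    {'ds': pd.date_range('2023-09-01', periods=2, freq='30min')}
).astype({'ds': 'datetime64[us]'})

df_ns = coerce_datetimes_to_ns(df_us)
print(df_ns['ds'].dtype)  # datetime64[ns]
```

Non-datetime columns are left untouched, so the helper can be applied to the whole frame right after the read.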
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
