[ 
https://issues.apache.org/jira/browse/ARROW-13471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Li updated ARROW-13471:
------------------------------
    Description: 
pandas datetime64[ns] columns do not round-trip correctly when the data is 
written with fastparquet and read back with pyarrow through 
pandas.read_parquet. The values are read correctly, but the column dtype comes 
back as datetime64[ns, UTC] instead of datetime64[ns].

Note: This works correctly if the engine used to read and write is fastparquet.

I raised this on the fastparquet issue tracker and was told there that it is a 
pyarrow bug.

xref [Broken compat between fastparquet(0.7.0) and pyarrow · Issue #650 · 
dask/fastparquet (github.com)|https://github.com/dask/fastparquet/issues/650]
{code:python}
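import pandas as pd

# Build a frame with a timezone-naive datetime column
df = pd.DataFrame({"a": pd.date_range("20130101", periods=3)})
print(df.dtypes)
# a    datetime64[ns]
df.to_parquet("test.parquet", engine="fastparquet")
# Reading the same file back with pyarrow yields a UTC-aware dtype
print(pd.read_parquet("test.parquet", engine="pyarrow").dtypes)
# a    datetime64[ns, UTC]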
{code}
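
Until the dtype is preserved on read, one possible workaround is to strip the 
UTC timezone after loading. The sketch below is only an illustration and not 
part of either library: drop_utc is a hypothetical helper name, and the 
as_read frame is a stand-in built with tz_localize("UTC") so the snippet runs 
without writing a Parquet file. It assumes no column in the frame is meant to 
be UTC-aware, since such columns would be converted as well.

```python
import pandas as pd


def drop_utc(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with UTC-aware datetime columns made timezone-naive.

    Hypothetical helper for illustration only.
    """
    out = df.copy()
    for name in out.columns:
        dtype = out[name].dtype
        # Only touch timezone-aware columns whose timezone is UTC
        if isinstance(dtype, pd.DatetimeTZDtype) and str(dtype.tz) == "UTC":
            # tz_convert(None) keeps the UTC wall-clock values and drops the tz
            out[name] = out[name].dt.tz_convert(None)
    return out


# Stand-in for the frame returned by pd.read_parquet(..., engine="pyarrow")
original = pd.DataFrame({"a": pd.date_range("20130101", periods=3)})
as_read = original.assign(a=original["a"].dt.tz_localize("UTC"))

restored = drop_utc(as_read)
print(restored.dtypes)
```

In real use, the frame passed to drop_utc would be the result of 
pd.read_parquet("test.parquet", engine="pyarrow").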
 

  was:
When trying to roundtrip data with pandas.read_parquet, datetime64[ns] columns 
are not round-tripped correctly if the data is written with fastparquet and 
read in with pyarrow. The data appears to be read in correctly, but the dtypes 
are incorrect.

Note: This works correctly if the engine used to read and write is fastparquet.

I asked this on the fastparquet bug tracker and they said that it was a pyarrow 
bug.

xref [Broken compat between fastparquet(0.7.0) and pyarrow · Issue #650 · 
dask/fastparquet (github.com)|https://github.com/dask/fastparquet/issues/650]
{code:java}
import pandas as pd
s = pd.DataFrame({"a":pd.date_range("20130101", periods=3)})
s.dtypes # datetime64[ns] 
s.to_parquet("test.parquet", engine="fastparquet")
pd.read_parquet("test.parquet", engine="pyarrow").dtypes 
# datetime64[ns, UTC]
{code}
 


> [Python][Parquet]Pandas datetime columns not correctly roundtripping with 
> fastparquet(0.7.0) and pyarrow 
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-13471
>                 URL: https://issues.apache.org/jira/browse/ARROW-13471
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Parquet, Python
>    Affects Versions: 4.0.1
>         Environment: pandas: 1.4.0.dev0+253.gedd5af779a.dirty
> pyarrow: 4.0.1
> fastparquet: 0.7.0
>            Reporter: Thomas Li
>            Priority: Major
>
> pandas datetime64[ns] columns do not round-trip correctly when the data is 
> written with fastparquet and read back with pyarrow through 
> pandas.read_parquet. The values are read correctly, but the column dtype 
> comes back as datetime64[ns, UTC] instead of datetime64[ns].
> Note: This works correctly if the engine used to read and write is 
> fastparquet.
> I raised this on the fastparquet issue tracker and was told there that it is 
> a pyarrow bug.
> xref [Broken compat between fastparquet(0.7.0) and pyarrow · Issue #650 · 
> dask/fastparquet (github.com)|https://github.com/dask/fastparquet/issues/650]
> {code:python}
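> import pandas as pd
>
> # Build a frame with a timezone-naive datetime column
> df = pd.DataFrame({"a": pd.date_range("20130101", periods=3)})
> print(df.dtypes)
> # a    datetime64[ns]
> df.to_parquet("test.parquet", engine="fastparquet")
> # Reading the same file back with pyarrow yields a UTC-aware dtype
> print(pd.read_parquet("test.parquet", engine="pyarrow").dtypes)
> # a    datetime64[ns, UTC]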
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
