Andy Douglas created ARROW-11388:
------------------------------------

             Summary: Dataset Timezone Handling
                 Key: ARROW-11388
                 URL: https://issues.apache.org/jira/browse/ARROW-11388
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 3.0.0, 2.0.0
            Reporter: Andy Douglas


I'm trying to write a pandas DataFrame that has a timezone-aware 
DatetimeIndex to a pyarrow dataset, but the timezone information doesn't seem 
to be written (apart from in the pandas metadata).

 

For example

 
{code:python}
import os
import pandas as pd
import numpy as np
import pyarrow as pa
import pyarrow.parquet as pq
import pyarrow.dataset as ds
from pathlib import Path
print(pa.__version__)
# create dummy dataframe with datetime index containing tz info
df = pd.DataFrame(
    dict(
        timestamp=pd.date_range(
            "2021-01-01", freq="1T", periods=100, tz="US/Eastern"
        ),
        x=np.arange(100),
    )
).set_index("timestamp")
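test_dir = Path("test_dir")
# convert to an Arrow table and inspect its schema and pandas metadata
table = pa.Table.from_pandas(df)
schema = table.schema
print(schema)
print(schema.pandas_metadata)
# write the dataset, then read one of the written files back directly
pq.write_to_dataset(table, test_dir)
print(pq.ParquetFile(test_dir / os.listdir(test_dir)[0]).read())
# read the same data back through the dataset API using the original schema
dataset = ds.dataset(test_dir, format="parquet", schema=schema)
print(dataset.to_table())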
{code}
 

 

Is this a bug or am I missing something?


Thanks

Andy

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
