Diego Argueta created ARROW-4967:
------------------------------------

             Summary: Object type and stats lost when using 96-bit timestamps
                 Key: ARROW-4967
                 URL: https://issues.apache.org/jira/browse/ARROW-4967
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.12.1
         Environment: PyArrow: 0.12.1
Python: 2.7.15, 3.7.2
Pandas: 0.24.2
            Reporter: Diego Argueta


Run the following code:

{code:python}
import datetime as dt
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

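# A one-row table with a single timestamp column.
dataframe = pd.DataFrame({'foo': [dt.datetime.now()]})
table = pa.Table.from_pandas(dataframe, preserve_index=False)

# Default writer settings: timestamps are stored as INT64.
pq.write_table(table, 'int64.parq')
# Same table, with timestamps written as the deprecated INT96 type.
pq.write_table(table, 'int96.parq', use_deprecated_int96_timestamps=True)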
{code}

Examining the {{int64.parq}} file, we see that the column metadata includes an 
object type of {{TIMESTAMP_MICROS}} and also gives some stats. All is well.

{code}
file schema: schema 
--------------------------------------------------------------------------------
foo:         OPTIONAL INT64 O:TIMESTAMP_MICROS R:0 D:1

row group 1: RC:1 TS:76 OFFSET:4 
--------------------------------------------------------------------------------
foo:          INT64 SNAPPY ... ST:[min: 2019-12-31T23:59:59.999000, max: 
2019-12-31T23:59:59.999000, num_nulls: 0]
{code}


However, if we look at {{int96.parq}}, that metadata appears to be lost: 
there is no object type and there are no column stats.

{code}
file schema: schema 
--------------------------------------------------------------------------------
foo:         OPTIONAL INT96 R:0 D:1

row group 1: RC:1 TS:58 OFFSET:4 
--------------------------------------------------------------------------------
foo:          INT96 SNAPPY ... ST:[no stats for this column]
{code}

This is a bit confusing, since the metadata for the exact same data can look 
different depending on whether an unrelated flag is set or cleared.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
