Florian Jetter created ARROW-6339:
-------------------------------------

             Summary: [Python][C++] Rowgroup statistics for pd.NaT array ill defined
                 Key: ARROW-6339
                 URL: https://issues.apache.org/jira/browse/ARROW-6339
             Project: Apache Arrow
          Issue Type: Bug
    Affects Versions: 0.14.1
            Reporter: Florian Jetter
When an array is initialised with only NaT values, the row group statistics are corrupt: accessing them either returns random values or raises an integer out-of-bounds exception.

{code:python}
import io

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# A single-column frame whose only value is NaT.
df = pd.DataFrame({"t": pd.Series([pd.NaT], dtype="datetime64[ns]")})

# Write the table to an in-memory Parquet file and read it back.
buf = pa.BufferOutputStream()
pq.write_table(pa.Table.from_pandas(df), buf, version="2.0")
buf = io.BytesIO(buf.getvalue().to_pybytes())
parquet_file = pq.ParquetFile(buf)

# Asserting on the behaviour is difficult because the returned value is random
# and the state is ill defined. After a few iterations an exception is raised.
while True:
    parquet_file.metadata.row_group(0).column(0).statistics.max
{code}

--
This message was sent by Atlassian Jira (v8.3.2#803003)