Blaž Zupančič created ARROW-17058:
-------------------------------------
Summary: Timezone aware parquet read with schema and filters
Key: ARROW-17058
URL: https://issues.apache.org/jira/browse/ARROW-17058
Project: Apache Arrow
Issue Type: Bug
Components: Parquet, Python
Affects Versions: 8.0.0
Reporter: Blaž Zupančič
Attachments: output.txt, pyarrow_bug.py, spark-3.1.parquet,
spark-3.2.parquet, spark_parquet.py
In pyarrow 8.0.0, parquet.read_table() gained a `schema` parameter, which is
great for handling timestamps: they are correctly converted from UTC to the
timezone specified in the schema.
However, when `schema` is used together with `filters`, the timezone conversion
fails with the error "Cannot compare timestamp with timezone to timestamp
without timezone". This was tested on two files created with different versions
of Spark. The test code, the files, and the output are attached.
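
For readers unfamiliar with the distinction the error message draws, here is a
minimal sketch using only Python's standard `datetime` module. It is an analogy,
not the attached pyarrow_bug.py and not pyarrow's code path: the dates, the
fixed +02:00 offset, and the helper `compare_or_error` are assumptions made up
for illustration. Whether pyarrow's filter handling fails for the same reason is
not established here.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical values for illustration; not taken from the attached files.
utc_value = datetime(2022, 7, 1, 12, 0, tzinfo=timezone.utc)
target_tz = timezone(timedelta(hours=2))  # fixed +02:00 offset (assumption)

# Converting a UTC instant to another zone changes only the wall-clock
# representation; the instant itself stays the same.
local_value = utc_value.astimezone(target_tz)
same_instant = local_value == utc_value

# A filter bound written without tzinfo is a "naive" timestamp.
naive_bound = datetime(2022, 7, 1, 0, 0)


def compare_or_error(aware, other):
    """Return the result of `aware > other`, or the error text if it raises."""
    try:
        return aware > other
    except TypeError as exc:
        return str(exc)


# Ordering an aware value against a naive one is rejected by Python.
mixed_result = compare_or_error(local_value, naive_bound)

# Attaching a timezone to the bound makes the comparison well defined.
aware_result = compare_or_error(local_value, naive_bound.replace(tzinfo=target_tz))

print(mixed_result)
print(aware_result)
```

In this sketch the mixed comparison is refused outright rather than guessed at,
which is the same kind of refusal the reported error message describes.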