Blaž Zupančič created ARROW-17058:
-------------------------------------

             Summary: Timezone aware parquet read with schema and filters
                 Key: ARROW-17058
                 URL: https://issues.apache.org/jira/browse/ARROW-17058
             Project: Apache Arrow
          Issue Type: Bug
          Components: Parquet, Python
    Affects Versions: 8.0.0
            Reporter: Blaž Zupančič
         Attachments: output.txt, pyarrow_bug.py, spark-3.1.parquet, 
spark-3.2.parquet, spark_parquet.py

pyarrow 8.0.0 added a `schema` parameter to parquet.read_table(), which is 
useful for handling timestamps: they are correctly converted from UTC to the 
timezone specified in the schema.

However, when `schema` is used together with `filters`, the read fails with 
the error "Cannot compare timestamp with timezone to timestamp without 
timezone". This was tested on two files created with different versions of 
Spark. The test code, the files, and the output are attached.


