kardaj opened a new issue, #37355:
URL: https://github.com/apache/arrow/issues/37355
### Describe the bug, including details regarding any error messages, version, and platform.
From what I gathered, a timezone-aware `datetime.datetime` used as a filter value is
converted to a naive timestamp when its microsecond component is 0.
I managed to reproduce the error with this snippet:
```python
import io
import pytz
import datetime
import pyarrow as pa
import pyarrow.parquet as pq
import pyarrow.compute as pc

timezone = "Europe/Paris"
field_name = "timestamp"

# Build an empty table with a single timezone-aware timestamp column.
table = pa.Table.from_pydict(
    {field_name: []},
    schema=pa.schema(
        [
            pa.field(
                field_name,
                pa.timestamp("ns", tz=timezone),
                nullable=False,
            )
        ]
    ),
)
print(table)

# Write the table to an in-memory Parquet file.
buffer = io.BytesIO()
pq.write_table(table, buffer)

# Reading without a filter works.
filters = None
table = pq.read_table(buffer, filters=filters)
assert len(table.to_pylist()) == 0
print(f"filters={filters}", "ok")

# Filter on a timezone-aware datetime: microsecond=1 works, microsecond=0 triggers the bug.
for microsecond in [1, 0]:
    timestamp = pytz.timezone(timezone).localize(
        datetime.datetime.combine(
            datetime.date.today(),
            datetime.time(hour=12, microsecond=microsecond),
        )
    )
    filters = pc.field("timestamp") <= timestamp
    table = pq.read_table(buffer, filters=filters)
    assert len(table.to_pylist()) == 0
    print(f"filters={filters}", "ok")
```
With pyarrow<13.0.0, I get the following output:
```
pyarrow.Table
timestamp: timestamp[ns, tz=Europe/Paris] not null
----
timestamp: [[]]
filters=None ok
filters=(timestamp <= 2023-08-24 10:00:00.000001) ok
filters=(timestamp <= 2023-08-24 10:00:00.000000) ok
terminate called without an active exception
Aborted (core dumped)
```
With pyarrow==13.0.0, I get the following output:
```
pyarrow.Table
timestamp: timestamp[ns, tz=Europe/Paris] not null
----
timestamp: [[]]
filters=None ok
filters=(timestamp <= 2023-08-24 10:00:00.000001) ok
Traceback (most recent call last):
  File "/workspaces/mapping-tools/broken_pyarrow.py", line 43, in <module>
    table = pq.read_table(buffer, filters=filters)
  File "/workspaces/mapping-tools/env/lib/python3.9/site-packages/pyarrow/parquet/core.py", line 3002, in read_table
    return dataset.read(columns=columns, use_threads=use_threads,
  File "/workspaces/mapping-tools/env/lib/python3.9/site-packages/pyarrow/parquet/core.py", line 2630, in read
    table = self._dataset.to_table(
  File "pyarrow/_dataset.pyx", line 547, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 393, in pyarrow._dataset.Dataset.scanner
  File "pyarrow/_dataset.pyx", line 3391, in pyarrow._dataset.Scanner.from_dataset
  File "pyarrow/_dataset.pyx", line 3309, in pyarrow._dataset.Scanner._make_scan_options
  File "pyarrow/_dataset.pyx", line 3243, in pyarrow._dataset._populate_builder
  File "pyarrow/_compute.pyx", line 2595, in pyarrow._compute._bind
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 121, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: Function 'less_equal' has no kernel matching input types (timestamp[ns, tz=Europe/Paris], timestamp[s])
```
### Component(s)
Parquet, Python