haraldnh commented on issue #49368:
URL: https://github.com/apache/arrow/issues/49368#issuecomment-3944277483
It's also the other way around: it works up to and including 20.0.0, where
the fields are 'string'. From 21.0.0 onwards, the fields are 'string_view' and
I have not been able to make it work.
I'm reading a Delta Lake table partitioned on a 'date' field (formatted like
"2026-02-23") and a 'time' field (formatted like "12:30"). I then build a
partition filter like this:
```python
from datetime import datetime


def _make_single_day_partition_filter(
    floor: datetime, ceil: datetime, partition_scheme: str
) -> list[tuple[str, str, str]]:
    # Always restrict to the single 'date' partition that `floor` falls in.
    pfilter: list[tuple[str, str, str]] = [
        ('date', '=', f'{floor.year}-{floor.month:02d}-{floor.day:02d}')
    ]
    # Finer-grained schemes also bound the 'time' partition to [floor, ceil).
    if partition_scheme in ('5min', 'hour'):
        pfilter.append(('time', '>=',
                        f'{floor.hour:02d}-{floor.minute:02d}'))
        pfilter.append(('time', '<', f'{ceil.hour:02d}-{ceil.minute:02d}'))
    return pfilter
```
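For illustration, here is a minimal sketch of what this filter evaluates to for a sample five-minute window. The function is repeated so the snippet runs on its own, and the sample timestamps are assumptions chosen for the example, not values from the report. Note that the code as posted joins hour and minute with a hyphen, whereas the 'time' partition values are described above as "12:30".

```python
from datetime import datetime


def _make_single_day_partition_filter(
    floor: datetime, ceil: datetime, partition_scheme: str
) -> list[tuple[str, str, str]]:
    pfilter: list[tuple[str, str, str]] = [
        ('date', '=', f'{floor.year}-{floor.month:02d}-{floor.day:02d}')
    ]
    if partition_scheme in ('5min', 'hour'):
        pfilter.append(('time', '>=',
                        f'{floor.hour:02d}-{floor.minute:02d}'))
        pfilter.append(('time', '<', f'{ceil.hour:02d}-{ceil.minute:02d}'))
    return pfilter


# Hypothetical sample window (assumed values, not from the original report).
floor = datetime(2026, 2, 23, 12, 30)
ceil = datetime(2026, 2, 23, 12, 35)

pfilter = _make_single_day_partition_filter(floor, ceil, '5min')
print(pfilter)
# [('date', '=', '2026-02-23'), ('time', '>=', '12-30'), ('time', '<', '12-35')]

# Any scheme other than '5min' or 'hour' yields only the 'date' condition.
day_filter = _make_single_day_partition_filter(floor, ceil, 'day')
print(day_filter)
# [('date', '=', '2026-02-23')]
```

All three tuple elements are plain Python strings, so the comparison values reach the dataset layer as strings regardless of how the partition columns are typed.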
I then feed this through dataset = deltalake.to_pyarrow_dataset(pfilter),
followed by data = dataset.to_table(columns). From the error, it seems to be
the partition-filter comparison that crashes, but it only fails in to_table().
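The two calls described above can be sketched as a small helper. This assumes that `deltalake` in the description is a `deltalake.DeltaTable` instance and that the filter is passed through the `partitions` keyword; the helper name `read_filtered` and the table path in the usage comment are hypothetical. pyarrow datasets are generally scanned lazily, which would be consistent with the error only surfacing in `to_table()`.

```python
def read_filtered(table, pfilter, columns):
    """Build a pyarrow dataset from a Delta table and materialise it.

    `table` is assumed to be a deltalake.DeltaTable (or any object with a
    compatible to_pyarrow_dataset method); `pfilter` is a list of
    (column, operator, value) tuples; `columns` is a list of column names.
    """
    # Constructing the dataset does not read any row data yet.
    dataset = table.to_pyarrow_dataset(partitions=pfilter)
    # to_table() runs the scan and materialises the result; this is the
    # call that is reported as failing.
    return dataset.to_table(columns=columns)


# Usage sketch (requires the deltalake package; the path is hypothetical):
#
#   from deltalake import DeltaTable
#   table = DeltaTable("path/to/table")
#   data = read_filtered(table, pfilter, columns)
```

Keeping the two steps in one helper makes it straightforward to wrap only the `to_table()` call in a try/except when narrowing down where the failure occurs.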