jorisvandenbossche commented on a change in pull request #9294:
URL: https://github.com/apache/arrow/pull/9294#discussion_r573807661
##########
File path: python/pyarrow/tests/parquet/test_dataset.py
##########
@@ -206,7 +206,7 @@ def test_filters_equivalency(tempdir, use_legacy_dataset):
dataset = pq.ParquetDataset(
base_path, filesystem=fs,
filters=[('integer', '=', 1), ('string', '!=', 'b'),
- ('boolean', '==', True)],
+ ('boolean', '==', 'True')],
Review comment:
Ah, yes, that makes sense.
In the old ParquetDataset code, this works because we simply always try to
convert the value of the partition to the type of the value in the expression
(although it would arguably be more logical to do it the other way around). So the
string "True" gets cast to a bool and then compared (which probably also means
this never worked with False, since `bool("False")` is also `True`).
https://github.com/apache/arrow/blob/4086409e1b4cf4feac3b5c84060c69e6c7de898d/python/pyarrow/parquet.py#L951-L952
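To make the `False` problem concrete, here is a minimal pure-Python sketch. It assumes (as in this test) that Hive-style partition values arrive as strings such as "True"/"False". The helpers `legacy_accepts` and `reverse_accepts` are hypothetical: they only mimic the casting idea from the linked lines and are not pyarrow API.

```python
# Hypothetical illustration of the two casting directions; NOT the pyarrow source.

def legacy_accepts(partition_value: str, filter_value) -> bool:
    """Legacy direction: cast the partition value to the filter value's type."""
    f_type = type(filter_value)
    return f_type(partition_value) == filter_value


def reverse_accepts(partition_value: str, filter_value) -> bool:
    """The 'other way around': cast the filter value to the partition value's type."""
    p_type = type(partition_value)
    return partition_value == p_type(filter_value)


# Directory names like "boolean=True" / "boolean=False" yield string values.
for part in ["True", "False"]:
    print(part, legacy_accepts(part, True), reverse_accepts(part, True))
# True True True
# False True False   <- legacy also matches "False", because bool("False") is True
```

With the legacy direction, filtering on `('boolean', '==', False)` never matches anything (`bool("False") == False` is `False`), while casting the filter value to the partition's string type compares "False" with "False" as expected.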
input value of the "expression" to the value in the partition dictionary. So
even if the partitioning has a string "True", we will convert the `True` value
to