jorisvandenbossche commented on a change in pull request #9294: URL: https://github.com/apache/arrow/pull/9294#discussion_r573807661
##########
File path: python/pyarrow/tests/parquet/test_dataset.py
##########
@@ -206,7 +206,7 @@ def test_filters_equivalency(tempdir, use_legacy_dataset):
     dataset = pq.ParquetDataset(
         base_path, filesystem=fs,
         filters=[('integer', '=', 1), ('string', '!=', 'b'),
-                 ('boolean', '==', True)],
+                 ('boolean', '==', 'True')],

Review comment:
Ah, yes, that makes sense. In the old ParquetDataset code, this works because we always try to convert the value of the partition to the type of the value in the expression (although the other way around would be more logical ...). So the string "True" gets cast to a bool and then compared (which might also mean this never worked with False, as `bool("False") is True` ...)

https://github.com/apache/arrow/blob/4086409e1b4cf4feac3b5c84060c69e6c7de898d/python/pyarrow/parquet.py#L951-L952

... input value of the "expression" to the value in the partition dictionary. So even if the partitioning has a string "True", we will convert the `True` value to ...
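
To make the `bool("False")` pitfall concrete, here is a minimal sketch of the cast-then-compare behaviour described above. `legacy_style_match` is a hypothetical helper written only for illustration; it is not the code in `parquet.py`, and it assumes the partition value arrives as a string taken from the directory name.

```python
def legacy_style_match(partition_value, op, filter_value):
    """Hypothetical stand-in for the legacy comparison.

    Casts the partition value (a string from the path) to the type of
    the filter value, then compares. Not the actual pyarrow code.
    """
    converted = type(filter_value)(partition_value)
    if op in ("=", "=="):
        return converted == filter_value
    if op == "!=":
        return converted != filter_value
    raise ValueError(f"unsupported operator in this sketch: {op!r}")


# Any non-empty string is truthy, so both partitions look like True.
matches_true_partition = legacy_style_match("True", "==", True)
matches_false_partition = legacy_style_match("False", "==", True)

# Filtering on False can therefore never select the "False" partition.
false_filter_hits = legacy_style_match("False", "==", False)

# Casting to int or str behaves as expected.
int_hit = legacy_style_match("1", "=", 1)
str_hit = legacy_style_match("True", "==", "True")
```

Under this assumption, a filter on `True` matches every non-empty partition value, and a filter on `False` matches none, which is why comparing against the string `'True'` is the safer form for string-valued partitions.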