jorisvandenbossche commented on a change in pull request #9294:
URL: https://github.com/apache/arrow/pull/9294#discussion_r573807661



##########
File path: python/pyarrow/tests/parquet/test_dataset.py
##########
@@ -206,7 +206,7 @@ def test_filters_equivalency(tempdir, use_legacy_dataset):
     dataset = pq.ParquetDataset(
         base_path, filesystem=fs,
         filters=[('integer', '=', 1), ('string', '!=', 'b'),
-                 ('boolean', '==', True)],
+                 ('boolean', '==', 'True')],

Review comment:
       Ah, yes, that makes sense.
   In the old ParquetDataset code, this works because we always try to convert
the partition value to the type of the value in the filter expression
(arguably it would be more logical to do it the other way around). So the
string "True" gets cast to a bool and then compared. This probably also means
filtering on False never worked, since `bool("False") is True`.
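   To make the two cast directions concrete, here is a minimal pure-Python sketch. It is not the actual pyarrow code. It assumes the partition value arrives as a string parsed from the directory name. The helpers `legacy_equal` and `reversed_equal` are hypothetical names used only for illustration.

```python
# Hypothetical helpers, not pyarrow API. Partition values are assumed to be
# strings taken from directory names such as "boolean=False".

def legacy_equal(partition_str, filter_value):
    # Direction described above: cast the partition value to the type of the
    # filter value. bool() of any non-empty string is True.
    cast = type(filter_value)
    return cast(partition_str) == filter_value


def reversed_equal(partition_str, filter_value):
    # "Other way around": cast the filter value to the partition's type (str).
    return str(filter_value) == partition_str


print(bool("False"))                  # True
print(legacy_equal("False", True))    # True: "False" partition matches True
print(legacy_equal("False", False))   # False: == False never matches
print(reversed_equal("False", False)) # True
print(reversed_equal("False", True))  # False
```

With the legacy direction, both a "True" and a "False" partition match `True`, and nothing ever matches `False`. Casting the filter value instead keeps the comparison in the partition's own type.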
   
   
https://github.com/apache/arrow/blob/4086409e1b4cf4feac3b5c84060c69e6c7de898d/python/pyarrow/parquet.py#L951-L952
   
   input value of the "expression" to the value in the partition dictionary. So
even if the partitioning has a string "True", we will convert the `True` value to




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

