Scott Taylor created ARROW-5004:
-----------------------------------

             Summary: Confusing behaviour with boolean partition keys
                 Key: ARROW-5004
                 URL: https://issues.apache.org/jira/browse/ARROW-5004
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.12.1
            Reporter: Scott Taylor

[https://github.com/apache/arrow/blob/3129732a18210d0c8921b45f79be4f34eadf0cc3/python/pyarrow/parquet.py#L686]

At the linked line, the type of a partition key is converted to match the type of the filter value. The *write_to_dataset* function allows *boolean* partition keys (*True* or *False*), but these silently break at the linked line because *bool('False')* evaluates to *True* (any non-empty string is truthy).

I understand that a docstring ([https://github.com/apache/arrow/blob/3129732a18210d0c8921b45f79be4f34eadf0cc3/python/pyarrow/parquet.py#L653]) says that only string or int partition values are supported, although this is somewhat buried away from the user-facing API.

It may be beneficial to detect the boolean case and raise a warning, or to ensure the function returns a more intuitive result when the partition key is *'False'* and the filter value is *False*.
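
A minimal sketch of the pitfall and of one possible way to detect the boolean case. This is not the pyarrow implementation: the helper name *coerce_partition_value* and its warning text are hypothetical, and the sketch assumes the partition key arrives as a string and is cast to the type of the filter value.

```python
import warnings


def coerce_partition_value(raw, filter_value):
    """Hypothetical helper: cast a partition key string to the filter value's type.

    Booleans are parsed explicitly, because bool("False") is True.
    """
    target_type = type(filter_value)
    if target_type is bool:
        lowered = raw.strip().lower()
        if lowered == "true":
            return True
        if lowered == "false":
            return False
        # Unrecognised spelling: warn and leave the key as a string.
        warnings.warn(
            "cannot interpret partition key %r as a boolean" % (raw,)
        )
        return raw
    # Non-boolean case: plain constructor cast, e.g. int("3") or str("a").
    return target_type(raw)


# The naive cast treats every non-empty string as True.
print(bool("False"))

# The explicit parse gives the intuitive answer for boolean filters.
print(coerce_partition_value("False", False))
print(coerce_partition_value("3", 1))
```

With this kind of check, a filter value of *False* matches a partition key of *'False'*, while string and int filters behave as with a plain type cast.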