Scott Taylor created ARROW-5004:
-----------------------------------
Summary: Confusing behaviour with boolean partition keys
Key: ARROW-5004
URL: https://issues.apache.org/jira/browse/ARROW-5004
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.12.1
Reporter: Scott Taylor
[https://github.com/apache/arrow/blob/3129732a18210d0c8921b45f79be4f34eadf0cc3/python/pyarrow/parquet.py#L686]
Here the type of a partition key is converted to match the type of a filter
variable.
Using the *write_to_dataset* function allows *boolean* partition keys (*True*
or *False*), but these silently break at the linked line because
*bool('False')* evaluates to *True*.
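
The pitfall can be reproduced without pyarrow. The snippet below is a minimal sketch that assumes the partition value arrives as the string taken from a directory name such as "flag=False" and is cast with the type of the filter value, as the linked line appears to do. The variable names are illustrative only.

```python
# Partition values are read back as strings from directory names
# such as "flag=False" (assumed layout for this illustration).
partition_value = "False"
filter_value = False

# Cast the partition value with the type of the filter value.
converted = type(filter_value)(partition_value)

print(converted)                  # True: any non-empty string is truthy
print(converted == filter_value)  # False: the partition no longer matches
```

Because *bool()* tests truthiness rather than parsing the text, both 'True' and 'False' convert to *True*, so a filter on *False* can never match.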
I understand that a docstring
([https://github.com/apache/arrow/blob/3129732a18210d0c8921b45f79be4f34eadf0cc3/python/pyarrow/parquet.py#L653])
states that only string or int partition variables are supported, although
this is somewhat buried away from the user-facing API.
It may be beneficial to detect the boolean case and raise a warning, or to
ensure the function returns a more intuitive result when the partition key is
*'False'* and the filter variable is *False*.
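
One possible shape for such a check is sketched below. The helper *coerce_partition_value* and the *_BOOL_STRINGS* table are hypothetical names invented for this illustration and are not part of pyarrow. The sketch assumes boolean partition values are written as the strings 'True' and 'False'.

```python
import warnings

# Hypothetical mapping from the on-disk string form to a real boolean.
_BOOL_STRINGS = {"True": True, "False": False}


def coerce_partition_value(partition_value, filter_value):
    """Convert a partition key string to the type of the filter value.

    Hypothetical helper for illustration; not a pyarrow function.
    """
    # bool must be checked explicitly: bool("False") is True, and
    # bool is also a subclass of int.
    if isinstance(filter_value, bool):
        warnings.warn(
            "Boolean partition keys are not documented as supported; "
            "parsing the key by its string value.",
            UserWarning,
        )
        try:
            return _BOOL_STRINGS[partition_value]
        except KeyError:
            raise ValueError(
                "Cannot interpret %r as a boolean" % (partition_value,)
            )
    # For str and int filters, cast with the type of the filter value.
    return type(filter_value)(partition_value)
```

With this sketch, coerce_partition_value('False', False) returns *False* and emits a warning, while string and int filters behave as before. Whether to warn, raise, or convert silently is a design choice left open here.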