Scott Taylor created ARROW-5004:
-----------------------------------

             Summary: Confusing behaviour with boolean partition keys
                 Key: ARROW-5004
                 URL: https://issues.apache.org/jira/browse/ARROW-5004
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.12.1
            Reporter: Scott Taylor


[https://github.com/apache/arrow/blob/3129732a18210d0c8921b45f79be4f34eadf0cc3/python/pyarrow/parquet.py#L686]

Here the type of a partition key is converted to match the type of a filter 
variable.
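The conversion can be illustrated with a simplified sketch. It assumes, as the linked line appears to do, that the partition key is held as a string (taken from a directory name such as `year=2019`) and is converted by calling the filter value's type on it. `coerce_partition_value` is a hypothetical name used only for illustration, not a pyarrow function.

```python
def coerce_partition_value(partition_str, filter_value):
    """Hypothetical sketch: convert a partition key string to the filter's type.

    Mirrors the idea described above: the type of the filter value is
    called on the raw string taken from the directory name.
    """
    filter_type = type(filter_value)
    return filter_type(partition_str)


# An int filter converts '2019' to the int 2019, so the comparison works.
print(coerce_partition_value('2019', 2019) == 2019)  # True
# A str filter leaves the key unchanged.
print(coerce_partition_value('a', 'b'))              # a
```

This works for the string and int cases, because `int('2019')` and `str('a')` both round-trip the directory text.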

Using the *write_to_dataset* function allows *boolean* partition keys (*True* 
or *False*), but these silently break at the linked line because *bool('False')* 
evaluates to *True*.
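A short demonstration of why the boolean case breaks: Python's `bool()` tests truthiness rather than parsing text, so every non-empty string, including `'False'`, converts to `True`. The partition values `'True'`/`'False'` (as in directories named `flag=True` and `flag=False`) and the helper `accepts_partition` are illustrative assumptions, not pyarrow internals.

```python
# bool() tests truthiness, not the text: every non-empty string is True.
print(bool('True'), bool('False'), bool(''))  # True True False


def accepts_partition(partition_str, filter_value):
    """Hypothetical equality filter using the type-coercion approach."""
    return type(filter_value)(partition_str) == filter_value


# Partition values as they would appear in directory names like 'flag=False'.
partitions = ['True', 'False']

selected_true = [p for p in partitions if accepts_partition(p, True)]
selected_false = [p for p in partitions if accepts_partition(p, False)]
print(selected_true)   # ['True', 'False']  (both partitions match)
print(selected_false)  # []                 (the 'False' partition is dropped)
```

No error is raised in either case, which is what makes the failure silent.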

I understand that a docstring 
([https://github.com/apache/arrow/blob/3129732a18210d0c8921b45f79be4f34eadf0cc3/python/pyarrow/parquet.py#L653])
 states that only string or int partition variables are supported, although this 
is somewhat buried away from the user-facing API.

It may be beneficial to detect the boolean case and raise a warning, or to 
ensure the function returns a more intuitive output when the partition key is 
*'False'* and the filter variable is *False*.
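One possible shape for the suggested fix, sketched under assumptions: the hypothetical helper below parses the strings `'True'` and `'False'` explicitly when the filter value is a `bool`, and emits a warning for any other text instead of silently treating it as `True`. This is not pyarrow's actual behaviour or API; it only illustrates combining both suggestions above.

```python
import warnings


def coerce_partition_value(partition_str, filter_value):
    """Hypothetical bool-aware coercion of a partition key string.

    Parses 'True'/'False' explicitly instead of calling bool(), which
    would treat any non-empty string as True.
    """
    filter_type = type(filter_value)
    if filter_type is bool:
        if partition_str == 'True':
            return True
        if partition_str == 'False':
            return False
        # Unrecognised text: warn and keep the raw string so it never
        # compares equal to a boolean filter value by accident.
        warnings.warn(
            'partition value %r cannot be interpreted as a boolean'
            % partition_str)
        return partition_str
    # Non-boolean filters keep the original type-call conversion.
    return filter_type(partition_str)


print(coerce_partition_value('False', False) is False)  # True
print(coerce_partition_value('True', False) is False)   # False
print(coerce_partition_value('7', 3))                   # 7
```

Checking `type(filter_value) is bool` (rather than `isinstance(..., int)`) matters here, since `bool` is a subclass of `int` in Python.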



