[ 
https://issues.apache.org/jira/browse/ARROW-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230773#comment-17230773
 ] 

Ben Kietzman commented on ARROW-3388:
-------------------------------------

[~uwe] [~npr] [~jorisvandenbossche] Is this issue still desired? Partition 
columns are inferred with dictionary type by default, so in the case where the 
only keys present are "True" and "False" we'd infer a string dictionary with 
those two values. The int32 indices are not as efficient as a boolean array, 
but perhaps that is sufficient.
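
To make the trade-off concrete, here is a minimal pure-Python sketch of the 
two options. It does not use the Arrow API; the helper names 
(dictionary_encode, infer_bool_partition) are hypothetical and only 
illustrate the idea, not how the Dataset code is actually implemented.

```python
from typing import List, Optional, Tuple

# Assumption: only the exact strings "True" and "False" (as written into the
# partition paths) count as boolean keys.
BOOL_KEYS = {"True": True, "False": False}


def dictionary_encode(keys: List[str]) -> Tuple[List[str], List[int]]:
    """Mimic dictionary encoding: unique values plus one integer index per row."""
    positions = {}
    indices = []
    for key in keys:
        # Assign the next free index the first time a value is seen.
        if key not in positions:
            positions[key] = len(positions)
        indices.append(positions[key])
    return list(positions), indices


def infer_bool_partition(keys: List[str]) -> Optional[List[bool]]:
    """Return parsed booleans if every key is 'True' or 'False', else None."""
    if not keys or any(key not in BOOL_KEYS for key in keys):
        return None
    return [BOOL_KEYS[key] for key in keys]


keys = ["True", "False", "True"]
values, indices = dictionary_encode(keys)
bools = infer_bool_partition(keys)
```

With the dictionary approach the column stays a string dictionary with two 
values; with the boolean approach the reader would have to decide that a 
column whose only keys are "True" and "False" is really boolean.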

> [C++][Dataset] Automatically detect boolean partition columns
> -------------------------------------------------------------
>
>                 Key: ARROW-3388
>                 URL: https://issues.apache.org/jira/browse/ARROW-3388
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Uwe Korn
>            Priority: Major
>              Labels: dataset, dataset-parquet-read, parquet
>
> Saving a {{ParquetDataset}} using a boolean column as a partitioning column 
> will store {{True/False}} as the values in the path. On reload these columns 
> will then be string columns with the values {{'True'}} and {{'False'}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
