Weston Pace created ARROW-15406:
-----------------------------------
Summary: [Python] Change the default read partitioning flavor to hive
Key: ARROW-15406
URL: https://issues.apache.org/jira/browse/ARROW-15406
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Reporter: Weston Pace
Currently, the default when reading datasets is to apply no partitioning. So given
the dataset:
/foo=1/part0.parquet
/foo=2/part0.parquet
the reader will not detect the "foo" partition. Changing the default to hive should
be harmless in most cases (the only way it could be a problem is if a user had x=y
in a directory name that wasn't intended to be a partition).
This may put us at odds with the default partitioning for writes (I'm opening a
separate JIRA for that), but specifying partitioning="hive" on a
directory-partitioned dataset is no worse than specifying partitioning=None on a
directory-partitioned dataset, which is what we do today.
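To make the reasoning concrete, here is a rough pure-Python sketch of hive-style key detection. It is only an illustration of the idea, not Arrow's actual implementation, and the helper name hive_keys is hypothetical.

```python
from pathlib import PurePosixPath


def hive_keys(path):
    """Return {key: value} for each 'key=value' directory segment of path.

    Illustrative only: this mimics the idea of hive-style detection and
    is not how Arrow implements it.
    """
    keys = {}
    # Only directory segments are inspected, not the file name itself.
    for segment in PurePosixPath(path).parent.parts:
        key, sep, value = segment.partition("=")
        if sep and key:
            keys[key] = value
    return keys


# Hive-style layout: the "foo" key is picked up.
hive_layout = hive_keys("/foo=1/part0.parquet")

# Directory-partitioned layout: no "key=value" segments, so nothing is
# detected, which matches what reading with no partitioning yields.
directory_layout = hive_keys("/1/part0.parquet")

# The one risky case: a directory that happens to contain "=".
accidental = hive_keys("/x=y/part0.parquet")
```

Under this model, a directory-partitioned path yields no keys at all, so applying hive detection to it recovers the same (empty) set of partition fields as applying none.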