Weston Pace created ARROW-15407:
-----------------------------------
Summary: [Python] Change the default write partitioning flavor to hive
Key: ARROW-15407
URL: https://issues.apache.org/jira/browse/ARROW-15407
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Reporter: Weston Pace
Hive partitioning round-trips smoothly because, unlike directory partitioning,
it does not require the reader to specify the partition column names on read.
We already default to hive in some places (e.g. parquet.write_to_dataset), but
not in dataset.write_dataset.
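A minimal stdlib-only sketch of why the round trip differs. The helpers hive_path, directory_path, parse_hive and parse_directory are hypothetical illustrations, not pyarrow functions. A hive path such as year=2022/month=1 carries its column names, while a directory path such as 2022/1 can only be decoded if the reader supplies the names in the order they were written.

```python
from typing import Dict, List


def hive_path(values: Dict[str, object], columns: List[str]) -> str:
    """Self-describing layout: each path segment is column=value."""
    return "/".join(f"{col}={values[col]}" for col in columns)


def directory_path(values: Dict[str, object], columns: List[str]) -> str:
    """Bare layout: each segment is just the value, so the names are lost."""
    return "/".join(str(values[col]) for col in columns)


def parse_hive(path: str) -> Dict[str, str]:
    """Recover names and values from the path alone."""
    return dict(segment.split("=", 1) for segment in path.split("/"))


def parse_directory(path: str, names: List[str]) -> Dict[str, str]:
    """The reader must supply the column names, in the order they were written."""
    segments = path.split("/")
    if len(segments) != len(names):
        raise ValueError("number of names does not match the directory depth")
    return dict(zip(names, segments))


row = {"year": 2022, "month": 1}
columns = ["year", "month"]

print(hive_path(row, columns))               # year=2022/month=1
print(parse_hive(hive_path(row, columns)))   # {'year': '2022', 'month': '1'}
print(directory_path(row, columns))          # 2022/1
# Decoding the directory layout only works because the reader passes `columns` again.
print(parse_directory(directory_path(row, columns), columns))
```

Both parsers return values as strings; recovering the original types is a separate concern that this sketch ignores.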
To alleviate backwards-compatibility issues, Joris suggested a deprecation cycle.
First stage:
* If a partitioning is specified and it is not a list of columns then do
nothing.
* If a partitioning is specified and it is a list of columns but the user has
explicitly set partitioning_flavor then do nothing.
* If a partitioning is specified and it is a list of columns and the user has
not explicitly set partitioning_flavor then default to directory and emit a
warning:
"The default partitioning_flavor will be changing from 'directory' to 'hive' in
future releases. To silence this warning please explicitly set the
partitioning_flavor"
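The first-stage rules can be sketched as a small resolver function. This is a hypothetical illustration, not pyarrow's code: resolve_flavor_stage1, _is_column_list and the _UNSET sentinel are made-up names, flavors are shown as the strings 'directory' and 'hive' for readability (not as the values pyarrow accepts), and FutureWarning as the warning category is an assumption. The sentinel default is what makes "the user has not explicitly set partitioning_flavor" detectable.

```python
import warnings

# Hypothetical sentinel: lets "argument not passed" be told apart from any
# explicit value the caller might pass.
_UNSET = object()

_DEPRECATION_MSG = (
    "The default partitioning_flavor will be changing from 'directory' to "
    "'hive' in future releases. To silence this warning please explicitly "
    "set the partitioning_flavor"
)


def _is_column_list(partitioning):
    """True if partitioning is a list/tuple of column names (strings)."""
    return isinstance(partitioning, (list, tuple)) and all(
        isinstance(col, str) for col in partitioning
    )


def resolve_flavor_stage1(partitioning, partitioning_flavor=_UNSET):
    """Return the effective flavor for a column-list partitioning, or None
    when the flavor does not apply (no partitioning, or a ready-made
    partitioning object)."""
    if not _is_column_list(partitioning):
        # Not a list of columns: behaviour is unchanged, flavor not consulted.
        return None
    if partitioning_flavor is not _UNSET:
        # Explicitly chosen by the user: respect it, no warning.
        return partitioning_flavor
    # Implicit default: keep 'directory' for now, but announce the change.
    # stacklevel=2 attributes the warning to the caller of this function.
    warnings.warn(_DEPRECATION_MSG, FutureWarning, stacklevel=2)
    return "directory"
```

Calling resolve_flavor_stage1(["year"]) emits the warning and returns 'directory', while resolve_flavor_stage1(["year"], partitioning_flavor="hive") returns 'hive' silently. FutureWarning is used here because Python shows it to end users by default, whereas DeprecationWarning is hidden by default outside `__main__` and test runners.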
Second stage:
* If a partitioning is specified and it is not a list of columns then do
nothing. (same as before)
* If a partitioning is specified and it is a list of columns but the user has
explicitly set partitioning_flavor then do nothing. (same as before)
* If a partitioning is specified and it is a list of columns and the user has
not explicitly set partitioning_flavor then default to hive.
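A sketch of the second-stage rules as a hypothetical resolver function. resolve_flavor_stage2, _is_column_list and the _UNSET sentinel are illustrative names, and flavors are shown as the strings 'directory' and 'hive' for readability, not as the values pyarrow accepts. Only the final branch differs from the first stage: the implicit default becomes 'hive' and no warning is emitted.

```python
# Hypothetical sentinel distinguishing "not passed" from an explicit value.
_UNSET = object()


def _is_column_list(partitioning):
    """True if partitioning is a list/tuple of column names (strings)."""
    return isinstance(partitioning, (list, tuple)) and all(
        isinstance(col, str) for col in partitioning
    )


def resolve_flavor_stage2(partitioning, partitioning_flavor=_UNSET):
    """Second stage: the implicit default becomes 'hive', with no warning."""
    if not _is_column_list(partitioning):
        return None  # unchanged: flavor is not consulted
    if partitioning_flavor is not _UNSET:
        return partitioning_flavor  # unchanged: the explicit choice wins
    return "hive"  # new default
```

resolve_flavor_stage2(["year"]) returns 'hive', while an explicit partitioning_flavor="directory" is passed through unchanged, so users who silenced the first-stage warning keep their old layout.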