Weston Pace created ARROW-15407:
-----------------------------------

             Summary: [Python] Change the default write partitioning flavor to hive
                 Key: ARROW-15407
                 URL: https://issues.apache.org/jira/browse/ARROW-15407
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
            Reporter: Weston Pace


Hive partitioning round-trips smoothly because the reader does not need to
specify the partition column names on read, as it must with directory
partitioning.  We already default to hive in some places (e.g.
parquet.write_to_dataset), but dataset.write_dataset does not.
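
To make the round-trip argument concrete, here is a minimal pure-Python sketch
of the two path layouts. It is only an illustration, not pyarrow's
implementation: the helpers hive_path, directory_path, parse_hive and
parse_directory are hypothetical, and values are kept as strings (no type
inference). A hive path carries the column names, so it can be parsed on its
own; a directory path carries only values, so the reader has to supply the
names.

```python
# Hypothetical helpers illustrating hive vs directory partition paths.
# NOT pyarrow's implementation; values are treated as plain strings.

def hive_path(values: dict) -> str:
    # Hive flavor encodes both column name and value, e.g. "year=2020/month=1".
    return "/".join(f"{k}={v}" for k, v in values.items())

def directory_path(values: dict) -> str:
    # Directory flavor encodes only the values, e.g. "2020/1".
    return "/".join(str(v) for v in values.values())

def parse_hive(path: str) -> dict:
    # Column names are recovered from the path itself.
    return dict(segment.split("=", 1) for segment in path.split("/"))

def parse_directory(path: str, field_names: list) -> dict:
    # The reader must provide the column names; the path does not carry them.
    segments = path.split("/")
    if len(segments) != len(field_names):
        raise ValueError("number of field names must match path depth")
    return dict(zip(field_names, segments))
```

For example, parse_hive("year=2020/month=1") needs no extra input, whereas
parse_directory("2020/1", ["year", "month"]) only works because the caller
already knows the column names.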

To alleviate backwards-compatibility issues, Joris suggested a deprecation cycle.

First stage:

  * If a partitioning is specified and it is not a list of columns, then do
    nothing.
  * If a partitioning is specified and it is a list of columns, but the user
    has explicitly set partitioning_flavor, then do nothing.
  * If a partitioning is specified and it is a list of columns, and the user
    has not explicitly set partitioning_flavor, then default to directory and
    emit a warning:

"The default partitioning_flavor will be changing from 'directory' to 'hive' in
future releases.  To silence this warning please explicitly set the
partitioning_flavor"
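
The first-stage rules could be sketched roughly as below. This is a
hypothetical helper, not pyarrow code. It assumes that "a list of columns"
means a Python list, that "explicitly set" is detected with a sentinel
default (so even an explicit None counts as set), and that the warning is a
FutureWarning; the issue specifies none of these details.

```python
import warnings

# Sentinel distinguishing "partitioning_flavor omitted" from any explicit value.
_UNSET = object()

_STAGE1_MESSAGE = (
    "The default partitioning_flavor will be changing from 'directory' to "
    "'hive' in future releases.  To silence this warning please explicitly "
    "set the partitioning_flavor"
)

def resolve_flavor_stage1(partitioning=None, partitioning_flavor=_UNSET):
    """Hypothetical helper: pick a flavor under the proposed first-stage rules."""
    explicit = partitioning_flavor is not _UNSET
    if not isinstance(partitioning, list):
        # No partitioning, or not a list of columns: do nothing.
        return partitioning_flavor if explicit else None
    if explicit:
        # List of columns with an explicit flavor: respect the user's choice.
        return partitioning_flavor
    # List of columns and no flavor given: keep the old default, but warn.
    warnings.warn(_STAGE1_MESSAGE, FutureWarning, stacklevel=2)
    return "directory"
```

A sentinel is used instead of None because None is a legitimate value a user
might pass on purpose, and the rules hinge on whether the user made a choice.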

Second stage:

  * If a partitioning is specified and it is not a list of columns, then do
    nothing. (same as before)
  * If a partitioning is specified and it is a list of columns, but the user
    has explicitly set partitioning_flavor, then do nothing. (same as before)
  * If a partitioning is specified and it is a list of columns, and the user
    has not explicitly set partitioning_flavor, then default to hive.
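
The second stage differs only in the last rule: the warning goes away and the
default flips to hive. Under the same assumptions as a first-stage sketch
(hypothetical helper name, Python list for "list of columns", sentinel for
"explicitly set"), it might look like:

```python
# Sentinel distinguishing "partitioning_flavor omitted" from any explicit value.
_UNSET = object()

def resolve_flavor_stage2(partitioning=None, partitioning_flavor=_UNSET):
    """Hypothetical helper: pick a flavor under the proposed second-stage rules."""
    explicit = partitioning_flavor is not _UNSET
    if not isinstance(partitioning, list):
        # No partitioning, or not a list of columns: do nothing (same as before).
        return partitioning_flavor if explicit else None
    if explicit:
        # Explicit flavor: respect the user's choice (same as before).
        return partitioning_flavor
    # List of columns and no flavor given: the new default, with no warning.
    return "hive"
```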



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
