[GitHub] [arrow] westonpace commented on a change in pull request #11844: ARROW-14972: [Python][Doc] Document automatic partitioning discovery

GitBox Tue, 07 Dec 2021 07:04:49 -0800


westonpace commented on a change in pull request #11844:
URL: https://github.com/apache/arrow/pull/11844#discussion_r764083271




##########
File path: docs/source/python/dataset.rst
##########
@@ -340,6 +340,30 @@ when constructing a directory partitioning:
 Directory partitioning also supports providing a full schema rather than inferring
 types from file paths.
 
+Automatic partitioning detection
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If the directory is partitioned using the hive partitioning scheme (see above)
+then pyarrow will be able to automatically recognize the partitioning and include
+the partitioning information as a column in the returned table.  There is no
+need to specify the partitioning unless you need to override the inferred data
+types of the partitioning columns:
+
+.. code-block:: python
+
+    dataset = ds.dataset("hive_partitioned", format="parquet")

Review comment:
       I think "read with hive" is functionally an "auto" option.  If the data was written with directory partitioning, reading with hive is harmless: no partitions are found.  If the data was written with hive partitioning, the partitions are detected.
   
   So I think we can change the read side without changing the write side.  The only negative case is someone using directory partitioning whose partition values actually contain `=`.
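
   To make the three cases above concrete, here is a minimal pure-Python sketch of hive-style path parsing.  It is not pyarrow's implementation: `parse_hive_segments` is a hypothetical helper, and the example paths are made up for illustration.  It only assumes that a directory segment counts as a partition when it has the form `key=value`.

```python
from typing import Dict


def parse_hive_segments(path: str) -> Dict[str, str]:
    """Hypothetical helper: collect key=value directory segments from a path."""
    partitions: Dict[str, str] = {}
    # Skip the last component, which is the file name rather than a directory.
    for segment in path.split("/")[:-1]:
        key, sep, value = segment.partition("=")
        # Treat a segment as a hive partition only if it contains "=" and
        # has a non-empty key; every other segment is ignored.
        if sep and key:
            partitions[key] = value
    return partitions


# Illustrative paths (assumptions, not taken from the pull request).
hive_path = "year=2021/month=12/part-0.parquet"
directory_path = "2021/12/part-0.parquet"
ambiguous_path = "a=b/12/part-0.parquet"

print(parse_hive_segments(hive_path))       # {'year': '2021', 'month': '12'}
print(parse_hive_segments(directory_path))  # {}
print(parse_hive_segments(ambiguous_path))  # {'a': 'b'}
```

   The last path is the negative case: a directory-partitioned value `a=b` is misread as a partition column named `a`.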




