[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #11008: ARROW-13755: [Python] Allow writing datasets using a partitioning that only specifies field_names

GitBox Mon, 30 Aug 2021 12:20:18 -0700


jorisvandenbossche commented on a change in pull request #11008:
URL: https://github.com/apache/arrow/pull/11008#discussion_r698743892




##########
File path: python/pyarrow/_dataset.pyx
##########
@@ -1998,6 +1998,41 @@ cdef class PartitioningFactory(_Weakrefable):
     cdef inline shared_ptr[CPartitioningFactory] unwrap(self):
         return self.wrapped
 
+    @property
+    def type_name(self):
+        return frombytes(self.factory.type_name())
+
+    def create_with_schema(self, schema):

Review comment:
       > Maybe it would be simpler to just allow write_dataset to accept a 
       > "list of column names + partitioning format" or a partitioning object?
   
   I am personally +1 on this (that's what I also mentioned on Zulip), since 
then we don't need to "misuse" the factories to pass along the field names. 
   The main downside is that, if we want to support hive-style partitioning 
this way, it requires another keyword to specify the type of partitioning. R 
solves that with an extra `hive_style` keyword 
(https://arrow.apache.org/docs/r/reference/write_dataset.html); we could have 
a `partitioning_flavor="hive"`, although that might get a bit long. I am not 
fully sure what I would prefer in the end.
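
   To make concrete what such a flavor keyword would select between, here is 
a minimal pure-Python sketch of how the same list of field names maps to 
directory segments under each style. The `partition_path` helper and its 
`flavor` argument are hypothetical names for illustration only; this is not 
pyarrow API and not the implementation in this PR.

```python
def partition_path(row, field_names, flavor=None):
    """Build the relative directory for one row (hypothetical helper).

    flavor=None   -> directory-style segments: "<value>/<value>"
    flavor="hive" -> hive-style segments: "<name>=<value>/<name>=<value>"
    """
    if flavor not in (None, "hive"):
        raise ValueError(f"unsupported flavor: {flavor!r}")
    if flavor == "hive":
        # Hive-style: each directory carries both the field name and the value.
        segments = [f"{name}={row[name]}" for name in field_names]
    else:
        # Directory-style: only the value; the field names must be known
        # separately when the dataset is read back.
        segments = [str(row[name]) for name in field_names]
    return "/".join(segments)


row = {"year": 2021, "month": 8, "value": 1.5}
hive_path = partition_path(row, ["year", "month"], flavor="hive")
dir_path = partition_path(row, ["year", "month"])
print(hive_path)  # year=2021/month=8
print(dir_path)   # 2021/8
```

   In both cases the caller only supplies field names, which is why a 
separate keyword (or a partitioning object) is needed to pick the style.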




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

