westonpace commented on a change in pull request #11008:
URL: https://github.com/apache/arrow/pull/11008#discussion_r696898037
##########
File path: python/pyarrow/_dataset.pyx
##########
@@ -1998,6 +1998,41 @@ cdef class PartitioningFactory(_Weakrefable):
     cdef inline shared_ptr[CPartitioningFactory] unwrap(self):
         return self.wrapped
+    @property
+    def type_name(self):
+        return frombytes(self.factory.type_name())
+
+    def create_with_schema(self, schema):
Review comment:
Well... in the C++ code it is a multi-step process. The PartitioningFactory is
created, filenames are inspected, and then it is finished. Thinking about this
more, I am wondering if this is the correct approach. It seems very odd that a
partitioning factory should need to be used if you aren't actually inspecting
any files. The purpose of a partitioning factory is to create a partitioning
from a set of filenames while a dataset is being created from a list of
filenames. So the use case is:
1. Create a partitioning factory.
2. Run inspect on a dataset factory.
3. The dataset factory passes all filenames to the partitioning factory (while
   also keeping them to create the dataset).
4. Finish is called on the partitioning factory to generate the partitioning
   (which is then added to the created dataset).
--
This is an automated message from the Apache Git Service.