amol- commented on a change in pull request #11008:
URL: https://github.com/apache/arrow/pull/11008#discussion_r699178589
##########
File path: python/pyarrow/_dataset.pyx
##########
@@ -1998,6 +1998,41 @@ cdef class PartitioningFactory(_Weakrefable):
cdef inline shared_ptr[CPartitioningFactory] unwrap(self):
return self.wrapped
+ @property
+ def type_name(self):
+ return frombytes(self.factory.type_name())
+
+ def create_with_schema(self, schema):
Review comment:
I changed `write_dataset` to accept `partitioning` plus
`partitioning_flavor`; see the test:
```python
ds.write_dataset(table, tempdir, format='parquet',
                 partitioning=["b"], partitioning_flavor="hive")
```
So we are no longer using a factory. I'll update the documentation if we
agree this is the final API we want users to rely on.
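For context, the practical difference between the default directory flavor and `partitioning_flavor="hive"` is the on-disk path layout: hive encodes the field name as `key=value` in each directory segment. A minimal pure-Python sketch of the naming scheme (the helper name is mine, for illustration; pyarrow builds these paths internally when `write_dataset` partitions a table):

```python
def partition_path(flavor, fields, values):
    """Illustrative only: build the partition directory path for one group.

    Not pyarrow API; this just shows the two path flavors.
    """
    if flavor == "hive":
        # Hive flavor encodes the field name in each segment: b=1/...
        parts = ["{}={}".format(f, v) for f, v in zip(fields, values)]
    else:
        # Directory flavor uses the value alone: 1/...
        parts = [str(v) for v in values]
    return "/".join(parts)

print(partition_path("hive", ["b"], [1]))       # b=1
print(partition_path("directory", ["b"], [1]))  # 1
```

This is why hive-flavored datasets remain self-describing even when read without a schema: the field name travels in the path.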
##########
File path: python/pyarrow/tests/test_dataset.py
##########
@@ -1577,6 +1577,38 @@ def test_dictionary_partitioning_outer_nulls_raises(tempdir):
ds.write_dataset(table, tempdir, format='parquet', partitioning=part)
+def test_write_dataset_with_field_names(tempdir):
Review comment:
:+1: moved the tests
##########
File path: python/pyarrow/dataset.py
##########
@@ -788,7 +805,8 @@ def file_visitor(written_file):
if max_partitions is None:
max_partitions = 1024
- partitioning = _ensure_write_partitioning(partitioning)
+ partitioning = _ensure_write_partitioning(partitioning, schema=schema,
Review comment:
This should address those cases; I've added tests for them.
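For readers following the thread: passing `schema` here is what lets the helper resolve bare field names (as in `partitioning=["b"]`) into a concrete partitioning. A hedged sketch of that kind of normalization (the real `_ensure_write_partitioning` lives in `pyarrow/dataset.py` and differs in detail; everything below is an illustrative stand-in):

```python
def ensure_write_partitioning(partitioning, schema, flavor=None):
    """Illustrative sketch: normalize user input into a partition spec.

    Assumption-laden stand-in, not the actual pyarrow implementation.
    `schema` is modeled here as a list of field names.
    """
    if partitioning is None:
        return {"fields": [], "flavor": flavor}
    if isinstance(partitioning, (list, tuple)):
        # Bare field names: validate them against the schema.
        missing = [f for f in partitioning if f not in schema]
        if missing:
            raise ValueError("partition fields not in schema: %r" % missing)
        return {"fields": list(partitioning), "flavor": flavor or "directory"}
    # Assume an already-constructed partitioning object passes through.
    return partitioning

print(ensure_write_partitioning(["b"], ["a", "b", "c"], flavor="hive"))
```

The point of the design choice is that validation against the schema happens once, up front, instead of failing later mid-write.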
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]