[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #9130: ARROW-10247: [C++][Dataset] Support writing datasets partitioned on dictionary columns

GitBox Fri, 15 Jan 2021 02:13:53 -0800


jorisvandenbossche commented on a change in pull request #9130:
URL: https://github.com/apache/arrow/pull/9130#discussion_r558189535




##########
File path: python/pyarrow/_dataset.pyx
##########
@@ -1403,6 +1403,25 @@ cdef class PartitioningFactory(_Weakrefable):
         return self.wrapped
 
 
+cdef vector[shared_ptr[CArray]] _partitioning_dictionaries(
+        Schema schema, dictionaries) except *:
+    cdef:
+        vector[shared_ptr[CArray]] c_dictionaries
+
+    dictionaries = list(dictionaries or [])[:len(schema)]
+    while len(dictionaries) < len(schema):
+        dictionaries.append(None)
+
+    for field, dictionary in zip(schema, dictionaries):
+        if (isinstance(field.type, pa.DictionaryType) and
+                dictionary is not None):
+            c_dictionaries.push_back(pyarrow_unwrap_array(dictionary))
+        else:
+            c_dictionaries.push_back(<shared_ptr[CArray]> nullptr)

Review comment:
       The entry in `dictionaries` is still silently ignored when the 
corresponding field is not of dictionary type. That is a user error, of 
course, but we could maybe raise an exception in that case instead of 
ignoring it.
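
A rough sketch of the kind of check I have in mind, written in plain Python so it runs without pyarrow. The `Field` tuple, its `is_dictionary` flag, and the name `checked_dictionaries` are stand-ins invented for illustration; in `_partitioning_dictionaries` the test would be the existing `isinstance(field.type, pa.DictionaryType)`, and the choice of `ValueError` is only an assumption:

```python
from collections import namedtuple

# Hypothetical stand-in for a pyarrow field: a name plus a flag that
# plays the role of isinstance(field.type, pa.DictionaryType).
Field = namedtuple("Field", ["name", "is_dictionary"])


def checked_dictionaries(fields, dictionaries):
    """Align `dictionaries` with `fields`, rejecting misplaced entries."""
    # Same truncate-and-pad normalization as in the diff above.
    dictionaries = list(dictionaries or [])[:len(fields)]
    dictionaries += [None] * (len(fields) - len(dictionaries))

    for field, dictionary in zip(fields, dictionaries):
        # A dictionary supplied for a non-dictionary field is a user
        # error: raise instead of silently dropping it.
        if dictionary is not None and not field.is_dictionary:
            raise ValueError(
                "a dictionary was given for field {!r}, which is not of "
                "dictionary type".format(field.name))
    return dictionaries
```

With this, passing a dictionary at the position of a non-dictionary field fails loudly, while `None` entries and missing trailing entries are still accepted as before.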




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
