[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #13989: ARROW-14495: [Python] Fix DictionaryArray.from_buffers, should not crash

GitBox Mon, 29 Aug 2022 00:46:39 -0700


jorisvandenbossche commented on code in PR #13989:
URL: https://github.com/apache/arrow/pull/13989#discussion_r956994361



##########
python/pyarrow/array.pxi:
##########
@@ -2478,6 +2478,55 @@ cdef class DictionaryArray(Array):
 
         return self._indices
 
+    @staticmethod
+    def from_buffers(DataType type, int64_t length, buffers, dictionary,
+                     int64_t null_count=-1, int64_t offset=0, memory_pool=None):
+        """
+        Construct a DictionaryArray from buffers.
+
+        type : pyarrow.DataType

Review Comment:
   ```suggestion
           Parameters
           ----------
           type : pyarrow.DataType
   ```



##########
python/pyarrow/array.pxi:
##########
@@ -2478,6 +2478,55 @@ cdef class DictionaryArray(Array):
 
         return self._indices
 
+    @staticmethod
+    def from_buffers(DataType type, int64_t length, buffers, dictionary,
+                     int64_t null_count=-1, int64_t offset=0, memory_pool=None):
+        """
+        Construct a DictionaryArray from buffers.
+
+        type : pyarrow.DataType

Review Comment:
   And can you also add a "Returns" section?



##########
python/pyarrow/tests/test_array.py:
##########
@@ -725,6 +725,13 @@ def test_struct_array_from_chunked():
         pa.StructArray.from_arrays([chunked_arr], ["foo"])
 
 
+def test_dictionary_from_buffers():
+    a = pa.array(["one", "two", "three", "two", "one"]).dictionary_encode()
+    b = pa.DictionaryArray.from_buffers(
+        a.type, len(a), a.indices.buffers(), a.dictionary)
+    assert a == b
+

Review Comment:
   Maybe add a test with an offset as well?



##########
python/pyarrow/array.pxi:
##########
@@ -2478,6 +2478,55 @@ cdef class DictionaryArray(Array):
 
         return self._indices
 
+    @staticmethod
+    def from_buffers(DataType type, int64_t length, buffers, dictionary,
+                     int64_t null_count=-1, int64_t offset=0, memory_pool=None):
+        """
+        Construct a DictionaryArray from buffers.
+
+        type : pyarrow.DataType
+        length : int
+            The number of values in the array.
+        buffers : List[Buffer]
+            The buffers backing this array.
+        dictionary : pyarrow.Array, ndarray or pandas.Series
+            The array of values referenced by the indices.
+        null_count : int, default -1
+            The number of null entries in the array. Negative value means that
+            the null count is not known.
+        offset : int, default 0
+            The array's logical offset (in values, not in bytes) from the
+            start of each buffer.
+        memory_pool : MemoryPool, default None
+            For memory allocations, if required, otherwise uses default pool.
+        """
+        cdef:
+            Array _dictionary
+            vector[shared_ptr[CBuffer]] c_buffers
+            shared_ptr[CDataType] c_type
+            shared_ptr[CArrayData] c_data
+            shared_ptr[CArray] c_result
+
+        for buf in buffers:
+            c_buffers.push_back(pyarrow_unwrap_buffer(buf))
+
+        if isinstance(dictionary, Array):
+            _dictionary = dictionary
+        else:
+            _dictionary = array(dictionary, memory_pool=memory_pool)

Review Comment:
   Given that this is a "power-user" method anyway, I think it would be fine to require an Array for the dictionary here. For the buffers we also only accept actual Buffer objects, and requiring an Array would avoid the need for the `memory_pool` keyword as well as for tests of the conversion path, which is currently untested.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

