jorisvandenbossche commented on code in PR #13989:
URL: https://github.com/apache/arrow/pull/13989#discussion_r956994361
##########
python/pyarrow/array.pxi:
##########
@@ -2478,6 +2478,55 @@ cdef class DictionaryArray(Array):
return self._indices
+ @staticmethod
+ def from_buffers(DataType type, int64_t length, buffers, dictionary,
+ int64_t null_count=-1, int64_t offset=0, memory_pool=None):
+ """
+ Construct a DictionaryArray from buffers.
+
+ type : pyarrow.DataType
Review Comment:
```suggestion
Parameters
----------
type : pyarrow.DataType
```
##########
python/pyarrow/array.pxi:
##########
@@ -2478,6 +2478,55 @@ cdef class DictionaryArray(Array):
return self._indices
+ @staticmethod
+ def from_buffers(DataType type, int64_t length, buffers, dictionary,
+ int64_t null_count=-1, int64_t offset=0, memory_pool=None):
+ """
+ Construct a DictionaryArray from buffers.
+
+ type : pyarrow.DataType
Review Comment:
And can you also add a "Returns" section?
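For reference, a numpydoc-style "Returns" section could look roughly like the sketch below. The stub function name and the `dict_array` return label are placeholders for illustration only; the exact wording is up to you.

```python
def from_buffers_stub(type, length, buffers, dictionary,
                      null_count=-1, offset=0):
    """
    Construct a DictionaryArray from buffers.

    Parameters
    ----------
    type : pyarrow.DataType
    length : int
        The number of values in the array.

    Returns
    -------
    dict_array : DictionaryArray
    """
    # Hypothetical stub: it only illustrates the docstring layout.
    raise NotImplementedError("docstring layout sketch only")
```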
##########
python/pyarrow/tests/test_array.py:
##########
@@ -725,6 +725,13 @@ def test_struct_array_from_chunked():
pa.StructArray.from_arrays([chunked_arr], ["foo"])
+def test_dictionary_from_buffers():
+ a = pa.array(["one", "two", "three", "two", "one"]).dictionary_encode()
+ b = pa.DictionaryArray.from_buffers(
+ a.type, len(a), a.indices.buffers(), a.dictionary)
+ assert a == b
+
Review Comment:
Maybe add a test with an offset as well?
##########
python/pyarrow/array.pxi:
##########
@@ -2478,6 +2478,55 @@ cdef class DictionaryArray(Array):
return self._indices
+ @staticmethod
+ def from_buffers(DataType type, int64_t length, buffers, dictionary,
+ int64_t null_count=-1, int64_t offset=0, memory_pool=None):
+ """
+ Construct a DictionaryArray from buffers.
+
+ type : pyarrow.DataType
+ length : int
+ The number of values in the array.
+ buffers : List[Buffer]
+ The buffers backing this array.
+ dictionary : pyarrow.Array, ndarray or pandas.Series
+ The array of values referenced by the indices.
+ null_count : int, default -1
+ The number of null entries in the array. Negative value means that
+ the null count is not known.
+ offset : int, default 0
+ The array's logical offset (in values, not in bytes) from the
+ start of each buffer.
+ memory_pool : MemoryPool, default None
+ For memory allocations, if required, otherwise uses default pool.
+ """
+ cdef:
+ Array _dictionary
+ vector[shared_ptr[CBuffer]] c_buffers
+ shared_ptr[CDataType] c_type
+ shared_ptr[CArrayData] c_data
+ shared_ptr[CArray] c_result
+
+ for buf in buffers:
+ c_buffers.push_back(pyarrow_unwrap_buffer(buf))
+
+ if isinstance(dictionary, Array):
+ _dictionary = dictionary
+ else:
+ _dictionary = array(dictionary, memory_pool=memory_pool)
Review Comment:
Given that this is a "power-user" method anyway, I think it is fine to
require an Array for the dictionary here (for the buffers we also only accept
actual Buffer objects). That would avoid needing the `memory_pool` keyword,
and avoid having to test the conversion path, which is not tested at the moment.