Zhuo Peng created ARROW-6775:
--------------------------------

    Summary: Proposal for several Array utility functions
        Key: ARROW-6775
        URL: https://issues.apache.org/jira/browse/ARROW-6775
    Project: Apache Arrow
 Issue Type: Wish
   Reporter: Zhuo Peng

Hi,

We developed several utilities that compute or access certain properties of Arrays, and we wonder whether it makes sense to upstream them (into both the C++ API and pyarrow) and, if so, where the best place to put them would be. We may have overlooked existing APIs that already do the same; if so, please point them out.

1/ ListLengthFromListArray(ListArray&)
Returns the lengths of the lists in a ListArray as an Int32Array (or an Int64Array for large lists). For example:
[[1, 2, 3], [], None] => [3, 0, 0] (or [3, 0, None], but we hope the returned array can be converted to numpy)

2/ GetBinaryArrayTotalByteSize(BinaryArray&)
Returns the total byte size of the values in a BinaryArray (basically offsets[len] - offsets[0], since the offsets buffer has len + 1 entries). Alternatively, a BinaryArray::Flatten() -> UInt8Array would work.

3/ GetArrayNullBitmapAsByteArray(Array&)
Returns the array's null bitmap as a UInt8Array with one byte per element (which can be efficiently converted to a bool numpy array).

4/ GetFlattenedArrayParentIndices(ListArray&)
Makes an Int32Array of the same length as the flattened ListArray. returned_array[i] == j means that the i-th element of the flattened ListArray came from the j-th list in the ListArray. For example:
[[1, 2, 3], [], None, [4, 5]] => [0, 0, 0, 3, 3]
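
To make the intended semantics of the four utilities concrete, here is a rough sketch in plain Python. It does not use pyarrow or the C++ API; it only models the Arrow layout with Python lists (an offsets list with len + 1 entries and a bit-packed validity bitmap, least-significant bit first, 1 = valid). All function names below are hypothetical, and the sketch assumes that a null list spans zero elements in the flattened values.

```python
from typing import List, Optional, Sequence


def list_lengths(offsets: Sequence[int], validity: Sequence[bool],
                 null_as_zero: bool = True) -> List[Optional[int]]:
    """Length of each list; a null list yields 0 (or None if null_as_zero is False)."""
    lengths: List[Optional[int]] = []
    for i, valid in enumerate(validity):
        if valid:
            lengths.append(offsets[i + 1] - offsets[i])
        else:
            lengths.append(0 if null_as_zero else None)
    return lengths


def total_byte_size(offsets: Sequence[int]) -> int:
    """Total value bytes of a binary array: last offset minus first offset."""
    return offsets[-1] - offsets[0]


def validity_as_bytes(bitmap: bytes, length: int, bit_offset: int = 0) -> List[int]:
    """Unpack a bit-packed validity bitmap into one 0/1 value per element."""
    out = []
    for i in range(length):
        pos = bit_offset + i
        out.append((bitmap[pos // 8] >> (pos % 8)) & 1)
    return out


def parent_indices(offsets: Sequence[int]) -> List[int]:
    """For each flattened element, the index of the list it came from."""
    out: List[int] = []
    for j in range(len(offsets) - 1):
        out.extend([j] * (offsets[j + 1] - offsets[j]))
    return out


# [[1, 2, 3], [], None, [4, 5]] modelled as offsets plus validity.
example_offsets = [0, 3, 3, 3, 5]
example_validity = [True, True, False, True]
example_bitmap = bytes([0b00001011])  # bits 0, 1 and 3 set: element 2 is null

print(list_lengths(example_offsets, example_validity))
print(parent_indices(example_offsets))
print(validity_as_bytes(example_bitmap, 4))
print(total_byte_size([0, 2, 2, 5]))  # offsets for [b"ab", b"", b"cde"]
```

Using offsets[-1] - offsets[0] rather than just the last offset keeps the byte-size computation correct for a sliced array, whose first offset need not be 0.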