jorisvandenbossche commented on code in PR #13894:
URL: https://github.com/apache/arrow/pull/13894#discussion_r946682571
##########
cpp/src/arrow/array/array_nested.cc:
##########
@@ -116,14 +117,20 @@ Result<std::shared_ptr<typename
TypeTraits<TYPE>::ArrayType>> ListArrayFromArray
return Status::TypeError("List offsets must be ",
OffsetArrowType::type_name());
}
+ if (null_bitmap != NULLPTR && offsets.null_count() > 0) {
Review Comment:
```suggestion
if (null_bitmap != nullptr && offsets.null_count() > 0) {
```
(to be consistent in style)
##########
cpp/src/arrow/array/array_nested.h:
##########
@@ -127,11 +127,15 @@ class ARROW_EXPORT ListArray : public
BaseListArray<ListType> {
/// allocated because of null values
static Result<std::shared_ptr<ListArray>> FromArrays(
const Array& offsets, const Array& values,
- MemoryPool* pool = default_memory_pool());
+ MemoryPool* pool = default_memory_pool(),
+ int64_t null_count = -1,
+ std::shared_ptr<Buffer> null_bitmap = NULLPTR);
Review Comment:
Small nit: it seems that existing functions here that support passing the
bitmap typically have null_bitmap first and null_count second (also, they seem
to use `kUnknownNullCount` instead of -1 explicitly, I don't know if that
actually makes a difference for the compiler)
##########
cpp/src/arrow/array/array_nested.cc:
##########
@@ -116,14 +117,20 @@ Result<std::shared_ptr<typename
TypeTraits<TYPE>::ArrayType>> ListArrayFromArray
return Status::TypeError("List offsets must be ",
OffsetArrowType::type_name());
}
+ if (null_bitmap != NULLPTR && offsets.null_count() > 0) {
+ return Status::Invalid(
+ "Ambiguous to specify both validity map and offsets with nulls");
+ }
+
std::shared_ptr<Buffer> offset_buf, validity_buf;
RETURN_NOT_OK(CleanListOffsets<TYPE>(offsets, pool, &offset_buf,
&validity_buf));
- BufferVector buffers = {validity_buf, offset_buf};
+ int64_t null_count_ = null_bitmap ? null_count : offsets.null_count();
- auto internal_data = ArrayData::Make(type, offsets.length() - 1,
std::move(buffers),
- offsets.null_count(), offsets.offset());
+ std::shared_ptr<arrow::ArrayData> internal_data = ArrayData::Make(
+ type, offsets.length() - 1,
+ BufferVector{null_bitmap ? std::move(null_bitmap) : validity_buf,
offset_buf},
Review Comment:
It might be more readable to move this if/else to where `buffers` gets
defined (which actually no longer is used right now)
##########
python/pyarrow/array.pxi:
##########
@@ -2812,6 +2805,22 @@ cdef dict _array_classes = {
}
+cdef inline shared_ptr[CBuffer] c_mask_from_obj(object mask, MemoryPool pool)
except *:
Review Comment:
Maybe use "inverted" in the name of the function, so that when reading the
code in `from_arrays`, you still have some notion about the fact that the mask
gets inverted
##########
python/pyarrow/array.pxi:
##########
@@ -1890,7 +1890,7 @@ cdef class ListArray(BaseListArray):
"""
@staticmethod
- def from_arrays(offsets, values, DataType type=None, MemoryPool pool=None):
+ def from_arrays(offsets, values, DataType type=None, MemoryPool pool=None,
mask=None):
Review Comment:
Can you also update the docstring for this?
##########
python/pyarrow/tests/test_array.py:
##########
@@ -3192,3 +3192,41 @@ def test_to_pandas_timezone():
arr = pa.chunked_array([arr])
s = arr.to_pandas()
assert s.dt.tz is not None
+
+
[email protected]("arr", (
+ [None, [0]],
+ [None, [0, None], [0]],
+ [[0], [1]],
+)
+)
+def test_list_array_from_arrays(arr):
Review Comment:
There is already an existing `test_list_from_arrays` test, can you maybe
move this one to just after the existing one? (to keep related tests close)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]