felipecrv commented on code in PR #41757:
URL: https://github.com/apache/arrow/pull/41757#discussion_r1624702941
##########
cpp/src/arrow/array/array_nested.cc:
##########
@@ -807,7 +807,7 @@ MapArray::MapArray(const std::shared_ptr<DataType>& type,
int64_t length,
Result<std::shared_ptr<Array>> MapArray::FromArraysInternal(
std::shared_ptr<DataType> type, const std::shared_ptr<Array>& offsets,
const std::shared_ptr<Array>& keys, const std::shared_ptr<Array>& items,
- MemoryPool* pool) {
+ MemoryPool* pool, const std::shared_ptr<Buffer>& null_bitmap) {
Review Comment:
It's uncommon to have the null_bitmap after the `pool`. I would suggest
changing this function signature to:
```cpp
Result<std::shared_ptr<Array>> MapArray::FromArraysInternal(
std::shared_ptr<DataType> type, std::shared_ptr<Buffer> null_bitmap,
const std::shared_ptr<Array>& offsets,
const std::shared_ptr<Array>& keys, const std::shared_ptr<Array>& items,
MemoryPool* pool) {
```
Then in the header you can declare this one and define the old signature as
to keep the old API working.
```
static Result<std::shared_ptr<Array>> FromArraysInternal(
std::shared_ptr<DataType> type, const std::shared_ptr<Array>& offsets,
const std::shared_ptr<Array>& keys, const std::shared_ptr<Array>&
items,
MemoryPool* pool) {
return FromArraysInternal(std::move(type), NULLPTR, offsets, keys,
items, pool);
}
```
##########
cpp/src/arrow/array/array_nested.cc:
##########
@@ -827,6 +827,15 @@ Result<std::shared_ptr<Array>>
MapArray::FromArraysInternal(
return Status::Invalid("Map key and item arrays must be equal length");
}
+ if (null_bitmap != nullptr && offsets->null_count() > 0) {
Review Comment:
`offsets` could have `null_count() == -1` (`kUnknownNullCount`) meaning that
offsets might contain nulls that are not accounted for in this test.
The good news is all you have to do is replace all the `null_count()` tests
here with `MayHaveNulls()` which would also fix this issue:
https://github.com/apache/arrow/issues/38553
##########
cpp/src/arrow/array/array_nested.cc:
##########
@@ -836,24 +845,32 @@ Result<std::shared_ptr<Array>>
MapArray::FromArraysInternal(
using OffsetArrayType = typename TypeTraits<OffsetArrowType>::ArrayType;
const auto& typed_offsets = checked_cast<const OffsetArrayType&>(*offsets);
- auto buffers = BufferVector({nullptr, typed_offsets.values()});
+
+ BufferVector buffers;
+ int64_t null_count;
+ if (null_bitmap != nullptr) {
+ buffers = BufferVector({std::move(null_bitmap), typed_offsets.values()});
+ null_count = null_bitmap->size();
Review Comment:
`null_bitmap` was moved-from, so it's invalid to call a method on it after
that. It all works fine because the parameter is `const shared_ptr&`, but it
should be just `shared_ptr<Buffer>` so you can `std::move` it meaningfully
here. This is present in the signature suggestion I've made above.
Another thing: is this logic correct? Is `null_count` the `::size()` of the
buffer?
##########
cpp/src/arrow/array/array_nested.cc:
##########
@@ -836,24 +845,32 @@ Result<std::shared_ptr<Array>>
MapArray::FromArraysInternal(
using OffsetArrayType = typename TypeTraits<OffsetArrowType>::ArrayType;
const auto& typed_offsets = checked_cast<const OffsetArrayType&>(*offsets);
- auto buffers = BufferVector({nullptr, typed_offsets.values()});
+
+ BufferVector buffers;
+ int64_t null_count;
+ if (null_bitmap != nullptr) {
+ buffers = BufferVector({std::move(null_bitmap), typed_offsets.values()});
+ null_count = null_bitmap->size();
+ } else {
+ buffers = BufferVector({null_bitmap, typed_offsets.values()});
Review Comment:
You can have `nullptr` in the place of `null_bitmap` here to make it more
clear
Overall suggestion:
```cpp
BufferVector buffers;
buffers.resize(2);
int64_t null_count = 0;
if (null_bitmap) {
null_count = ...calculate the pop-count of the buffer...
buffers[0] = std::move(null_bitmap);
}
buffers[1] = typed_offsets.values();
```
`resize(2)` initializes buffers with two `nullptr` buffer pointers, so you
only have to set when you have a `null_bitmap` and `buffers[1]` is always set.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]