felipecrv commented on code in PR #35345:
URL: https://github.com/apache/arrow/pull/35345#discussion_r1388471452
##########
cpp/src/arrow/array/concatenate.cc:
##########
@@ -160,16 +166,69 @@ Status PutOffsets(const std::shared_ptr<Buffer>& src,
Offset first_offset, Offse
// Write offsets into dst, ensuring that the first offset written is
// first_offset
- auto adjustment = first_offset - src_begin[0];
+ auto displacement = first_offset - src_begin[0];
// NOTE: Concatenate can be called during IPC reads to append delta dictionaries.
// Avoid UB on non-validated input by doing the addition in the unsigned domain.
// (the result can later be validated using Array::ValidateFull)
- std::transform(src_begin, src_end, dst, [adjustment](Offset offset) {
- return SafeSignedAdd(offset, adjustment);
+ std::transform(src_begin, src_end, dst, [displacement](Offset offset) {
+ return SafeSignedAdd(offset, displacement);
});
return Status::OK();
}
+template <typename offset_type>
+void PutListViewOffsets(const Buffer& src, offset_type displacement, offset_type* dst);
+
+// Concatenate buffers holding list-view offsets into a single buffer of offsets
+//
+// value_ranges contains the relevant ranges of values in the child array actually
+// referenced to by the views. Most commonly, these ranges will start from 0,
+// but when that is not the case, we need to adjust the displacement of offsets.
+// The concatenated child array does not contain values from the beginning
+// if they are not referenced to by any view.
+template <typename offset_type>
+Status ConcatenateListViewOffsets(const BufferVector& buffers,
+ const std::vector<Range>& value_ranges,
+ MemoryPool* pool, std::shared_ptr<Buffer>* out) {
+ const int64_t out_size_in_bytes = SumBufferSizesInBytes(buffers);
+ ARROW_ASSIGN_OR_RAISE(*out, AllocateBuffer(out_size_in_bytes, pool));
+ auto* out_data = (*out)->mutable_data_as<offset_type>();
+
+ int64_t num_child_values = 0;
+ int64_t elements_length = 0;
+ for (size_t i = 0; i < buffers.size(); ++i) {
+ const auto displacement =
+ static_cast<offset_type>(num_child_values - value_ranges[i].offset);
+ PutListViewOffsets(/*src=*/*buffers[i], static_cast<offset_type>(displacement),
+ /*dst=*/out_data + elements_length);
+ elements_length += buffers[i]->size() / sizeof(offset_type);
Review Comment:
This may just be personal preference, but I prefer code that doesn't mutate
base buffer pointers and instead always re-derives the target pointer from the
base pointer and an integer offset.
If you can show me how you would use the returned pointer to simplify this
loop, I'm happy to change it. I can't see how myself.
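To make the two styles concrete, here is a minimal sketch, not the PR's code. It uses `std::vector` in place of Arrow buffers, and every name in it (`PutOffsetsSketch`, `PutOffsetsReturningEnd`, `ConcatIndexed`, `ConcatCursor`) is hypothetical. `ConcatIndexed` keeps the base pointer fixed and re-derives the target from an integer offset, as in the loop above; `ConcatCursor` is one guess at what a returned-pointer variant could look like.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for PutListViewOffsets: writes src[i] + displacement
// into dst[i]. (The real code guards against overflow; this sketch does not.)
template <typename offset_type>
void PutOffsetsSketch(const std::vector<offset_type>& src, offset_type displacement,
                      offset_type* dst) {
  for (std::size_t i = 0; i < src.size(); ++i) {
    dst[i] = static_cast<offset_type>(src[i] + displacement);
  }
}

// Hypothetical variant that returns a pointer one past the last element written.
template <typename offset_type>
offset_type* PutOffsetsReturningEnd(const std::vector<offset_type>& src,
                                    offset_type displacement, offset_type* dst) {
  PutOffsetsSketch(src, displacement, dst);
  return dst + src.size();
}

// Style A: the base pointer is const; the target is base + integer offset.
template <typename offset_type>
std::vector<offset_type> ConcatIndexed(
    const std::vector<std::vector<offset_type>>& chunks,
    const std::vector<offset_type>& displacements) {
  assert(chunks.size() == displacements.size());
  std::size_t total = 0;
  for (const auto& chunk : chunks) total += chunk.size();
  std::vector<offset_type> out(total);
  offset_type* const out_data = out.data();
  std::size_t elements_length = 0;
  for (std::size_t i = 0; i < chunks.size(); ++i) {
    PutOffsetsSketch(chunks[i], displacements[i], out_data + elements_length);
    elements_length += chunks[i].size();
  }
  return out;
}

// Style B: a cursor pointer is advanced using the helper's return value.
template <typename offset_type>
std::vector<offset_type> ConcatCursor(
    const std::vector<std::vector<offset_type>>& chunks,
    const std::vector<offset_type>& displacements) {
  assert(chunks.size() == displacements.size());
  std::size_t total = 0;
  for (const auto& chunk : chunks) total += chunk.size();
  std::vector<offset_type> out(total);
  offset_type* cursor = out.data();
  for (std::size_t i = 0; i < chunks.size(); ++i) {
    cursor = PutOffsetsReturningEnd(chunks[i], displacements[i], cursor);
  }
  return out;
}
```

Both functions produce the same output. In this sketch the cursor version saves one line in the loop body, at the cost of a mutable pointer whose position can only be recovered by subtracting the base.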