andishgar commented on code in PR #46229:
URL: https://github.com/apache/arrow/pull/46229#discussion_r2177301787
##########
cpp/src/arrow/array/array_binary.cc:
##########
@@ -104,7 +105,14 @@ BinaryViewArray::BinaryViewArray(std::shared_ptr<DataType>
type, int64_t length,
SetData(
ArrayData::Make(std::move(type), length, std::move(buffers), null_count,
offset));
}
-
+Result<std::shared_ptr<Array>> BinaryViewArray::CompactArray(MemoryPool* pool)
const {
+ std::unique_ptr<ArrayBuilder> builder;
+ ARROW_RETURN_NOT_OK(MakeBuilder(pool, type(), &builder));
+ ArraySpan span(*data());
+ // ArraySpan handles offset
+ ARROW_RETURN_NOT_OK(builder->AppendArraySlice(span, 0, span.length));
Review Comment:
> We could certainly expose a parameter to choose the ratio of unused buffer
space above which to compact a buffer.
>
> > Additionally, there's another concern: it might be useful to expose a
public API method to indicate whether calling `CompactArray` would be
beneficial.
>
> Well, the decision is buffer-grained, not array-grained: some buffers
might be worth compacting and some others not.
>
> > What do you think? Should I integrate this heuristic into
`CompactArray`, or would it be better to split it into a separate method or
even into a different pull request?
>
> As said above, I think it would make sense to integrate the heuristic and
apply it at the buffer level.
Pardon me, just to be sure — does your overall suggestion correspond to
pseudocode like this?
```c++
Result<std::shared_ptr<Array>> BinaryViewArray::CompatArray(
MemoryPool* pool, double occupancy_threshold) const {
..
..
..
for (int32_t i = 0; i < number_of_data_buffer; ++i) {
if (BufferOccupancy(i) <= occupancy_threshold) {
// Do something
}
}
return result;
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]