felipecrv commented on code in PR #38027:
URL: https://github.com/apache/arrow/pull/38027#discussion_r1394318942
##########
cpp/src/arrow/array/data.h:
##########
@@ -434,6 +434,21 @@ struct ARROW_EXPORT ArraySpan {
return GetValues<T>(i, this->offset);
}
+ // Access a buffer's data as a span
+ template <typename T>
+ util::span<const T> GetSpan(int i) const {
+ return util::span(buffers[i].data_as<T>(),
+ static_cast<size_t>(buffers[i].size) / sizeof(T))
+ .subspan(offset);
+ }
+
+ template <typename T>
+ util::span<T> GetSpan(int i) {
+ return util::span(buffers[i].mutable_data_as<T>(),
+ static_cast<size_t>(buffers[i].size) / sizeof(T))
+ .subspan(offset);
Review Comment:
I think the value-add comes from a reduced risk of code that doesn't account
for the fact that the returned `span` has (sometimes, very rarely) length
greater than the array length and even-worse: the fact that the data after the
end of the length could be non-sensical (e.g. zeroes at the end of an offsets
buffer).
We risk having compute kernel code that processes these spans without issue
until one that's greater than the array length comes up. Taking the length as
parameter, drastically removes this branch in the program graph: the span
always has the array length instead of possibly also having a greater length
(in rare cases!).
Regarding this:
```diff
-data.GetSpan<int32_t>(1).subspan(0, 50);
+data.GetSpan<int32_t>(1, /*length=*/50);
```
For what I'm proposing these are not equivalent. I want to assert when the
passed length is greater than the available buffer size instead of truncating
like `subspan` does. I would also consider debug-build asserts that take into
account the array type and buffer index -- something that `span::subspan()`
can't do.
```cpp
constexpr span subspan(size_t offset) const {
if (offset > size_) return {data_, data_};
return {data_ + offset, size_ - offset};
}
```
Regarding this suggestion:
```diff
-data.GetSpan<int32_t>(1).subspan(0, 50);
+data.GetSpan<int32_t>(1, /*one_extra=*/true); // offsets buffer
```
I think this would be a very ergonomic API that could in turn be defined in
terms of the `GetSpan` implementation that takes a `length`.
> it'll never be correct to use these on a null bitmap
Definitely. We should assert (or at least document) that `i > 0`.
A final observation: we already have functions that return the base pointer
of the buffer (shifted by offset), if we are going to provide span-returning
functions, we better make the length of the span maximally useful since that's
exactly what spans add compared to raw pointers -- a length.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]