lidavidm commented on code in PR #13364: URL: https://github.com/apache/arrow/pull/13364#discussion_r894754216
########## cpp/src/arrow/array/data.h: ########## @@ -242,6 +245,127 @@ struct ARROW_EXPORT ArrayData { std::shared_ptr<ArrayData> dictionary; }; +/// \brief A non-owning Buffer reference +struct ARROW_EXPORT BufferRef { + // It is the user of this class's responsibility to ensure that + // buffers that were const originally are not written to + // accidentally. + uint8_t* data = NULLPTR; + int64_t size = 0; + // Pointer back to buffer that owns this memory + const std::shared_ptr<Buffer>* owner = NULLPTR; +}; + +/// \brief EXPERIMENTAL: A non-owning ArrayData reference that is cheaply +/// copyable and does not contain any shared_ptr objects. Do not use in public +/// APIs aside from compute kernels for now +struct ARROW_EXPORT ArraySpan { + const DataType* type; + int64_t length = 0; + mutable int64_t null_count = kUnknownNullCount; + int64_t offset = 0; + BufferRef buffers[3]; + + ArraySpan() = default; + + explicit ArraySpan(const DataType* type, int64_t length) : type(type), length(length) {} + explicit ArraySpan(const ArrayData& data) { SetMembers(data); } + explicit ArraySpan(const Scalar& data) { FillFromScalar(data); } + + /// If dictionary-encoded, put dictionary in the first entry + // TODO(wesm): would a std::unique_ptr<vector<...>> be better? Review Comment: the motivation being to avoid copying the vector? also, is the SmallVector we have potentially useful here? ########## cpp/src/arrow/scalar.h: ########## @@ -129,6 +135,9 @@ namespace internal { struct ARROW_EXPORT PrimitiveScalarBase : public Scalar { using Scalar::Scalar; + /// \brief Get an immutable pointer to the value of this scalar. May be null. + virtual const void* data() const = 0; Review Comment: This seems redundant with `view()` below or could be defined in terms of `view()` ########## cpp/src/arrow/util/bitmap.h: ########## @@ -53,27 +53,36 @@ class ARROW_EXPORT Bitmap : public util::ToStringOstreamable<Bitmap>, Bitmap() = default; - Bitmap(std::shared_ptr<Buffer> buffer, int64_t offset, int64_t length) - : buffer_(std::move(buffer)), offset_(offset), length_(length) {} + Bitmap(const std::shared_ptr<Buffer>& buffer, int64_t offset, int64_t length) Review Comment: another refactor to consider: since this is now non-owning maybe this should be `BitmapView` or `BitmapSpan` and/or maybe should be built on top of `BufferRef` also, do we want to perhaps adopt a consistent naming convention for view/span/reference types? ########## cpp/src/arrow/array/data.cc: ########## @@ -128,6 +131,143 @@ int64_t ArrayData::GetNullCount() const { return precomputed; } +// ---------------------------------------------------------------------- +// Methods for ArraySpan + +void ArraySpan::SetMembers(const ArrayData& data) { + this->type = data.type.get(); + this->length = data.length; + this->null_count = data.null_count.load(); + this->offset = data.offset; + + for (size_t i = 0; i < data.buffers.size(); ++i) { + const std::shared_ptr<Buffer>& buffer = data.buffers[i]; + // It is the invoker-of-kernels's responsibility to ensure that + // const buffers are not written to accidentally. + if (buffer) { + SetBuffer(i, buffer); + } else { + ClearBuffer(i); + } + } + + // Makes sure any other buffers are seen as null / non-existent + for (size_t i = data.buffers.size(); i < 3; ++i) { + ClearBuffer(i); + } + + // TODO(wesm): what about extension arrays? Review Comment: Seems having methods that always delegated to the storage type would let these methods be generic over extension types ########## cpp/src/arrow/array/data.h: ########## @@ -242,6 +245,127 @@ struct ARROW_EXPORT ArrayData { std::shared_ptr<ArrayData> dictionary; }; +/// \brief A non-owning Buffer reference +struct ARROW_EXPORT BufferRef { + // It is the user of this class's responsibility to ensure that + // buffers that were const originally are not written to + // accidentally. + uint8_t* data = NULLPTR; + int64_t size = 0; + // Pointer back to buffer that owns this memory + const std::shared_ptr<Buffer>* owner = NULLPTR; +}; + +/// \brief EXPERIMENTAL: A non-owning ArrayData reference that is cheaply +/// copyable and does not contain any shared_ptr objects. Do not use in public +/// APIs aside from compute kernels for now +struct ARROW_EXPORT ArraySpan { + const DataType* type; + int64_t length = 0; + mutable int64_t null_count = kUnknownNullCount; + int64_t offset = 0; + BufferRef buffers[3]; + + ArraySpan() = default; + + explicit ArraySpan(const DataType* type, int64_t length) : type(type), length(length) {} + explicit ArraySpan(const ArrayData& data) { SetMembers(data); } + explicit ArraySpan(const Scalar& data) { FillFromScalar(data); } Review Comment: Given the number of small changes to call this constructor all over, I wonder if it's worth letting these be implicit constructors like we do for a few other 'core' types. Then again hiding the conversion might be too error-prone. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org