mapleFU commented on code in PR #46271: URL: https://github.com/apache/arrow/pull/46271#discussion_r2068993074
########## cpp/src/arrow/array/data.h: ########## @@ -64,32 +64,24 @@ constexpr int64_t kUnknownNullCount = -1; /// /// This data structure is a self-contained representation of the memory and /// metadata inside an Arrow array data structure (called vectors in Java). The -/// classes arrow::Array and its subclasses provide strongly-typed accessors +/// Array class and its concrete subclasses provide strongly-typed accessors /// with support for the visitor pattern and other affordances. /// /// This class is designed for easy internal data manipulation, analytical data -/// processing, and data transport to and from IPC messages. For example, we -/// could cast from int64 to float64 like so: +/// processing, and data transport to and from IPC messages. /// -/// Int64Array arr = GetMyData(); -/// auto new_data = arr.data()->Copy(); -/// new_data->type = arrow::float64(); -/// DoubleArray double_arr(new_data); +/// This class is also useful in an analytics setting where memory may be +/// efficiently reused. For example, computing the Abs of a numeric array +/// should return null iff the input is null: therefore, an Abs function can +/// reuse the validity bitmap (a Buffer) of its input as the validity bitmap +/// of its output. /// -/// This object is also useful in an analytics setting where memory may be -/// reused. For example, if we had a group of operations all returning doubles, -/// say: -/// -/// Log(Sqrt(Expr(arr))) -/// -/// Then the low-level implementations of each of these functions could have -/// the signatures -/// -/// void Log(const ArrayData& values, ArrayData* out); -/// -/// As another example a function may consume one or more memory buffers in an -/// input array and replace them with newly-allocated data, changing the output -/// data type as well. +/// This class is meant mostly for immutable data access. Any mutable access +/// (either to ArrayData members or to the contents of its Buffers) should take +/// into account the fact that ArrayData instances are typically wrapped in a Review Comment: Is "instances are ... in a" a good way? ########## cpp/src/arrow/array/data.h: ########## @@ -194,25 +186,37 @@ struct ARROW_EXPORT ArrayData { return *this; } + /// \brief Return a shallow copy of this ArrayData std::shared_ptr<ArrayData> Copy() const { return std::make_shared<ArrayData>(*this); } - /// \brief Copy all buffers and children recursively to destination MemoryManager + /// \brief Deep copy this ArrayData to destination memory manager /// - /// This utilizes MemoryManager::CopyBuffer to create a new ArrayData object - /// recursively copying the buffers and all child buffers to the destination - /// memory manager. This includes dictionaries if applicable. + /// Returns a new ArrayData object with buffers and all child buffers + /// copied to the destination memory manager. This includes dictionaries + /// if applicable. Result<std::shared_ptr<ArrayData>> CopyTo( const std::shared_ptr<MemoryManager>& to) const; - /// \brief View or Copy this ArrayData to destination memory manager. + + /// \brief View or copy this ArrayData to destination memory manager /// /// Tries to view the buffer contents on the given memory manager's device /// if possible (to avoid a copy) but falls back to copying if a no-copy view /// isn't supported. Result<std::shared_ptr<ArrayData>> ViewOrCopyTo( const std::shared_ptr<MemoryManager>& to) const; + /// \brief Return the null-ness of a given array element + /// + /// Calling `IsNull(i)` is the same as `!IsValid(i)`. bool IsNull(int64_t i) const { return !IsValid(i); } + /// \brief Return the validity of a given array element + /// + /// For most data types, this will simply query the validity bitmap. + /// For union and run-end-encoded arrays, the underlying child data is + /// queried instead. + /// For dictionary arrays, this reflects the validity of the dictionary + /// index, but the corresponding dictionary value might still be null. Review Comment: Should we says that for null array it just checks `null_count`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org