westonpace commented on code in PR #35565:
URL: https://github.com/apache/arrow/pull/35565#discussion_r1199260684
##########
cpp/src/arrow/util/align_util.cc:
##########
@@ -30,12 +32,120 @@ bool CheckAlignment(const Buffer& buffer, int64_t alignment) {
return buffer.address() % alignment == 0;
}
-bool CheckAlignment(const ArrayData& array, int64_t alignment) {
- for (const auto& buffer : array.buffers) {
- if (buffer) {
- if (!CheckAlignment(*buffer, alignment)) return false;
+namespace {
+
+// Some buffers are frequently type-punned. For example, in an int32 array the
+// values buffer is frequently cast to int32_t*
+//
+// This sort of punning is only valid if the pointer is aligned to a proper width
+// (e.g. 4 bytes in the case of int32).
+//
+// We generally assume that all buffers are at least 8-bit aligned and so we only
+// need to worry about buffers that are commonly cast to wider data types. Note that
+// this alignment is something that is guaranteed by malloc (e.g. new int32_t[] will
+// return a buffer that is 4 byte aligned) or common libraries (e.g. numpy) but it is
+// not currently guaranteed by flight (GH-32276).
+//
+// By happy coincidence, for every data type, the only buffer that might need wider
+// alignment is the second buffer (at index 1). This function returns the expected
+// alignment (in bits) of the second buffer for the given array to safely allow this cast.
+//
+// If the array's type doesn't have a second buffer or the second buffer is not expected
+// to be type punned, then we return 8.
+int GetMallocValuesAlignment(const ArrayData& array) {
+ // Make sure to use the storage type id
+ auto type_id = array.type->storage_id();
+ if (type_id == Type::DICTIONARY) {
+ // The values buffer is in a different ArrayData and so we only check the indices
+ // buffer here. The values array data will be checked by the calling method.
+ type_id = ::arrow::internal::checked_pointer_cast<DictionaryType>(array.type)
+ ->index_type()
+ ->id();
+ }
+ switch (type_id) {
+ case Type::NA: // No buffers
+ case Type::FIXED_SIZE_LIST: // No second buffer (values in child array)
+ case Type::FIXED_SIZE_BINARY: // Fixed size binary could be dangerous but the
+ // compute kernels don't type pun this. E.g. if
+ // an extension type is storing some kind of struct
+ // here then the user should do their own alignment
+ // check before casting to an array of structs
+ case Type::BOOL: // Always treated as uint8_t*
+ case Type::INT8: // Always treated as uint8_t*
+ case Type::UINT8: // Always treated as uint8_t*
+ case Type::DECIMAL128: // Always treated as uint8_t*
+ case Type::DECIMAL256: // Always treated as uint8_t*
Review Comment:
I've updated the expected alignment for decimal128/decimal256 arrays to 8.
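
For illustration only, here is a minimal sketch of the kind of pointer check this implies. It is not the code in this PR. `IsAligned` is a hypothetical helper (not an Arrow API), and the sketch assumes the 8 is a byte count and that decimal values may be read as 64-bit words.

```cpp
#include <cstdint>

// Hypothetical helper, not part of Arrow: returns true when `ptr` sits on a
// multiple of `alignment` bytes. It mirrors the modulo test that
// CheckAlignment applies to buffer.address() in the hunk above.
inline bool IsAligned(const void* ptr, std::int64_t alignment) {
  return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}

// Assumption: a decimal128 value is 16 bytes and may be viewed as two 64-bit
// words, so its values buffer needs alignof(std::uint64_t) alignment before
// such a view is safe.
inline bool SafeToViewAsUint64(const std::uint8_t* data) {
  return IsAligned(data, alignof(std::uint64_t));
}
```

A buffer that starts on an 8-byte boundary passes this check, and the same buffer offset by one byte does not. That is the case a sliced or Flight-delivered buffer can hit.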