westonpace commented on PR #35541:
URL: https://github.com/apache/arrow/pull/35541#issuecomment-1544027272
Structs do not have any buffers themselves beyond the validity buffer. The
validity buffer will have a bit width of 1. I don't think we need to worry
about aligning any buffers with a width of 8 or less (e.g. only worry about it
once the width is 16)
One problem with this PR is that it doesn't address offsets buffers. For
example, in a string / list / binary array there is a 32-bit offsets buffer
which should be aligned to 32. These arrays won't show up as fixed width
arrays.
I was working on this a bit yesterday as well. I ended up with something
like this...
```
bool CheckMallocAlignment(const ArrayData& array) {
auto type_id = array.type->storage_id();
if (type_id == Type::DICTIONARY) {
// The values array will be checked separately
type_id =
::arrow::internal::checked_pointer_cast<DictionaryType>(array.type)
->index_type()
->id();
}
switch (array.type->id()) {
case Type::NA:
case Type::FIXED_SIZE_LIST:
case Type::FIXED_SIZE_BINARY:
case Type::BOOL:
case Type::INT8:
case Type::UINT8:
case Type::DECIMAL128:
case Type::DECIMAL256:
case Type::SPARSE_UNION:
case Type::RUN_END_ENCODED:
case Type::STRUCT:
// These have no buffers or all buffers need only byte alignment
return true;
case Type::INT16:
case Type::UINT16:
case Type::HALF_FLOAT:
return CheckValuesAlignment(array, 16);
case Type::INT32:
case Type::UINT32:
case Type::FLOAT:
case Type::STRING:
case Type::BINARY:
case Type::DATE32:
case Type::TIME32:
case Type::LIST:
case Type::MAP:
case Type::DENSE_UNION:
case Type::INTERVAL_MONTHS:
return CheckValuesAlignment(array, 32);
case Type::INT64:
case Type::UINT64:
case Type::DOUBLE:
case Type::LARGE_BINARY:
case Type::LARGE_LIST:
case Type::LARGE_STRING:
case Type::DATE64:
case Type::TIME64:
case Type::TIMESTAMP:
case Type::DURATION:
case Type::INTERVAL_DAY_TIME:
return CheckValuesAlignment(array, 64);
case Type::INTERVAL_MONTH_DAY_NANO:
return CheckValuesAlignment(array, 128);
default:
return true;
}
}
```
It's not pretty. I also tried using a visitor pattern but it was ending up
to be more complicated (though arguably a bit more future proof were we to add
more fixed-width types). `CheckValuesAlignment` checks the buffer at index 1.
Fortunately, it seems that in all cases, the buffer that needs alignment seems
to be at that position.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]