westonpace commented on PR #35541: URL: https://github.com/apache/arrow/pull/35541#issuecomment-1544027272
Structs do not have any buffers themselves beyond the validity buffer. The validity buffer will have a bit width of 1. I don't think we need to worry about aligning any buffers with a width of 8 or less (e.g. only worry about it once the width is 16) One problem with this PR is that it doesn't address offsets buffers. For example, in a string / list / binary array there is a 32-bit offsets buffer which should be aligned to 32. These arrays won't show up as fixed width arrays. I was working on this a bit yesterday as well. I ended up with something like this... ``` bool CheckMallocAlignment(const ArrayData& array) { auto type_id = array.type->storage_id(); if (type_id == Type::DICTIONARY) { // The values array will be checked separately type_id = ::arrow::internal::checked_pointer_cast<DictionaryType>(array.type) ->index_type() ->id(); } switch (array.type->id()) { case Type::NA: case Type::FIXED_SIZE_LIST: case Type::FIXED_SIZE_BINARY: case Type::BOOL: case Type::INT8: case Type::UINT8: case Type::DECIMAL128: case Type::DECIMAL256: case Type::SPARSE_UNION: case Type::RUN_END_ENCODED: case Type::STRUCT: // These have no buffers or all buffers need only byte alignment return true; case Type::INT16: case Type::UINT16: case Type::HALF_FLOAT: return CheckValuesAlignment(array, 16); case Type::INT32: case Type::UINT32: case Type::FLOAT: case Type::STRING: case Type::BINARY: case Type::DATE32: case Type::TIME32: case Type::LIST: case Type::MAP: case Type::DENSE_UNION: case Type::INTERVAL_MONTHS: return CheckValuesAlignment(array, 32); case Type::INT64: case Type::UINT64: case Type::DOUBLE: case Type::LARGE_BINARY: case Type::LARGE_LIST: case Type::LARGE_STRING: case Type::DATE64: case Type::TIME64: case Type::TIMESTAMP: case Type::DURATION: case Type::INTERVAL_DAY_TIME: return CheckValuesAlignment(array, 64); case Type::INTERVAL_MONTH_DAY_NANO: return CheckValuesAlignment(array, 128); default: return true; } } ``` It's not pretty. I also tried using a visitor pattern but it was ending up to be more complicated (though arguably a bit more future proof were we to add more fixed-width types). `CheckValuesAlignment` checks the buffer at index 1. Fortunately, it seems that in all cases, the buffer that needs alignment seems to be at that position. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org