westonpace commented on PR #35541:
URL: https://github.com/apache/arrow/pull/35541#issuecomment-1544027272

   Structs do not have any buffers themselves beyond the validity buffer.  The 
validity buffer will have a bit width of 1.  I don't think we need to worry 
about aligning any buffers with a width of 8 or less (e.g. only worry about it 
once the width is 16)
   
   One problem with this PR is that it doesn't address offsets buffers.  For 
example, in a string / list / binary array there is a 32-bit offsets buffer 
which should be aligned to 32.  These arrays won't show up as fixed width 
arrays.
   
   I was working on this a bit yesterday as well.  I ended up with something 
like this...
   
   ```
   bool CheckMallocAlignment(const ArrayData& array) {
     auto type_id = array.type->storage_id();
     if (type_id == Type::DICTIONARY) {
       // The values array will be checked separately
       type_id = 
::arrow::internal::checked_pointer_cast<DictionaryType>(array.type)
                     ->index_type()
                     ->id();
     }
     switch (array.type->id()) {
       case Type::NA:
       case Type::FIXED_SIZE_LIST:
       case Type::FIXED_SIZE_BINARY:
       case Type::BOOL:
       case Type::INT8:
       case Type::UINT8:
       case Type::DECIMAL128:
       case Type::DECIMAL256:
       case Type::SPARSE_UNION:
       case Type::RUN_END_ENCODED:
       case Type::STRUCT:
         // These have no buffers or all buffers need only byte alignment
         return true;
       case Type::INT16:
       case Type::UINT16:
       case Type::HALF_FLOAT:
         return CheckValuesAlignment(array, 16);
       case Type::INT32:
       case Type::UINT32:
       case Type::FLOAT:
       case Type::STRING:
       case Type::BINARY:
       case Type::DATE32:
       case Type::TIME32:
       case Type::LIST:
       case Type::MAP:
       case Type::DENSE_UNION:
       case Type::INTERVAL_MONTHS:
         return CheckValuesAlignment(array, 32);
       case Type::INT64:
       case Type::UINT64:
       case Type::DOUBLE:
       case Type::LARGE_BINARY:
       case Type::LARGE_LIST:
       case Type::LARGE_STRING:
       case Type::DATE64:
       case Type::TIME64:
       case Type::TIMESTAMP:
       case Type::DURATION:
       case Type::INTERVAL_DAY_TIME:
         return CheckValuesAlignment(array, 64);
       case Type::INTERVAL_MONTH_DAY_NANO:
         return CheckValuesAlignment(array, 128);
       default:
         return true;
     }
   }
   ```
   
   It's not pretty.  I also tried using a visitor pattern but it was ending up 
to be more complicated (though arguably a bit more future proof were we to add 
more fixed-width types).  `CheckValuesAlignment` checks the buffer at index 1.  
Fortunately, it seems that in all cases, the buffer that needs alignment seems 
to be at that position.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to