pitrou commented on code in PR #44279:
URL: https://github.com/apache/arrow/pull/44279#discussion_r2072962303


##########
python/pyarrow/ipc.pxi:
##########
@@ -48,6 +48,26 @@ cdef CMetadataVersion _unwrap_metadata_version(
     raise ValueError("Not a metadata version: " + repr(version))
 
 
+cpdef enum Alignment:
+    Any = <int8_t> CAlignment_Any

Review Comment:
   Why are we using `int8_t` here?



##########
cpp/src/arrow/util/align_util.cc:
##########
@@ -44,9 +47,13 @@ namespace {
 Type::type GetTypeForBuffers(const ArrayData& array) {
   Type::type type_id = array.type->storage_id();
   if (type_id == Type::DICTIONARY) {
-    return ::arrow::internal::checked_pointer_cast<DictionaryType>(array.type)
-        ->index_type()
-        ->id();
+    // return index type id, provided by the DictionaryType array.type or
+    // array.type->storage_type() if array.type is an ExtensionType
+    DataType* dict_type = array.type.get();
+    if (array.type->id() == Type::EXTENSION) {
+      dict_type = 
dynamic_cast<ExtensionType*>(dict_type)->storage_type().get();
+    }
+    return dynamic_cast<DictionaryType*>(dict_type)->index_type()->id();

Review Comment:
   Use `checked_cast` instead?



##########
cpp/src/arrow/ipc/options.h:
##########
@@ -128,6 +128,18 @@ struct ARROW_EXPORT IpcWriteOptions {
   static IpcWriteOptions Defaults();
 };
 
+/// \brief Alignment of data in memory
+/// Alignment values larger than 0 are taken directly as byte alignment value
+/// See util::EnsureAlignment(..., int64_t alignment, ...)
+enum class Alignment {

Review Comment:
   This will be coerced to a size, let's make sure the type corresponds?
   ```suggestion
   enum class Alignment : int64_t {
   ```



##########
docs/source/cpp/flight.rst:
##########
@@ -239,7 +239,19 @@ Memory management
 -----------------
 
 Flight tries to reuse allocations made by gRPC to avoid redundant
-data copies. However, this means that those allocations may not
+data copies. However, experience shows that such data is frequently
+misaligned. Some use cases might require data to have data type-specific
+alignment (for example, for the data buffer of an Int32 array to be aligned
+on a 4-byte boundary), which can be enforced
+by setting :member:`arrow::ipc::IpcReadOptions::ensure_alignment`
+to :member:`arrow::ipc::Alignment::kDataTypeSpecificAlignment`.
+This uses the :member:`arrow::ipc::IpcReadOptions::memory_pool`
+to a allocate memory with aligned addresses, but only for mis-alligned data.
+However, this creates data copies of your data recieved via Flight.
+
+.. note:: Ensuring memory alignment is not supported for non-CPU Flight data.

Review Comment:
   Do we need this note? Does Flight even support non-CPU data? That 
information isn't preserved accross IPC.



##########
python/pyarrow/ipc.pxi:
##########
@@ -48,6 +48,26 @@ cdef CMetadataVersion _unwrap_metadata_version(
     raise ValueError("Not a metadata version: " + repr(version))
 
 
+cpdef enum Alignment:
+    Any = <int8_t> CAlignment_Any
+    DataTypeSpecific = <int8_t> CAlignment_DataTypeSpecific
+    At64Byte = <int8_t> CAlignment_64Byte
+
+
+cdef object _wrap_alignment(CAlignment alignment):
+    return Alignment(<char> alignment)

Review Comment:
   Why `char`? I would expect `int64_t` probably.



##########
cpp/src/arrow/ipc/options.h:
##########
@@ -161,6 +173,16 @@ struct ARROW_EXPORT IpcReadOptions {
   /// RecordBatchStreamReader and StreamDecoder classes.
   bool ensure_native_endian = true;
 
+  /// \brief How to align data if mis-aligned
+  ///
+  /// Data is copied to aligned memory locations allocated via the
+  /// MemoryPool configured as \ref arrow::ipc::IpcReadOptions::memory_pool.
+  /// Some use cases might require data to have a specific alignment, for 
example,
+  /// for the data buffer of an Int32 array to be aligned on a 4-byte boundary.
+  ///
+  /// Default (kAnyAlignment) keeps the alignment as is, so no copy of data 
occurs.
+  Alignment ensure_alignment = Alignment::kAnyAlignment;

Review Comment:
   So we only allow to pass one of the enum values above, right?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to