alamb commented on code in PR #4681:
URL: https://github.com/apache/arrow-rs/pull/4681#discussion_r1294477192
##########
arrow-ipc/src/reader.rs:
##########
@@ -1704,4 +1648,40 @@ mod tests {
let output_batch = roundtrip_ipc_stream(&input_batch);
assert_eq!(input_batch, output_batch);
}
+
+ #[test]
+ fn test_unaligned() {
Review Comment:
I ran this test without the changes in the PR and it fails like this:
```
---- reader::tests::test_unaligned stdout ----
thread 'reader::tests::test_unaligned' panicked at 'Memory pointer is not
aligned with the specified scalar type',
/Users/alamb/Software/arrow-rs2/arrow-buffer/src/buffer/scalar.rs:125:42
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
##########
arrow-data/src/data.rs:
##########
@@ -1589,10 +1611,13 @@ pub struct DataTypeLayout {
}
impl DataTypeLayout {
- /// Describes a basic numeric array where each element has a fixed width
- pub fn new_fixed_width(byte_width: usize) -> Self {
+ /// Describes a basic numeric array where each element has type `T`
+ pub fn new_fixed_width<T>() -> Self {
Self {
- buffers: vec![BufferSpec::FixedWidth { byte_width }],
+ buffers: vec![BufferSpec::FixedWidth {
+ byte_width: mem::size_of::<T>(),
+ alignment: mem::align_of::<T>(),
Review Comment:
I don't understand why we are aligning all buffers to native sizes -- I
thought the point was that the arrow spec said we should align buffers to 64
bits -- if that is the case, shouldn't this code reflect the spec, not the
native size?
##########
arrow-data/src/data.rs:
##########
@@ -1629,7 +1656,7 @@ impl DataTypeLayout {
#[derive(Debug, PartialEq, Eq)]
pub enum BufferSpec {
/// each element has a fixed width
- FixedWidth { byte_width: usize },
+ FixedWidth { byte_width: usize, alignment: usize },
Review Comment:
I think we should document here what `alignment ` represents (e.g. required
pointer alignment for buffers of this type)
##########
arrow-data/src/data.rs:
##########
@@ -699,6 +699,23 @@ impl ArrayData {
Self::new_null(data_type, 0)
}
+ /// Verifies that the buffers meet the minimum alignment requirements for
the data type
+ ///
+ /// Buffers that are not adequately aligned will be copied to a new
aligned allocation
+ ///
+ /// This can be useful for when interacting with data sent over IPC or
FFI, that may
+ /// not meet the minimum alignment requirements
+ pub fn align_buffers(&mut self) {
Review Comment:
Could we also please document clearly
1. What minimum alignment requirements are meant (is it Rust? Is it Arrow?)
2. What happens if we have "unaligned buffers"? (Rust errors / panics? Less
performance?)
##########
arrow-data/src/data.rs:
##########
@@ -699,6 +699,23 @@ impl ArrayData {
Self::new_null(data_type, 0)
}
+ /// Verifies that the buffers meet the minimum alignment requirements for
the data type
+ ///
+ /// Buffers that are not adequately aligned will be copied to a new
aligned allocation
+ ///
+ /// This can be useful for when interacting with data sent over IPC or
FFI, that may
+ /// not meet the minimum alignment requirements
+ pub fn align_buffers(&mut self) {
+ let layout = layout(&self.data_type);
+ for (buffer, spec) in self.buffers.iter_mut().zip(&layout.buffers) {
+ if let BufferSpec::FixedWidth { alignment, .. } = spec {
+ if buffer.as_ptr().align_offset(*alignment) != 0 {
+ *buffer = Buffer::from_slice_ref(buffer.as_ref())
Review Comment:
The way I read this PR, if a user is creating `ArrayData` directly, they
have to explicitly call `align_buffer` in order to ensure the data is aligned
and the documentation says it may copy the data.
If they are using the IPC reader then the reader may automatically align (by
copying) the data, which seems better than erroring.
As a user, if I didn't want buffers realigned automatically, it would make
probably want a setting on the IPC reader that said "error rather than silently
fix" on unaligned data. That seems like something we could add as a follow on
PR if someone wants that behavior
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]