Hi Brian,

In the record batch IPC formats (stream and file), buffers are supposed to be padded to at least a multiple of 8 bytes, so that every buffer starts on an 8-byte aligned offset.
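
To make the padding rule concrete, here is a minimal TypeScript sketch of the arithmetic. The helpers `paddedLength` and `layoutBuffers` are hypothetical names chosen for illustration and are not part of any Arrow library. The sketch assumes buffers are written back to back, each padded up to the chosen alignment, so every start offset lands on a multiple of that alignment.

```typescript
// Hypothetical helpers for illustration; not part of any Arrow library.

/** Round `length` up to the next multiple of `alignment` bytes. */
function paddedLength(length: number, alignment: number = 8): number {
  return Math.ceil(length / alignment) * alignment;
}

/**
 * Given the raw byte lengths of consecutive buffers, return the start
 * offset of each buffer and the total padded size of the body.
 */
function layoutBuffers(
  lengths: number[],
  alignment: number = 8
): { offsets: number[]; totalSize: number } {
  const offsets: number[] = [];
  let position = 0;
  for (const length of lengths) {
    offsets.push(position); // always a multiple of `alignment`
    position += paddedLength(length, alignment);
  }
  return { offsets, totalSize: position };
}

// Buffers of 5, 16 and 3 bytes with the minimum 8-byte padding:
console.log(layoutBuffers([5, 16, 3])); // offsets 0, 8, 24; totalSize 32
// The same buffers with a larger alignment:
console.log(layoutBuffers([5, 16, 3], 64)); // offsets 0, 64, 128; totalSize 192
```

A reader can check a suspect payload the same way: any buffer start offset that is not a multiple of 8 violates the minimum requirement.
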
We should revisit this aspect of the format documents. Ideally buffers would be 64-byte padded so that code using AVX512 can be applied more often. I think the specification should say: 64-byte padding is preferred, but 8-byte alignment (of start offsets) and padding in IPC is the minimum requirement. In the C++ library, for example, we round up all our allocations to a multiple of 64 bytes.

It's possible there's a missing alignment in the Java writer, so if you can find a reproducible case where the IPC payload has a misaligned buffer start offset, we should definitely fix that as soon as possible.

- Wes

On Sun, Jul 16, 2017 at 9:05 AM, bhulette <bhule...@ccri.com> wrote:
> Emilio and I ran into some byte alignment issues last week. We're generating
> data in the streaming format with the java lib, but the javascript lib is
> failing to read it because some of the buffers don't appear to be aligned.
>
> It's not clear to us which end is implemented incorrectly - the spec
> (https://arrow.apache.org/docs/memory_layout.html) says buffers should be
> padded to 64 byte boundaries - does that extend to record batches in the IPC
> formats?
>
> The javascript implementation currently uses typed arrays to create views
> for each buffer, which need to be aligned. We're looking into using a
> DataView or a flatbuffers ByteBuffer to get around this issue for now, but
> I'm wondering if this is a bug in the java implementation.
>
> Brian
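
For reference, the typed array constraint described in the quoted message can be sketched as follows. This is an illustration only, not the Arrow JavaScript reader: `readInt32s` is a hypothetical helper, and it assumes little-endian 32-bit integers and a little-endian host for the zero-copy path. A typed array view such as `Int32Array` throws a `RangeError` when its byte offset is not a multiple of the element size, whereas a `DataView` can read at any offset at the cost of copying.

```typescript
// Hypothetical helper for illustration; not the Arrow JavaScript reader.
// Assumes the values are little-endian 32-bit integers.
function readInt32s(
  buffer: ArrayBuffer,
  byteOffset: number,
  count: number
): Int32Array {
  const width = Int32Array.BYTES_PER_ELEMENT; // 4 bytes per element

  if (byteOffset % width === 0) {
    // Aligned start offset: create a zero-copy view over the buffer.
    // Typed arrays use the host byte order (assumed little-endian here).
    return new Int32Array(buffer, byteOffset, count);
  }

  // Misaligned start offset: a typed array view would throw a RangeError,
  // so copy the values out one at a time through a DataView instead.
  const view = new DataView(buffer, byteOffset, count * width);
  const out = new Int32Array(count);
  for (let i = 0; i < count; i++) {
    out[i] = view.getInt32(i * width, true); // true = little-endian
  }
  return out;
}
```

The fallback keeps a reader working against misaligned payloads, but it gives up the zero-copy view, which is one more reason to fix the alignment on the writer side.
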