I just created https://issues.apache.org/jira/browse/ARROW-1224

On Sun, Jul 16, 2017 at 7:03 PM, Wes McKinney <wesmck...@gmail.com> wrote:
> hi Brian,
>
> In the record batch IPC formats (stream and file), the buffers are
> supposed to be padded at minimum to an 8 byte offset, so that all
> buffers start on an 8-byte aligned offset.
>
> We should revisit this aspect of the format documents -- ideally
> buffers would be 64-byte padded so that code that uses AVX512 can be
> used more frequently. I think it would be better in the specification
> to say: 64-byte padding is preferred, but 8-byte alignment (of start
> offsets) and padding in IPC is the minimum requirement. In the C++
> library for example, we are rounding up all our allocations to a
> multiple of 64 bytes.
>
> It's possible there's a missing alignment in the Java writer, so if
> you can find a reproducible case where the IPC payload has a
> misaligned buffer start offset we should definitely fix that as soon
> as possible.
>
> - Wes
>
> On Sun, Jul 16, 2017 at 9:05 AM, bhulette <bhule...@ccri.com> wrote:
>> Emilio and I ran into some byte alignment issues last week. We're generating
>> data in the streaming format with the java lib, but the javascript lib is
>> failing to read it because some of the buffers don't appear to be aligned.
>>
>> It's not clear to us which end is implemented incorrectly - the spec
>> (https://arrow.apache.org/docs/memory_layout.html) says buffers should be
>> padded to 64 byte boundaries - does that extend to record batches in the IPC
>> formats?
>>
>> The javascript implementation currently uses typed arrays to create views
>> for each buffer, which need to be aligned. We're looking into using a
>> DataView or a flatbuffers ByteBuffer to get around this issue for now, but
>> I'm wondering if this is a bug in the java implementation.
>>
>> Brian
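
For anyone following along, here is a rough sketch of the padding arithmetic discussed above: rounding each buffer length up to a multiple of 8 (the minimum) or 64 (the preferred value), so that every buffer in a payload starts on an aligned offset. The names padded_length and buffer_offsets are made up for illustration and are not taken from any Arrow library; this is not how the Java or C++ writers are actually implemented.

```python
def padded_length(nbytes: int, alignment: int = 8) -> int:
    """Round nbytes up to the next multiple of alignment.

    alignment must be a power of two (8 and 64 are the values
    mentioned in this thread).
    """
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (nbytes + alignment - 1) & ~(alignment - 1)


def buffer_offsets(lengths, alignment=8):
    """Lay out buffers back to back, padding each one to the alignment.

    Returns (offsets, total_size): the start offset of every buffer and
    the size of the whole padded body. This is a hypothetical helper for
    illustration only.
    """
    offsets = []
    position = 0
    for nbytes in lengths:
        offsets.append(position)
        position += padded_length(nbytes, alignment)
    return offsets, position


def misaligned(offsets, alignment=8):
    """Return the offsets that do not fall on the requested alignment."""
    return [off for off in offsets if off % alignment != 0]
```

With buffers of 1, 13 and 8 bytes, buffer_offsets([1, 13, 8]) places them at offsets 0, 8 and 24. A check like misaligned(...) on the offsets found in a payload is one way to look for the reproducible case Wes asked about.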