I just created https://issues.apache.org/jira/browse/ARROW-1224

On Sun, Jul 16, 2017 at 7:03 PM, Wes McKinney <wesmck...@gmail.com> wrote:
> hi Brian,
>
> In the record batch IPC formats (stream and file), the buffers are
> supposed to be padded at minimum to an 8-byte offset, so that all
> buffers start on an 8-byte aligned offset.
>
> We should revisit this aspect of the format documents -- ideally
> buffers would be 64-byte padded so that code that uses AVX512 can be
> used more frequently. I think it would be better in the specification
> to say: 64-byte padding is preferred, but 8-byte alignment (of start
> offsets) and padding in IPC is the minimum requirement. In the C++
> library for example, we are rounding up all our allocations to a
> multiple of 64 bytes.
>
> It's possible there's a missing alignment in the Java writer, so if
> you can find a reproducible case where the IPC payload has a
> misaligned buffer start offset we should definitely fix that as soon
> as possible.
>
> - Wes
>
> On Sun, Jul 16, 2017 at 9:05 AM, bhulette <bhule...@ccri.com> wrote:
>> Emilio and I ran into some byte alignment issues last week. We're generating
>> data in the streaming format with the Java lib, but the JavaScript lib is
>> failing to read it because some of the buffers don't appear to be aligned.
>>
>> It's not clear to us which end is implemented incorrectly - the spec
>> (https://arrow.apache.org/docs/memory_layout.html) says buffers should be
>> padded to 64 byte boundaries - does that extend to record batches in the IPC
>> formats?
>>
>> The JavaScript implementation currently uses typed arrays to create views
>> for each buffer, which need to be aligned. We're looking into using a
>> DataView or a flatbuffers ByteBuffer to get around this issue for now, but
>> I'm wondering if this is a bug in the Java implementation.
>>
>> Brian
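
For readers following the thread, here is a minimal TypeScript sketch of the two points discussed above: rounding a buffer's start offset up to an 8-byte (or 64-byte) boundary, and reading int32 values through a DataView when a buffer's start offset is not aligned for a typed array view. The helper names padTo and readInt32s are hypothetical and are not taken from any Arrow library. The sketch assumes little-endian data and offsets below 2^31.

```typescript
// Hypothetical helper: round a byte offset up to the next multiple of
// `alignment`. Assumes `alignment` is a power of two and offset < 2^31,
// because JavaScript bitwise operators work on 32-bit integers.
function padTo(offset: number, alignment: number = 8): number {
  return (offset + alignment - 1) & ~(alignment - 1);
}

// Hypothetical helper: read `count` int32 values starting at `byteOffset`.
// A typed array view needs byteOffset to be a multiple of the element size;
// DataView accepts any offset, at the cost of copying the values.
function readInt32s(buf: ArrayBuffer, byteOffset: number, count: number): Int32Array {
  if (byteOffset % Int32Array.BYTES_PER_ELEMENT === 0) {
    // Zero-copy view. Uses platform byte order, which is little-endian
    // on most current hardware.
    return new Int32Array(buf, byteOffset, count);
  }
  const view = new DataView(buf, byteOffset, count * 4);
  const out = new Int32Array(count);
  for (let i = 0; i < count; i++) {
    out[i] = view.getInt32(i * 4, true); // true = little-endian
  }
  return out;
}

// Usage: write one int32 at a misaligned offset, then read it back.
const buf = new ArrayBuffer(16);
new DataView(buf).setInt32(2, 42, true);

let typedArrayThrew = false;
try {
  new Int32Array(buf, 2, 1); // offset 2 is not a multiple of 4
} catch (e) {
  typedArrayThrew = e instanceof RangeError;
}

const values = readInt32s(buf, 2, 1);
```

The DataView fallback only works around a misaligned payload on the reading side. If the writer pads every buffer start to at least 8 bytes, as described above, the zero-copy branch is the one that runs.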
