Hi Emilio,

> So I think the issue is that we are serializing record batches in a
> distributed fashion, and then concatenating them in the streaming format.

Can you show the code for this?

On Tue, Aug 8, 2017 at 12:35 PM, Emilio Lahr-Vivaz <elahrvi...@ccri.com>
wrote:

> So I think the issue is that we are serializing record batches in a
> distributed fashion, and then concatenating them in the streaming format.
> However, the message serialization only aligns the start of the buffers,
> which requires it to know the current absolute offset of the output stream.
> Would there be any problem with padding the end of the message, so any
> single serialized record batch would always be a multiple of 8 bytes?
>
> I've put together a branch that does this, and the existing Java tests all
> pass. I'm having some trouble running the integration tests, though.
>
> Thanks,
>
> Emilio
>
>
> On 08/08/2017 09:18 AM, Emilio Lahr-Vivaz wrote:
>
>> Hi Wes,
>>
>> You're right, I just realized that. I think the alignment issue might
>> actually be in some unrelated code. From what I can tell, the Arrow
>> writers are aligning buffers correctly; if not, I'll open a bug.
>>
>> Thanks,
>>
>> Emilio
>>
>> On 08/08/2017 09:15 AM, Wes McKinney wrote:
>>
>>> hi Emilio,
>>>
>>> From your description, it isn't clear why 8-byte alignment is causing
>>> a problem (as compared with 64-byte alignment). My understanding is
>>> that the element sizes of JavaScript's TypedArray classes range from
>>> 1 to 8 bytes.
>>>
>>> The starting offset for all buffers should be 8-byte aligned; if not,
>>> that is a bug. Could you clarify?
>>>
>>> - Wes
>>>
>>> On Tue, Aug 8, 2017 at 8:52 AM, Emilio Lahr-Vivaz <elahrvi...@ccri.com>
>>> wrote:
>>>
>>>> After looking at it further, I think only the buffers themselves need
>>>> to be aligned, not the metadata and/or schema. Would there be any
>>>> problem with changing the alignment to 64 bytes then?
>>>>
>>>> Thanks,
>>>>
>>>> Emilio
>>>>
>>>>
>>>> On 08/08/2017 08:08 AM, Emilio Lahr-Vivaz wrote:
>>>>
>>>>> I'm looking into buffer alignment in the Java writer classes. Currently,
>>>>> some files written with the Java streaming writer can't be read due to
>>>>> the JavaScript TypedArray restriction that the start offset of the
>>>>> array must be a multiple of the element size of the array type (e.g.
>>>>> Int32Vectors must start on a multiple of 4, Float64Vectors must start
>>>>> on a multiple of 8, etc.). From a cursory look at the Java writer, I
>>>>> believe that the schema that is written first is not aligned at all,
>>>>> and then each record batch pads out its size to a multiple of 8. So:
>>>>>
>>>>> 1. should the schema block pad itself so that the first record batch is
>>>>> aligned, and is there any problem with doing so?
>>>>> 2. is there any problem with changing the alignment to 64 bytes, as
>>>>> recommended (but not required) by the spec?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Emilio
>>>>>
>>>>
>>>>
>>
>
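A minimal sketch of the end-of-message padding idea from this thread, written in TypeScript because the failure shows up in the JavaScript TypedArray reader. It does not use any Arrow Java or JS API: serialized messages are modeled as plain byte arrays, and the helper names (paddedLength, paddingBytes, tryFloat64View, concatPadded) are hypothetical. It shows that a TypedArray view throws a RangeError when its byte offset is not a multiple of its element size, and that padding every independently serialized message (schema included) to a multiple of 8 bytes keeps each message start 8-byte aligned after concatenation, without the writer needing to know the absolute stream offset.

```typescript
// Hypothetical helper: round a byte length up to the next multiple of `alignment`.
function paddedLength(length: number, alignment: number = 8): number {
  return Math.ceil(length / alignment) * alignment;
}

// Hypothetical helper: number of zero bytes to append so a message ends on an
// aligned boundary.
function paddingBytes(length: number, alignment: number = 8): number {
  return paddedLength(length, alignment) - length;
}

// Returns true if a one-element Float64Array view can be created at byteOffset.
// The TypedArray constructor throws a RangeError when byteOffset is not a
// multiple of BYTES_PER_ELEMENT (8 for Float64Array).
function tryFloat64View(buffer: ArrayBufferLike, byteOffset: number): boolean {
  try {
    new Float64Array(buffer, byteOffset, 1);
    return true;
  } catch (e) {
    if (e instanceof RangeError) return false;
    throw e;
  }
}

// Hypothetical model of concatenating independently serialized messages into
// one stream: each message is padded to `alignment`, so every message starts
// on an aligned absolute offset regardless of what was written before it.
function concatPadded(
  messages: Uint8Array[],
  alignment: number = 8,
): { bytes: Uint8Array; offsets: number[] } {
  const total = messages.reduce(
    (sum, m) => sum + paddedLength(m.byteLength, alignment),
    0,
  );
  const bytes = new Uint8Array(total); // zero-initialized, so padding bytes are 0
  const offsets: number[] = [];
  let pos = 0;
  for (const m of messages) {
    offsets.push(pos);
    bytes.set(m, pos);
    pos += paddedLength(m.byteLength, alignment);
  }
  return { bytes, offsets };
}
```

For example, concatenating a 13-byte message and an 8-byte message yields offsets [0, 16] and 24 total bytes; a Float64Array view at offset 16 succeeds, whereas one at offset 13 (where the second message would start without padding) throws a RangeError. Note that this only guarantees alignment of message starts; buffers inside each message still need to be aligned relative to that message's start.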
