Re: buffer alignment (format/java/js)

Emilio Lahr-Vivaz Tue, 08 Aug 2017 09:35:38 -0700

So I think the issue is that we are serializing record batches in a distributed fashion, and then concatenating them in the streaming format. However, the message serialization only aligns the start of the buffers, which requires it to know the current absolute offset of the output stream. Would there be any problem with padding the end of the message, so any single serialized record batch would always be a multiple of 8 bytes?

I've put together a branch that does this, and the existing java tests all pass. I'm having some trouble running the integration tests though.
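
A minimal sketch of the end-of-message padding proposed above, written in Python for brevity rather than taken from the Java writer. The names `padded_length` and `pad_message` and the batch sizes are hypothetical. It assumes buffers are already aligned relative to the start of their own message, so keeping every message length a multiple of 8 keeps each message start aligned after concatenation without the serializer knowing the absolute stream offset.

```python
ALIGNMENT = 8  # bytes; the multiple proposed in the message above


def padded_length(length: int, alignment: int = ALIGNMENT) -> int:
    """Round length up to the next multiple of alignment."""
    return (length + alignment - 1) // alignment * alignment


def pad_message(message: bytes, alignment: int = ALIGNMENT) -> bytes:
    """Append zero bytes so the message length is a multiple of alignment."""
    extra = padded_length(len(message), alignment) - len(message)
    return message + b"\x00" * extra


# Hypothetical record batches serialized independently, with arbitrary lengths.
batches = [b"a" * 13, b"b" * 24, b"c" * 7]

# Concatenate the padded batches, as a streaming writer would.
stream = b"".join(pad_message(batch) for batch in batches)

# Record where each batch starts in the concatenated stream.
offsets = []
position = 0
for batch in batches:
    offsets.append(position)
    position += len(pad_message(batch))
```

With these sizes every start offset in `offsets` is a multiple of 8, even though no batch knew where it would land in the stream.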


Thanks,

Emilio

On 08/08/2017 09:18 AM, Emilio Lahr-Vivaz wrote:

Hi Wes,
You're right, I just realized that. I think the alignment issue might be in some unrelated code, actually. From what I can tell, the Arrow writers are aligning buffers correctly; if not, I'll open a bug.
Thanks,

Emilio

On 08/08/2017 09:15 AM, Wes McKinney wrote:
hi Emilio,

From your description, it isn't clear why 8-byte alignment is causing
a problem (as compared with 64-byte alignment). My understanding is
that the element sizes of JavaScript's TypedArray classes range from 1 to 8 bytes.

The starting offset for all buffers should be 8-byte aligned, if not
that is a bug. Could you clarify?

- Wes
On Tue, Aug 8, 2017 at 8:52 AM, Emilio Lahr-Vivaz <[email protected]> wrote:
After looking at it further, I think only the buffers themselves need to be
aligned, not the metadata and/or schema. Would there be any problem with
changing the alignment to 64 bytes then?

Thanks,

Emilio


On 08/08/2017 08:08 AM, Emilio Lahr-Vivaz wrote:
I'm looking into buffer alignment in the java writer classes. Currently
some files written with the java streaming writer can't be read, due to the
javascript TypedArray restriction that the start offset of the array must
be a multiple of the element size of the array type (e.g. Int32Vectors must
start on a multiple of 4, Float64Vectors must start on a multiple of 8,
etc). From a cursory look at the java writer, I believe that the schema that
is written first is not aligned at all, and then each record batch pads out
its size to a multiple of 8. So:
1. should the schema block pad itself so that the first record batch is
aligned, and is there any problem with doing so?
2. is there any problem with changing the alignment to 64 bytes, as
recommended (but not required) by the spec?
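
To make the restriction described above concrete, here is a small sketch in Python (not the Java writer or the JavaScript reader). The 38-byte schema length is a made-up example, and for simplicity the first buffer is assumed to start immediately after the schema block; `is_aligned` and `padding_needed` are hypothetical helper names.

```python
# Element sizes in bytes for the two vector types named in the message.
ELEMENT_SIZE = {"Int32": 4, "Float64": 8}


def is_aligned(offset: int, element_size: int) -> bool:
    """True if a typed view with this element size could start at offset."""
    return offset % element_size == 0


def padding_needed(offset: int, alignment: int) -> int:
    """Bytes to add so that offset reaches the next multiple of alignment."""
    return -offset % alignment


schema_length = 38  # hypothetical unpadded schema block size

# Without schema padding, the first buffer starts at an unaligned offset.
unpadded_ok = {
    name: is_aligned(schema_length, size) for name, size in ELEMENT_SIZE.items()
}

# Padding the schema block to a multiple of 8 fixes both element sizes.
padded_offset = schema_length + padding_needed(schema_length, 8)
padded_ok = {
    name: is_aligned(padded_offset, size) for name, size in ELEMENT_SIZE.items()
}
```

In this example the schema block would need 2 bytes of padding for 8-byte alignment and 26 bytes for 64-byte alignment; any offset that is a multiple of 64 is also a multiple of 8, so either choice satisfies element sizes up to 8 bytes.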

Thanks,

Emilio
