
I am working on an Arrow implementation based on the specification as of the 0.10 release, and I’m confused by what appears to be extra padding in the file format. I generated a record batch, serialized it to the file format using pyarrow (0.10), and inspected the file in a hex editor.

• The first 6 bytes are the magic (ARROW1), as expected.
• The next 2 bytes are padding bytes (0x00) at offsets 0x06 and 0x07, which completes the first 8-byte block.

As far as I understand, the bytes that follow should be the streaming format. However, there is another zero byte (padding?) at offset 0x08, just after the boundary. This byte is followed by a valid message size, and the rest of the format is laid out as expected.

Am I missing something in the serialization documentation about padding after the magic? Am I misunderstanding the concept of an 8-byte boundary? I am using this documentation as a reference: https://github.com/apache/arrow/blob/master/format/IPC.md

What is that extra byte doing there? I can't find it defined anywhere in the spec.

[ Full content available at: https://github.com/apache/arrow/issues/2559 ]
This message was relayed via gitbox.apache.org for [email protected]
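
For anyone who wants to reproduce the observation without a hex editor, here is a minimal Python sketch that splits the start of a file into the magic, the padding up to the 8-byte boundary, and the 32-bit little-endian integer read at offset 0x08 and at offset 0x09. The function name `describe_prefix` is hypothetical, and the sample bytes are hand-built to mirror the layout described in the question (the size value 268 is made up); they are not actual pyarrow output.

```python
import struct

MAGIC = b"ARROW1"
ALIGNMENT = 8


def describe_prefix(data: bytes) -> dict:
    """Split the first bytes of an Arrow file into magic, padding, and size candidates."""
    if data[:len(MAGIC)] != MAGIC:
        raise ValueError("data does not start with the ARROW1 magic")

    # Round the magic length (6) up to the next multiple of 8.
    aligned = (len(MAGIC) + ALIGNMENT - 1) // ALIGNMENT * ALIGNMENT
    padding = data[len(MAGIC):aligned]

    # Read a little-endian int32 at the boundary and one byte later,
    # so the two interpretations can be compared side by side.
    (size_at_boundary,) = struct.unpack_from("<i", data, aligned)
    (size_one_later,) = struct.unpack_from("<i", data, aligned + 1)

    return {
        "aligned_offset": aligned,
        "padding": padding,
        "byte_at_boundary": data[aligned],
        "size_at_boundary": size_at_boundary,
        "size_one_later": size_one_later,
    }


# Synthetic sample (assumption): magic, 2 padding bytes, one extra zero byte,
# then a made-up message size of 268, then filler.
sample = MAGIC + b"\x00\x00" + b"\x00" + struct.pack("<i", 268) + b"\x00" * 4
info = describe_prefix(sample)
for key, value in info.items():
    print(f"{key}: {value!r}")
```

To run it against a real file, replace `sample` with the first 16 or so bytes read from the file in binary mode. The sketch only reports what is at each offset; it does not say which interpretation the spec intends.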
