hi all,

As we've been discussing [1], there is a need to introduce 4 bytes of
padding into the preamble of the "encapsulated IPC message" format to
ensure that the Flatbuffers metadata payload begins on an 8-byte
aligned memory offset. The alternative to this would be for Arrow
implementations where alignment is important (e.g. C or C++) to copy
the metadata (which is not always small) into memory when it is
unaligned.

Micah has proposed to address this by adding a
4-byte "continuation" value at the beginning of the payload
having the value 0xFFFFFFFF. The reason to do it this way is that
old clients will see an invalid length (what is currently the
first 4 bytes of the message -- a 32-bit little endian signed
integer indicating the metadata length) rather than potentially
crashing on a valid length. We also propose to expand the "end of
stream" marker used in the stream and file format from 4 to 8
bytes. This has the additional effect of aligning the file footer
defined in File.fbs.

This would be a backwards incompatible protocol change, so older Arrow
libraries would not be able to read these new messages. Maintaining
forward compatibility (reading data produced by older libraries) would
be possible as we can reason that a value other than the continuation
value was produced by an older library (and then validate the
Flatbuffer message of course). Arrow implementations could offer a
backward compatibility mode for the sake of old readers if they desire
(this may also assist with testing).

Additionally with this vote, we want to formally approve the change to
the Arrow "file" format to always write the (new 8-byte) end-of-stream
marker, which enables code that processes Arrow streams to safely read
the file's internal messages as though they were a normal stream.

The PR making these changes to the IPC documentation is here

https://github.com/apache/arrow/pull/4951

Please vote to accept these changes. This vote will be open for at
least 72 hours

[ ] +1 Adopt these Arrow protocol changes
[ ] +0
[ ] -1 I disagree because...

Here is my vote: +1

Thanks,
Wes

[1]: 
https://lists.apache.org/thread.html/8440be572c49b7b2ffb76b63e6d935ada9efd9c1c2021369b6d27786@%3Cdev.arrow.apache.org%3E

Reply via email to