> > I verified with these changes [1], without backwards compatibility > support, UBSAN runs cleanly for IPC tests in C++
Actually, it looks like my Mac's version of UBSAN doesn't detect the issue at all. I will try on linux a by EOD. On Thu, Aug 15, 2019 at 12:52 AM Micah Kornfield <emkornfi...@gmail.com> wrote: > +1 > > I verified with these changes [1], without backwards compatibility > support, UBSAN runs cleanly for IPC tests in C++ > > Just wanted to clarify: > >> Additionally with this vote, we want to formally approve the change to >> the Arrow "file" format to always write the (new 8-byte) end-of-stream >> marker, which enables code that processes Arrow streams to safely read >> the file's internal messages as though they were a normal stream. > > This only allows for reading messages safely, we still aren't guaranteeing > dictionary batches occur in the file before they are used, correct? > > Thanks, > Micah > > [1] > https://github.com/emkornfield/arrow/commit/8b8348d8bcf62b50c35ddb4926f3d501b4f7147c > > > On Wed, Aug 14, 2019 at 3:43 PM Wes McKinney <wesmck...@gmail.com> wrote: > >> hi all, >> >> As we've been discussing [1], there is a need to introduce 4 bytes of >> padding into the preamble of the "encapsulated IPC message" format to >> ensure that the Flatbuffers metadata payload begins on an 8-byte >> aligned memory offset. The alternative to this would be for Arrow >> implementations where alignment is important (e.g. C or C++) to copy >> the metadata (which is not always small) into memory when it is >> unaligned. >> >> Micah has proposed to address this by adding a >> 4-byte "continuation" value at the beginning of the payload >> having the value 0xFFFFFFFF. The reason to do it this way is that >> old clients will see an invalid length (what is currently the >> first 4 bytes of the message -- a 32-bit little endian signed >> integer indicating the metadata length) rather than potentially >> crashing on a valid length. We also propose to expand the "end of >> stream" marker used in the stream and file format from 4 to 8 >> bytes. This has the additional effect of aligning the file footer >> defined in File.fbs. >> >> This would be a backwards incompatible protocol change, so older Arrow >> libraries would not be able to read these new messages. Maintaining >> forward compatibility (reading data produced by older libraries) would >> be possible as we can reason that a value other than the continuation >> value was produced by an older library (and then validate the >> Flatbuffer message of course). Arrow implementations could offer a >> backward compatibility mode for the sake of old readers if they desire >> (this may also assist with testing). >> >> Additionally with this vote, we want to formally approve the change to >> the Arrow "file" format to always write the (new 8-byte) end-of-stream >> marker, which enables code that processes Arrow streams to safely read >> the file's internal messages as though they were a normal stream. >> >> The PR making these changes to the IPC documentation is here >> >> https://github.com/apache/arrow/pull/4951 >> >> Please vote to accept these changes. This vote will be open for at >> least 72 hours >> >> [ ] +1 Adopt these Arrow protocol changes >> [ ] +0 >> [ ] -1 I disagree because... >> >> Here is my vote: +1 >> >> Thanks, >> Wes >> >> [1]: >> https://lists.apache.org/thread.html/8440be572c49b7b2ffb76b63e6d935ada9efd9c1c2021369b6d27786@%3Cdev.arrow.apache.org%3E >> >