On Sun, Jun 30, 2019 at 5:14 PM Wes McKinney <wesmck...@gmail.com> wrote: > > hi Micah, > > This is definitely unfortunate, I wish we had realized the potential > implications of having the Flatbuffer message start on a 4-byte > (rather than 8-byte) boundary. The cost of making such a change now > would be pretty high since all readers and writers in all languages > would have to be changed. That being said, the 0.14.0 -> 1.0.0 version > bump is the last opportunity we have to make a change like this, so we > might as well discuss it now. Note that particular implementations > could implement compatibility functions to handle the 4 to 8 byte > change so that old clients can still be understood. We'd probably want > to do this in C++, for example, since users would pretty quickly > acquire a new pyarrow version in Spark applications while they are > stuck on an old version of the Java libraries.
NB such a backwards compatibility fix would not be forward-compatible, so the PySpark users would need to use a pinned version of pyarrow until Spark upgraded to Arrow 1.0.0. Maybe that's OK > > - Wes > > On Sun, Jun 30, 2019 at 3:01 AM Micah Kornfield <emkornfi...@gmail.com> wrote: > > > > While working on trying to fix undefined behavior for unaligned memory > > accesses [1], I ran into an issue with the IPC specification [2] which > > prevents us from ever achieving zero-copy memory mapping and having aligned > > accesses (i.e. clean UBSan runs). > > > > Flatbuffer metadata needs 8-byte alignment to guarantee aligned accesses. > > > > In the IPC format we align each message to 8-byte boundaries. We then > > write a int32_t integer to to denote the size of flat buffer metadata, > > followed immediately by the flatbuffer metadata. This means the > > flatbuffer metadata will never be 8 byte aligned. > > > > Do people care? A simple fix would be to use int64_t instead of int32_t > > for length. However, any fix essentially breaks all previous client > > library versions or incurs a memory copy. > > > > [1] https://github.com/apache/arrow/pull/4757 > > [2] https://arrow.apache.org/docs/ipc.html