On Sun, Jun 30, 2019 at 5:14 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
> hi Micah,
>
> This is definitely unfortunate, I wish we had realized the potential
> implications of having the Flatbuffer message start on a 4-byte
> (rather than 8-byte) boundary. The cost of making such a change now
> would be pretty high since all readers and writers in all languages
> would have to be changed. That being said, the 0.14.0 -> 1.0.0 version
> bump is the last opportunity we have to make a change like this, so we
> might as well discuss it now. Note that particular implementations
> could implement compatibility functions to handle the 4 to 8 byte
> change so that old clients can still be understood. We'd probably want
> to do this in C++, for example, since users would pretty quickly
> acquire a new pyarrow version in Spark applications while they are
> stuck on an old version of the Java libraries.

NB such a backwards compatibility fix would not be forward-compatible,
so the PySpark users would need to use a pinned version of pyarrow
until Spark upgraded to Arrow 1.0.0. Maybe that's OK

>
> - Wes
>
> On Sun, Jun 30, 2019 at 3:01 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
> >
> > While working on trying to fix undefined behavior for unaligned memory
> > accesses [1], I ran into an issue with the IPC specification [2] which
> > prevents us from ever achieving zero-copy memory mapping and having aligned
> > accesses (i.e. clean UBSan runs).
> >
> > Flatbuffer metadata needs 8-byte alignment to guarantee aligned accesses.
> >
> > In the IPC format we align each message to 8-byte boundaries.  We then
> > write a int32_t integer to to denote the size of flat buffer metadata,
> > followed immediately  by the flatbuffer metadata.  This means the
> > flatbuffer metadata will never be 8 byte aligned.
> >
> > Do people care?  A simple fix  would be to use int64_t instead of int32_t
> > for length.  However, any fix essentially breaks all previous client
> > library versions or incurs a memory copy.
> >
> > [1] https://github.com/apache/arrow/pull/4757
> > [2] https://arrow.apache.org/docs/ipc.html

Reply via email to