Re: [Discuss] IPC Specification, flatbuffers and unaligned memory accesses

David Li Fri, 19 Jul 2019 06:53:40 -0700

I agree with Bryan and Micah - a gradual transition as part of 1.0 (or
0.15.0) would be much less painful for us than staying on pre-1.0
until we can upgrade everything using Arrow at once. It is kind of a
'have your cake and eat it too' situation, and it would be a
maintenance burden, but something like what Micah proposes would be
ideal.


Thanks,
David

On 7/19/19, Micah Kornfield <[email protected]> wrote:
> I'm trying to work out the exact steps in my mind for a migration. It seems
> like one approach is:
>
> 1.  Add a code change which throws a clear exception it encounters -1 for
> size.  In java the reasonable place seems to be at [1] (there might be
> more?).   The exception should state that the current stream reader isn't
> compatible with version 1.0.0 streams (we should have similar ones in each
> language).  We can add a note about the environment variable in 2 if we
> decide to do it.  Release this change as 0.15.0 or 0.14.2 and ensure at
> least Spark upgrades to this version.
>
> 2.  Change the reader implementation to support reading both 1.0.0 streams
> and be backwards compatible with pre-1.0.0 streams.  Change the writer
> implementation to default to writing 1.0.0 streams but have an environment
> variable that make it write backwards compatible streams (writer
> compatibility seems like it should be optional).  Release this as 1.0.0
>
> 3. If provided, remove the environment variable switch in a later release.
>
> Thanks,
> Micah
>
> [1]
> https://github.com/apache/arrow/blob/9fe728c86caaf9ceb1827159eb172ff81fb98550/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageChannelReader.java#L67
>
> On Thu, Jul 18, 2019 at 8:58 PM Wes McKinney <[email protected]> wrote:
>
>> To be clear, we could make a patch 0.14.x release that includes the
>> necessary compatibility changes. I presume Spark will be able to upgrade
>> to
>> a new patch release (I'd be surprised if not, otherwise how can you get
>> security fixes)?
>>
>> On Thu, Jul 18, 2019, 10:52 PM Bryan Cutler <[email protected]> wrote:
>>
>> > Hey Wes,
>> > I understand we don't want to burden 1.0 by maintaining compatibility
>> > and
>> > that is fine with me. I'm just try to figure out how to best handle
>> > this
>> > situation so Spark users won't get a cryptic error message. It sounds
>> like
>> > it will need to be handled on the Spark side to not allow mixing 1.0
>> > and
>> > pre-1.0 versions. I'm not too sure how much a 0.15.0 release with
>> > compatibility would help, it might depend on when things get released
>> > but
>> > we can discuss that in another thread.
>> >
>> > On Thu, Jul 18, 2019 at 12:03 PM Wes McKinney <[email protected]>
>> wrote:
>> >
>> > > hi Bryan -- well, the reason for the current 0.x version is precisely
>> > > to avoid a situation where we are making decisions on the basis of
>> > > maintaining forward / backward compatibility.
>> > >
>> > > One possible way forward on this is to make a 0.15.0 (0.14.2, so
>> > > there
>> > > is less trouble for Spark to upgrade) release that supports reading
>> > > _both_ old and new variants of the protocol.
>> > >
>> > > On Thu, Jul 18, 2019 at 1:20 PM Bryan Cutler <[email protected]>
>> wrote:
>> > > >
>> > > > Are we going to say that Arrow 1.0 is not compatible with any
>> > > > version
>> > > > before?  My concern is that Spark 2.4.x might get stuck on Arrow
>> > > > Java
>> > > > 0.14.1 and a lot of users will install PyArrow 1.0.0, which will
>> > > > not
>> > > work.
>> > > > In Spark 3.0.0, though it will be no problem to update both Java
>> > > > and
>> > > Python
>> > > > to 1.0. Having a compatibility mode so that new readers/writers can
>> > work
>> > > > with old readers using a 4-byte prefix would solve the problem, but
>> if
>> > we
>> > > > don't want to do this will pyarrow be able to raise an error that
>> > clearly
>> > > > the new version does not support the old protocol?  For example,
>> would
>> > a
>> > > > pyarrow reader see the 0xFFFFFFFF and raise something like "PyArrow
>> > > > detected an old protocol and cannot continue, please use a version
>> > > > <
>> > > 1.0.0"?
>> > > >
>> > > > On Thu, Jul 11, 2019 at 12:39 PM Wes McKinney <[email protected]>
>> > > wrote:
>> > > >
>> > > > > Hi Francois -- copying the metadata into memory isn't the end of
>> the
>> > > world
>> > > > > but it's a pretty ugly wart. This affects every IPC protocol
>> message
>> > > > > everywhere.
>> > > > >
>> > > > > We have an opportunity to address the wart now but such a fix
>> > > post-1.0.0
>> > > > > will be much more difficult.
>> > > > >
>> > > > > On Thu, Jul 11, 2019, 2:05 PM Francois Saint-Jacques <
>> > > > > [email protected]> wrote:
>> > > > >
>> > > > > > If the data buffers are still aligned, then I don't think we
>> should
>> > > > > > add a breaking change just for avoiding the copy on the
>> > > > > > metadata?
>> > I'd
>> > > > > > expect said metadata to be small enough that zero-copy doesn't
>> > really
>> > > > > > affect performance.
>> > > > > >
>> > > > > > François
>> > > > > >
>> > > > > > On Sun, Jun 30, 2019 at 4:01 AM Micah Kornfield <
>> > > [email protected]>
>> > > > > > wrote:
>> > > > > > >
>> > > > > > > While working on trying to fix undefined behavior for
>> > > > > > > unaligned
>> > > memory
>> > > > > > > accesses [1], I ran into an issue with the IPC specification
>> [2]
>> > > which
>> > > > > > > prevents us from ever achieving zero-copy memory mapping and
>> > having
>> > > > > > aligned
>> > > > > > > accesses (i.e. clean UBSan runs).
>> > > > > > >
>> > > > > > > Flatbuffer metadata needs 8-byte alignment to guarantee
>> > > > > > > aligned
>> > > > > accesses.
>> > > > > > >
>> > > > > > > In the IPC format we align each message to 8-byte boundaries.
>> We
>> > > then
>> > > > > > > write a int32_t integer to to denote the size of flat buffer
>> > > metadata,
>> > > > > > > followed immediately  by the flatbuffer metadata.  This means
>> the
>> > > > > > > flatbuffer metadata will never be 8 byte aligned.
>> > > > > > >
>> > > > > > > Do people care?  A simple fix  would be to use int64_t
>> > > > > > > instead
>> of
>> > > > > int32_t
>> > > > > > > for length.  However, any fix essentially breaks all previous
>> > > client
>> > > > > > > library versions or incurs a memory copy.
>> > > > > > >
>> > > > > > > [1] https://github.com/apache/arrow/pull/4757
>> > > > > > > [2] https://arrow.apache.org/docs/ipc.html
>> > > > > >
>> > > > >
>> > >
>> >
>>
>

Re: [Discuss] IPC Specification, flatbuffers and unaligned memory accesses

Reply via email to