On Wed, Jun 24, 2020 at 1:07 PM Francois Saint-Jacques
<fsaintjacq...@gmail.com> wrote:
>
> OTOH,
>
> how do we handle NullType -> UnionType<T...> cast conversion? Do we
> require some convention like the first child's ArrayData null bitmap
> to be set and all tags set to 0?

Sure, that sounds like a reasonable implementation should this
operation actually be required.
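
A minimal sketch of that convention for a dense union, in plain Python
rather than any Arrow API. The helper name `null_to_dense_union` is
hypothetical, children are modeled as lists in which `None` stands for
a null slot, and tag 0 is assumed to map to the first child:

```python
def null_to_dense_union(length, num_children):
    """Represent `length` nulls as a dense union with `num_children` children.

    Hypothetical helper for illustration only, not part of Arrow.
    Assumes type id 0 refers to the first child.
    """
    if num_children < 1:
        raise ValueError("a union needs at least one child")
    type_ids = [0] * length          # every slot is tagged with child 0
    offsets = list(range(length))    # slot i reads child 0 at position i
    children = [[None] * length]     # child 0 is entirely null
    # The remaining children are never referenced, so they stay empty.
    children += [[] for _ in range(num_children - 1)]
    return {"type_ids": type_ids, "offsets": offsets, "children": children}


result = null_to_dense_union(3, 2)
```

For a sparse union the other children would need the same length as the
union, which is why the sketch uses the dense layout.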

> François
>
> On Wed, Jun 24, 2020 at 1:09 PM Antoine Pitrou <anto...@python.org> wrote:
> >
> >
> > Le 24/06/2020 à 18:34, Wes McKinney a écrit :
> > > On Wed, Jun 24, 2020 at 11:08 AM Antoine Pitrou <anto...@python.org> wrote:
> > >>
> > >>
> > >> Le 24/06/2020 à 16:57, Wes McKinney a écrit :
> > >>> hi folks,
> > >>>
> > >>> As discussed on the recent GitHub PR [1], as a means of reconciling
> > >>> the long-standing cross-implementation incompatibilities with Union
> > >>> types, it's been proposed to remove the top-level validity bitmap from
> > >>> the Union data layout and let validity be determined exclusively by
> > >>> the child arrays of the union. So the only additional data needed to
> > >>> form a union are the type ids (and for the dense union, the offsets).
> > >>>
> > >>> I do not think this change meaningfully alters the semantics of Union
> > >>> types and I think it also simplifies their construction, so I would be
> > >>> in favor of making it for 1.0.0.
> > >>
> > >> So it sounds like this may break compatibility with existing uses
> > >> of Arrow C++ (and the relevant bindings: PyArrow, Arrow C/GLib, Red
> > >> Arrow); not only on the API side, but on the data side.
> > >
> > > Right. However, I don't think these changes will be very disruptive,
> > > and we always knew that this disruption was possible because of the
> > > hitherto unreconciled issues with Unions. The applications that I'm
> > > aware of that use Union serialization (e.g. Ray) use it only for
> > > ephemeral serialization.
> >
> > Ok, that's a convincing argument.
> >
> > > In general, I think that we should be bumping the metadata version [1]
> > > for 1.0.0 to create a forcing function for upgrade to the
> > > format-stable line of libraries. The C++/Python libraries could have a
> > > "compatibility mode" (like the "write_legacy_ipc_format" options) that
> > > writes MetadataVersion::V4 (v0.8.0 -> v0.17.1) with certain features
> > > (like unions -- which are not needed for Spark for example) disabled.
> >
> > Hmm, I hope we can keep the negotiation minimal.  We should follow
> > Jon Postel's principle: be liberal in what you accept, strict in what
> > you emit.
> >
> > So the IPC reader can have a simple detection that goes this way:
> >
> >   * if we receive 1 buffer for sparse union or 2 buffers for dense union
> > => it's the new-style format, there's nothing to do
> >
> >   * if we receive 2 (non-null) buffers for sparse union or 3 (non-null)
> > buffers for dense union
> > => it's the old format, we should AND the parent bitmap into each of the
> > child bitmaps
> >
> > We can also add a flag to IpcOptions to enable/disable compatibility tricks.
> >
> > Regards
> >
> > Antoine.
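
The reader-side detection Antoine describes could be sketched roughly as
follows. This is plain Python, not the Arrow IPC reader: the function
name `normalize_union_buffers` is hypothetical, bitmaps are modeled as
lists of bools, and type ids are assumed to equal child indices.

```python
def normalize_union_buffers(mode, buffers, child_validity):
    """Fold an old-style parent validity bitmap into the child bitmaps.

    Hypothetical helper for illustration only.
    mode: "sparse" or "dense".
    buffers: the union's own buffers as received. New style is
        [type_ids] (sparse) or [type_ids, offsets] (dense); old style
        carries a validity bitmap (or None) in front of those.
    child_validity: one list of bools per child, True meaning valid.
    Returns (buffers without the parent bitmap, child validity).
    """
    new_count = {"sparse": 1, "dense": 2}[mode]
    if len(buffers) == new_count:
        # New-style format: nothing to do.
        return buffers, child_validity
    if len(buffers) != new_count + 1:
        raise ValueError("unexpected number of union buffers")

    parent_validity, *rest = buffers
    if parent_validity is None:
        # Old layout, but the bitmap buffer is absent: nothing to fold in.
        return rest, child_validity

    type_ids = rest[0]
    merged = [list(v) for v in child_validity]  # do not mutate the input
    if mode == "sparse":
        # Sparse children are as long as the union, so AND slot by slot.
        for child in merged:
            for i, ok in enumerate(parent_validity):
                child[i] = child[i] and ok
    else:
        # Dense children are addressed through the offsets buffer.
        offsets = rest[1]
        for i, ok in enumerate(parent_validity):
            child = merged[type_ids[i]]
            child[offsets[i]] = child[offsets[i]] and ok
    return rest, merged


old_sparse = [[True, False, True], [0, 1, 0]]
buffers, validity = normalize_union_buffers(
    "sparse", old_sparse, [[True, True, True], [True, True, True]]
)
```

A flag of the kind suggested for IpcOptions would simply decide whether
the old-style branch is taken or rejected.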
