Hi All,

I'd like to bump this thread to get more feedback from other people.
I think what Wes says makes sense: there seem to be two requirements for
union types, and it might make sense to make them different types.

I think Dremio has more use cases for the first type of union. I think Ray
also has a use case for unions, but I am not sure if it's closer to the first
or the second. How do people feel about specifying the details for the first
union type?
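
To make the distinction concrete, here is a rough sketch of the two notions Wes describes in the quoted message. It is plain Python, not Arrow code: the class names, the particular type-code assignments, and the dense offsets layout in the second class are all assumptions made for illustration, not anything taken from the spec or the linked document.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple

# First notion: "variant of primitives". The meaning of each type code is
# fixed ahead of time. These particular code assignments are hypothetical.
PRIMITIVE_CODES = {0: bool, 1: int, 2: float, 3: str}


@dataclass
class PrimitiveVariantColumn:
    type_codes: List[int]  # one code per slot, interpreted via PRIMITIVE_CODES
    values: List[Any]      # one primitive value per slot

    def validate(self) -> bool:
        # Every value must have exactly the type its code statically names.
        return all(
            type(value) is PRIMITIVE_CODES[code]
            for code, value in zip(self.type_codes, self.values)
        )


# Second notion: a union of arbitrary children (nested types allowed). It is
# like a struct plus type codes, and a code only has meaning relative to this
# column's own list of children.
@dataclass
class GeneralUnionColumn:
    child_names: List[str]     # code i refers to child_names[i]
    children: List[List[Any]]  # one value list per child
    type_codes: List[int]      # which child each slot uses
    offsets: List[int]         # position of the slot's value within that child

    def get(self, i: int) -> Tuple[str, Any]:
        code = self.type_codes[i]
        return self.child_names[code], self.children[code][self.offsets[i]]


primitives = PrimitiveVariantColumn(type_codes=[0, 1, 3], values=[True, 7, "x"])
general = GeneralUnionColumn(
    child_names=["point", "tags"],
    children=[[{"x": 1, "y": 2}], [["a", "b"]]],
    type_codes=[1, 0],
    offsets=[0, 0],
)
```

In the first class a consumer can interpret a type code without looking at the schema, which is the benefit to computation engines that Wes mentions. In the second, the same code can mean different things in different columns, so the schema has to be consulted.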

On Thu, Jan 11, 2018 at 2:39 PM, Wes McKinney <wesmck...@gmail.com> wrote:

> hi all,
>
> So one of the conflicts that keeps coming up re: unions is the
> following two notions:
>
> * A union as a "variant of primitives" type. Here, values are
> constrained to be one of Arrow's primitive types (integer, floating
> point, string, boolean, etc.). The value types are statically declared
> and thus the union type codes have a fixed interpretation (e.g. 0 is
> always boolean, 1 always int8, and so on).
>
> * A union as a composition of any child types (including nested
> types). In this model, a union internally is like a struct plus type
> codes, which refer to a collection of any fields, which may include
> other nested types
>
> IMHO, these are two different and totally valid things to support. The
> former can be viewed as a special case of the latter, but there are
> benefits to computation engines to rely on the assumptions of the
> former (like the type codes having a static interpretation rather than
> a dynamic one).
>
> Not having the latter union type seems troublesome to me. For example,
> other data serialization systems support this:
>
> * oneof in Protocol Buffers
> https://developers.google.com/protocol-buffers/docs/proto#oneof
> * union in Flatbuffers
> https://google.github.io/flatbuffers/md__schemas.html
> * union in Thrift (not documented very well unfortunately)
> * union in Avro (I think this is the same)
>
> Thanks
> Wes
>
> On Thu, Jan 11, 2018 at 11:16 AM, Li Jin <ice.xell...@gmail.com> wrote:
> > Hi All,
> >
> > Here is a summary of the state and issues of the union vector (to the best
> > of my knowledge).
> >
> > I have summarized some possible solutions based on the discussion so far.
> > However, this is not a proposal as there are still a lot of things that
> > are not clear at this moment.
> >
> > I'd like to share this as a base for further discussion and move towards
> > a proposal. Thank you.
> >
> > https://docs.google.com/document/d/1zSwSZDVxgmoDol_PKfyTDHD5wbw1eALs5eTS9kyjtYU/edit?usp=sharing
> >
> > Li
>
