Hi All, I'd like to bump this thread to get some more feedback from other people. I think what Wes says makes sense: there seem to be two requirements for union types, and it might make sense to make them different types.
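
To make the two requirements concrete, here is a rough Python sketch of how I read them. This is only an illustration, not Arrow's API or memory layout: the class names, the `Field` helper, and every type code other than 0 = boolean and 1 = int8 (the two Wes mentions) are assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Any, List

# Notion 1: "variant of primitives".
# The code -> type mapping is fixed once for everyone, so an engine can
# interpret a type code without looking at the column's schema.
# Codes 0 and 1 follow Wes's example; 2 and 3 are hypothetical.
STATIC_TYPE_CODES = {0: "bool", 1: "int8", 2: "float64", 3: "string"}


@dataclass
class PrimitiveVariantColumn:
    type_codes: List[int]  # one code per slot
    values: List[Any]      # one primitive value per slot

    def type_name(self, i: int) -> str:
        # Static interpretation: the same lookup table for every column.
        return STATIC_TYPE_CODES[self.type_codes[i]]


# Notion 2: union as a composition of arbitrary child fields.
@dataclass
class Field:
    name: str
    type: str  # free-form description; may be a nested type


@dataclass
class GeneralUnionColumn:
    fields: List[Field]    # declared per column, like struct children
    type_codes: List[int]  # code k refers to fields[k] of THIS column
    values: List[Any]

    def type_name(self, i: int) -> str:
        # Dynamic interpretation: the meaning of a code depends on the
        # fields this particular column declares.
        return self.fields[self.type_codes[i]].type


prim = PrimitiveVariantColumn(type_codes=[0, 1, 3], values=[True, 7, "a"])

general = GeneralUnionColumn(
    fields=[
        Field("point", "struct<x: float64, y: float64>"),
        Field("tags", "list<string>"),
    ],
    type_codes=[1, 0],
    values=[["a", "b"], {"x": 1.0, "y": 2.0}],
)
```

In the first class a code such as 1 means int8 in every column; in the second, code 1 means whatever the second declared field of that column happens to be, which is why the two may deserve to be separate types.
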
I think Dremio has more use cases for the first type of union. I think Ray also has a use case for unions, but I am not sure whether it's closer to the first or the second.

How do people feel about speccing out the details for the first union type?

On Thu, Jan 11, 2018 at 2:39 PM, Wes McKinney <wesmck...@gmail.com> wrote:

> hi all,
>
> So one of the conflicts that keeps coming up re: unions is the
> following two notions:
>
> * A union as a "variant of primitives" type. Here, values are
> constrained to be one of Arrow's primitive types (integer, floating
> point, string, boolean, etc.). The value types are statically declared
> and thus the union type codes have a fixed interpretation (e.g. 0 is
> always boolean, 1 always int8, etc. and so on).
>
> * A union as a composition of any child types (including nested
> types). In this model, a union internally is like a struct plus type
> codes, which refer to a collection of any fields, which may include
> other nested types
>
> IMHO, these are two different and totally valid things to support. The
> former can be viewed as a special case of the latter, but there are
> benefits to computation engines to rely on the assumptions of the
> former (like the type codes having a static interpretation rather than
> a dynamic one).
>
> Not having the latter union type seems troublesome to me. For example,
> other data serialization systems support this
>
> * oneof in Protocol Buffers
>   https://developers.google.com/protocol-buffers/docs/proto#oneof
> * union in Flatbuffers
>   https://google.github.io/flatbuffers/md__schemas.html
> * union in Thrift (not documented very well unfortunately)
> * union in Avro (I think this is the same)
>
> Thanks
> Wes
>
> On Thu, Jan 11, 2018 at 11:16 AM, Li Jin <ice.xell...@gmail.com> wrote:
> > Hi All,
> >
> > Here is a summary of the state and issue of union vector (to the best
> > of my knowledge).
> >
> > I have summarized some possible solutions based on the discussion so
> > far. However, this is not a proposal as there are still a lot of
> > things that are not clear at this moment.
> >
> > I'd like to share this as a base for further discussion and move
> > towards a proposal. Thank you.
> >
> > https://docs.google.com/document/d/1zSwSZDVxgmoDol_PKfyTDHD5wbw1eALs5eTS9kyjtYU/edit?usp=sharing
> >
> > Li