Thank you for asking this; I have the same question. I noted a similar problem in the C++/Python implementation: https://github.com/apache/arrow/issues/19157#issuecomment-1528037394
On Tue, Apr 2, 2024, 04:30 Finn Völkel <f...@juxt.pro> wrote:

> Hi,
>
> my question primarily concerns the union layout described at
> https://arrow.apache.org/docs/format/Columnar.html#union-layout
>
> There are two ways to use unions:
>
> - polymorphic vectors (world 1)
> - ADT style vectors (world 2)
>
> In world 1 you have a vector that stores different types. In the ADT world
> you could have multiple child vectors with the same type but different type
> ids in the union type vector. The difference is apparent if you want to use
> two BigIntVectors as children which doesn't exist in world 1. World 1 is a
> subset of world 2.
>
> The spec (to my understanding) doesn’t explicitly forbid world 2, but the
> implementation we have been using (Java) has been making the assumption of
> being in world 1 (a union only having ONE child of each type). We sometimes
> use union in the ADT style which has led to problems down the road.
>
> Could someone clarify what the specification allows and what it doesn’t
> allow? Could we tighten the specification after that clarification?
>
> Best, Finn
>
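
To make the two worlds from the quoted message concrete, here is a minimal pure-Python sketch of a dense union with two 64-bit integer children. It is a toy model and not the API of any Arrow implementation: the `DenseUnion` class, the field names `user_id` and `order_id`, and the helper `child_by_type` are all hypothetical, and for simplicity it assumes each type id equals the index of its child. It only illustrates why a lookup keyed on the child's type (the world 1 assumption) becomes ambiguous once two children share a type (world 2).

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class DenseUnion:
    """Toy model of a dense union (hypothetical, not an Arrow library class)."""
    field_names: List[str]     # one name per child
    field_types: List[str]     # one logical type per child
    type_ids: List[int]        # per slot: which child holds the value
    offsets: List[int]         # per slot: index into that child
    children: List[List[Any]]  # the child vectors themselves

    def get(self, i: int) -> Tuple[str, Any]:
        # Simplifying assumption: type id == child index.
        tid = self.type_ids[i]
        return self.field_names[tid], self.children[tid][self.offsets[i]]


def child_by_type(union: DenseUnion, type_name: str) -> int:
    """World 1 style lookup: find *the* child of a given type."""
    matches = [i for i, t in enumerate(union.field_types) if t == type_name]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one child of type {type_name!r}, "
            f"found {len(matches)}"
        )
    return matches[0]


# World 2 (ADT style): two children share the type int64 and are
# distinguished only by their type ids.
adt = DenseUnion(
    field_names=["user_id", "order_id"],
    field_types=["int64", "int64"],
    type_ids=[0, 1, 0],
    offsets=[0, 0, 1],
    children=[[7, 8], [1001]],
)

# Reading by type id is unambiguous in both worlds.
values = [adt.get(i) for i in range(len(adt.type_ids))]

# Reading by type only works in world 1; here it is ambiguous.
try:
    child_by_type(adt, "int64")
    lookup_ambiguous = False
except LookupError:
    lookup_ambiguous = True
```

In this sketch, access through the type ids keeps the two integer children apart, while any code path that resolves a child from its type alone cannot tell them apart.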