Thank you for asking this; I have the same question.

I noted a similar problem in the C++/Python implementation:
https://github.com/apache/arrow/issues/19157#issuecomment-1528037394
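
To make the distinction in the quoted message concrete, here is a minimal
sketch in plain Python (no Arrow library involved). The names `Child`,
`DenseUnion` and `children_by_type` are hypothetical and only model the idea
of a dense union: a type-id entry and an offset entry per slot, plus a list of
children. As a simplification, the type id is assumed to be the child's
position in the list. The sketch builds an ADT-style (world 2) union with two
int64 children and shows that a lookup keyed by value type, which is the
world 1 assumption, cannot tell them apart.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class Child:
    name: str          # field name of the child (hypothetical example names)
    value_type: str    # logical type of the child, e.g. "int64"
    values: List[Any]  # the child's own values


@dataclass
class DenseUnion:
    children: List[Child]  # assumption: list position doubles as the type id
    type_ids: List[int]    # per slot: which child holds the value
    offsets: List[int]     # per slot: index into that child's values

    def get(self, i: int) -> Tuple[str, Any]:
        """Resolve slot i to (child name, value) via type id and offset."""
        child = self.children[self.type_ids[i]]
        return child.name, child.values[self.offsets[i]]


def children_by_type(union: DenseUnion) -> Dict[str, Child]:
    """World 1 style lookup: assumes at most one child per value type."""
    by_type: Dict[str, Child] = {}
    for child in union.children:
        if child.value_type in by_type:
            raise ValueError(
                f"two children share value type {child.value_type!r}"
            )
        by_type[child.value_type] = child
    return by_type


# World 2 (ADT style): two children of the same type, distinct type ids.
adt = DenseUnion(
    children=[
        Child("meters", "int64", [3, 7]),
        Child("feet", "int64", [10]),
    ],
    type_ids=[0, 1, 0],
    offsets=[0, 0, 1],
)

# Resolving by type id keeps the two int64 children apart.
resolved = [adt.get(i) for i in range(3)]

# Resolving by value type cannot, because the key is ambiguous.
try:
    children_by_type(adt)
    ambiguous = False
except ValueError:
    ambiguous = True
```

Here `resolved` distinguishes "meters" from "feet" even though both children
hold int64 values, while the type-keyed lookup rejects the same union. An
implementation that indexes children by type rather than by type id would hit
the second case.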

On Tue, Apr 2, 2024, 04:30 Finn Völkel <f...@juxt.pro> wrote:

> Hi,
>
> my question primarily concerns the union layout described at
> https://arrow.apache.org/docs/format/Columnar.html#union-layout
>
> There are two ways to use unions:
>
>    - polymorphic vectors (world 1)
>    - ADT style vectors (world 2)
>
> In world 1 you have a vector that stores values of different types. In
> world 2 you can have multiple child vectors with the same type but
> different type ids in the union's type vector. The difference becomes
> apparent if you want to use two BigIntVectors as children, which is not
> possible in world 1. World 1 is a subset of world 2.
>
> The spec (to my understanding) doesn’t explicitly forbid world 2, but the
> implementation we have been using (Java) assumes world 1 (a union having
> only ONE child of each type). We sometimes use unions in the ADT style,
> which has led to problems down the road.
>
> Could someone clarify what the specification allows and what it doesn’t
> allow? Could we tighten the specification after that clarification?
>
> Best, Finn
>
