Thanks Antoine,

Can you just clarify what you mean by 'type ids are logical'? In my mind
type ids are strongly coupled to the types and their order in Schema.fbs
[1]. Do you mean that the order there is only a convention and we can't
assume that 0 === Null?

Best,
Ryan

[1] https://github.com/apache/arrow/blob/master/format/Schema.fbs#L235

On Tue, May 19, 2020 at 2:04 PM Antoine Pitrou <anto...@python.org> wrote:

>
> On 19/05/2020 at 13:43, Ryan Murray wrote:
> > Hey All,
> >
> > While working on https://issues.apache.org/jira/browse/ARROW-1692 I noticed
> > that there is a difference between C++ and Java in the way Sparse Unions
> > are handled. I haven't seen in the format spec which one is correct, so I
> > wanted to check with the wider community.
> >
> > C++ (and the integration tests) see sparse unions as:
> > name
> > count
> > VALIDITY[]
> > TYPE_ID[]
> > children[]
> >
> > and Java as:
> > name
> > count
> > TYPE[]
> > children[]
> >
> > The precise names may only be important for JSON reading/writing in the
> > integration tests so I will ignore TYPE/TYPE_ID for now. However, the big
> > difference is that Java doesn't have a validity buffer and C++ does. My
> > understanding is that technically the validity buffer is redundant (0 type
> > == NULL) so I can see why Java would omit it. My question is then: which
> > language is 'correct'?
>
> Union type ids are logical, so 0 could very well be a valid type id.
> You can't assume that type 0 means a null entry.
>
> Regards
>
> Antoine.
>
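
To make Antoine's point concrete, here is a minimal sketch in plain Python. It is not the Arrow API: the function read_sparse_union and the names type_codes, type_ids, children and validity are hypothetical, chosen only for illustration. It assumes the union type carries a list of type ids, one per child, so the id stored in each slot is looked up in that list rather than compared against a fixed enum. The optional validity argument only mirrors the C++ layout described in the thread and says nothing about which layout is correct.

```python
from typing import Any, List, Optional, Sequence


def read_sparse_union(
    type_codes: Sequence[int],
    type_ids: Sequence[int],
    children: Sequence[Sequence[Any]],
    validity: Optional[Sequence[int]] = None,
) -> List[Any]:
    """Resolve every slot of a (hypothetical) sparse union to a Python value.

    type_codes[k] is the logical type id assigned to child k.
    type_ids[i] is the logical type id stored for slot i.
    In a sparse union every child has the same length as the union itself.
    """
    # Map each logical type id back to the position of its child.
    child_of = {code: idx for idx, code in enumerate(type_codes)}

    values: List[Any] = []
    for i, tid in enumerate(type_ids):
        # Nullness comes from the validity buffer (if present), not from tid.
        if validity is not None and not validity[i]:
            values.append(None)
            continue
        values.append(children[child_of[tid]][i])
    return values


# Child 0 (ints) is given id 5 and child 1 (strings) is given id 0,
# so a type id of 0 selects a perfectly valid string value.
type_codes = [5, 0]
type_ids = [0, 5, 0]
children = [[0, 42, 0], ["a", "", "c"]]

values = read_sparse_union(type_codes, type_ids, children)
masked = read_sparse_union(type_codes, type_ids, children, validity=[1, 0, 1])
```

In this sketch a slot is null only when the validity buffer says so. A reader that treated type id 0 as null would drop the two string values in the first and last slots.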
