Thanks for the clarification! Next time I will read the whole document ;-)

On Tue, May 19, 2020 at 2:38 PM Antoine Pitrou <anto...@python.org> wrote:
>
> As explained in the comment below:
> https://github.com/apache/arrow/blob/master/format/Schema.fbs#L91
>
> Regards
>
> Antoine.
>
>
> On 19/05/2020 at 14:14, Ryan Murray wrote:
> > Thanks Antoine,
> >
> > Can you just clarify what you mean by 'type ids are logical'? In my mind
> > type ids are strongly coupled to the types and their order in Schema.fbs
> > [1]. Do you mean that the order there is only a convention and we can't
> > assume that 0 === Null?
> >
> > Best,
> > Ryan
> >
> > [1] https://github.com/apache/arrow/blob/master/format/Schema.fbs#L235
> >
> > On Tue, May 19, 2020 at 2:04 PM Antoine Pitrou <anto...@python.org> wrote:
> >
> >>
> >> On 19/05/2020 at 13:43, Ryan Murray wrote:
> >>> Hey All,
> >>>
> >>> While working on https://issues.apache.org/jira/browse/ARROW-1692 I noticed
> >>> that there is a difference between C++ and Java in the way sparse unions
> >>> are handled. I haven't found in the format spec which one is correct, so I
> >>> wanted to check with the wider community.
> >>>
> >>> C++ (and the integration tests) see sparse unions as:
> >>> name
> >>> count
> >>> VALIDITY[]
> >>> TYPE_ID[]
> >>> children[]
> >>>
> >>> and Java as:
> >>> name
> >>> count
> >>> TYPE[]
> >>> children[]
> >>>
> >>> The precise names may only be important for JSON reading/writing in the
> >>> integration tests, so I will ignore TYPE/TYPE_ID for now. However, the big
> >>> difference is that Java doesn't have a validity buffer and C++ does. My
> >>> understanding is that technically the validity buffer is redundant (type 0
> >>> == NULL), so I can see why Java would omit it. My question is then: which
> >>> language is 'correct'?
> >>
> >> Union type ids are logical, so 0 could very well be a valid type id.
> >> You can't assume that type 0 means a null entry.
> >>
> >> Regards
> >>
> >> Antoine.
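To make the "type ids are logical" point concrete, here is a minimal Python sketch under simplifying assumptions: the class `SparseUnion` and its field names are hypothetical, not the C++ or Java Arrow API. As I understand Schema.fbs, the Union table can carry an optional `typeIds` list that maps each child position to the id stored in the type id buffer; that mapping is modeled here as `type_codes`. Because 0 can then be a real type id, nullness has to come from the validity buffer (the C++ layout quoted above), not from the type id itself.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class SparseUnion:
    """Hypothetical model of a sparse union column; not a real Arrow class."""

    type_codes: List[int]            # logical type id of each child, indexed by child position
    type_ids: List[int]              # TYPE_ID buffer: one logical type id per slot
    children: List[List[Any]]        # sparse layout: every child is as long as the union
    validity: Optional[List[bool]] = None  # VALIDITY buffer; None mimics the Java layout

    def value(self, i: int) -> Any:
        # Nullness comes only from the validity buffer, never from the type id.
        if self.validity is not None and not self.validity[i]:
            return None
        # Type ids are logical: translate the id into a child position first.
        child = self.type_codes.index(self.type_ids[i])
        return self.children[child][i]


# Child 0 (ints) is declared with type id 0, child 1 (strings) with type id 7.
children = [[10, 0, 30], ["", "b", ""]]
u = SparseUnion(type_codes=[0, 7], type_ids=[0, 7, 0],
                children=children, validity=[True, True, False])
print([u.value(i) for i in range(3)])  # [10, 'b', None]
```

Slot 0 carries type id 0 and is a genuine integer, so a reader that treated 0 as null would lose it. With `validity=None` (mimicking the Java layout) there is no way left to mark slot 2 as null: it reads back as 30.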