Also, you may want to run the integration tests and inspect the generated JSON file for union data; it will probably be informative (look for the type ids).
Regards

Antoine.


On 19/05/2020 at 15:38, Ryan Murray wrote:
> Thanks for the clarification! Next time I will read the whole document ;-)
>
> On Tue, May 19, 2020 at 2:38 PM Antoine Pitrou <anto...@python.org> wrote:
>
>>
>> As explained in the comment below:
>> https://github.com/apache/arrow/blob/master/format/Schema.fbs#L91
>>
>> Regards
>>
>> Antoine.
>>
>>
>> On 19/05/2020 at 14:14, Ryan Murray wrote:
>>> Thanks Antoine,
>>>
>>> Can you just clarify what you mean by 'type ids are logical'? In my mind
>>> type ids are strongly coupled to the types and their order in Schema.fbs
>>> [1]. Do you mean that the order there is only a convention and we can't
>>> assume that 0 === Null?
>>>
>>> Best,
>>> Ryan
>>>
>>> [1] https://github.com/apache/arrow/blob/master/format/Schema.fbs#L235
>>>
>>> On Tue, May 19, 2020 at 2:04 PM Antoine Pitrou <anto...@python.org> wrote:
>>>
>>>>
>>>> On 19/05/2020 at 13:43, Ryan Murray wrote:
>>>>> Hey All,
>>>>>
>>>>> While working on https://issues.apache.org/jira/browse/ARROW-1692 I noticed
>>>>> that there is a difference between C++ and Java in the way Sparse Unions
>>>>> are handled. I haven't seen in the format spec which one is correct, so I
>>>>> wanted to check with the wider community.
>>>>>
>>>>> C++ (and the integration tests) see sparse unions as:
>>>>> name
>>>>> count
>>>>> VALIDITY[]
>>>>> TYPE_ID[]
>>>>> children[]
>>>>>
>>>>> and Java as:
>>>>> name
>>>>> count
>>>>> TYPE[]
>>>>> children[]
>>>>>
>>>>> The precise names may only be important for JSON reading/writing in the
>>>>> integration tests so I will ignore TYPE/TYPE_ID for now. However, the big
>>>>> difference is that Java doesn't have a validity buffer and C++ does. My
>>>>> understanding is that technically the validity buffer is redundant (0 type
>>>>> == NULL) so I can see why Java would omit it. My question is then: which
>>>>> language is 'correct'?
>>>>
>>>> Union type ids are logical, so 0 could very well be a valid type id.
>>>> You can't assume that type 0 means a null entry.
>>>>
>>>> Regards
>>>>
>>>> Antoine.
>>>>
>>>
>>
>
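
To make the point about logical type ids concrete, here is a minimal sketch in plain Python. It is a toy model, not the Arrow C++ or Java API: the `SparseUnion` class, its field names, and the sample data are all hypothetical, chosen only to mirror the `TYPE_ID[]` / `children[]` layout discussed in the thread. It assumes each child is declared with its own logical type id and that a slot's type id is looked up to find the child, rather than being used directly as a position or as a null marker.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class SparseUnion:
    """Toy model of a sparse union column (hypothetical, not the Arrow library API)."""
    type_ids: List[int]        # logical id declared for each child, in child order
    types: List[int]           # per-slot type id buffer (TYPE_ID[] / TYPE[] in the thread)
    children: List[List[Any]]  # in a sparse union every child is as long as `types`

    def child_index(self, type_id: int) -> int:
        # Logical ids are looked up; they are not child positions themselves.
        return self.type_ids.index(type_id)

    def get(self, slot: int) -> Any:
        # Pick the child selected by this slot's type id, then read the same slot in it.
        return self.children[self.child_index(self.types[slot])][slot]


# Hypothetical column: the int child is declared with type id 0,
# the string child with type id 5. None marks slots a child does not own.
column = SparseUnion(
    type_ids=[0, 5],
    types=[0, 5, 0],
    children=[[10, None, 30], ["", "b", ""]],
)

values = [column.get(i) for i in range(len(column.types))]
print(values)  # [10, 'b', 30]
```

In this sketch slots 0 and 2 carry type id 0 and still hold ordinary values, so a reader that treated type id 0 as "null" would drop valid data.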