Re: Sparse Union format

Antoine Pitrou Tue, 19 May 2020 05:39:19 -0700


As explained in the comment below:
https://github.com/apache/arrow/blob/master/format/Schema.fbs#L91
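
To illustrate the point (this is not taken from any Arrow implementation), here is a minimal Python sketch of a sparse union whose type ids are logical. It assumes the optional typeIds list on the Union table in Schema.fbs, which maps each child's position to the id stored in the types buffer. The names type_ids, types_buffer, children and resolve are hypothetical, and plain lists stand in for Arrow buffers.

```python
# Hypothetical sparse union with two children.
# type_ids[k] is the logical id used in the types buffer for child k,
# so child 1 is deliberately given the id 0 here.
type_ids = [5, 0]

# In a sparse union every child has the same length as the union itself.
children = [
    [1, None, 3, None],      # child 0 (logical id 5), e.g. an int child
    ["a", "b", None, "d"],   # child 1 (logical id 0), e.g. a string child
]

# One logical type id per slot of the union.
types_buffer = [5, 0, 0, 5]


def resolve(types_buffer, type_ids, children):
    """Return the value of each slot by following its logical type id."""
    id_to_child = {tid: idx for idx, tid in enumerate(type_ids)}
    return [children[id_to_child[t]][i] for i, t in enumerate(types_buffer)]


values = resolve(types_buffer, type_ids, children)
```

In this sketch slot 1 has type id 0 and holds the valid value "b", so a type id of 0 cannot by itself be read as a null entry.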


Regards

Antoine.


On 19/05/2020 at 14:14, Ryan Murray wrote:
> Thanks Antoine,
> 
> Can you just clarify what you mean by 'type ids are logical'? In my mind
> type ids are strongly coupled to the types and their order in Schema.fbs
> [1]. Do you mean that the order there is only a convention and we can't
> assume that 0 === Null?
> 
> Best,
> Ryan
> 
> [1] https://github.com/apache/arrow/blob/master/format/Schema.fbs#L235
> 
> On Tue, May 19, 2020 at 2:04 PM Antoine Pitrou <anto...@python.org> wrote:
> 
>>
>> On 19/05/2020 at 13:43, Ryan Murray wrote:
>>> Hey All,
>>>
>>> While working on https://issues.apache.org/jira/browse/ARROW-1692 I noticed
>>> that there is a difference between C++ and Java in the way Sparse Unions
>>> are handled. I haven't found in the format spec which one is correct, so I
>>> wanted to check with the wider community.
>>>
>>> C++ (and the integration tests) see sparse unions as:
>>> name
>>> count
>>> VALIDITY[]
>>> TYPE_ID[]
>>> children[]
>>>
>>> and java as:
>>> name
>>> count
>>> TYPE[]
>>> children[]
>>>
>>> The precise names may only be important for json reading/writing in the
>>> integration tests so I will ignore TYPE/TYPE_ID for now. However, the big
>>> difference is that Java doesn't have a validity buffer and C++ does. My
>>> understanding is that technically the validity buffer is redundant (0 type
>>> == NULL) so I can see why Java would omit it. My question is then: which
>>> language is 'correct'?
>>
>> Union type ids are logical, so 0 could very well be a valid type id.
>> You can't assume that type 0 means a null entry.
>>
>> Regards
>>
>> Antoine.
>>
>
