Also, you may want to run the integration tests and inspect the
generated JSON file for union data; it will probably be informative
(look for type ids).
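
To make the "logical type ids" point concrete, here is a minimal sketch in
plain Python. It is a simplified model, not the Arrow API: the `SparseUnion`
class and its field names are hypothetical, and it assumes the schema supplies
an explicit list of type ids that map onto the children by position.

```python
from dataclasses import dataclass


@dataclass
class SparseUnion:
    """Hypothetical, simplified model of a sparse union column."""
    type_codes: list  # logical type ids declared by the schema, one per child
    type_ids: list    # TYPE_ID buffer: one logical id per slot
    children: list    # child arrays, each as long as the union itself

    def value(self, i):
        # Translate the logical type id into the child's position,
        # then read slot i from that child.
        child_index = self.type_codes.index(self.type_ids[i])
        return self.children[child_index][i]


ints = [1, None, 3, None]
strs = [None, "a", None, "b"]

# The schema here declares ids 0 and 5, so id 0 selects the int child
# and says nothing about whether the slot is null.
union = SparseUnion(type_codes=[0, 5], type_ids=[0, 5, 0, 5],
                    children=[ints, strs])
values = [union.value(i) for i in range(4)]
print(values)
```

In this model a slot whose type id is 0 holds an ordinary integer, which is
why a type id of 0 cannot be read as "null".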

Regards

Antoine.


On 19/05/2020 at 15:38, Ryan Murray wrote:
> Thanks for the clarification! Next time I will read the whole document ;-)
> 
> On Tue, May 19, 2020 at 2:38 PM Antoine Pitrou <anto...@python.org> wrote:
> 
>>
>> As explained in the comment below:
>> https://github.com/apache/arrow/blob/master/format/Schema.fbs#L91
>>
>> Regards
>>
>> Antoine.
>>
>>
>> On 19/05/2020 at 14:14, Ryan Murray wrote:
>>> Thanks Antoine,
>>>
>>> Can you just clarify what you mean by 'type ids are logical'? In my mind
>>> type ids are strongly coupled to the types and their order in Schema.fbs
>>> [1]. Do you mean that the order there is only a convention and we can't
>>> assume that 0 === Null?
>>>
>>> Best,
>>> Ryan
>>>
>>> [1] https://github.com/apache/arrow/blob/master/format/Schema.fbs#L235
>>>
>>> On Tue, May 19, 2020 at 2:04 PM Antoine Pitrou <anto...@python.org> wrote:
>>>
>>>>
>>>> On 19/05/2020 at 13:43, Ryan Murray wrote:
>>>>> Hey All,
>>>>>
>>>>> While working on https://issues.apache.org/jira/browse/ARROW-1692 I
>>>>> noticed that there is a difference between C++ and Java in the way
>>>>> Sparse Unions are handled. I haven't seen in the format spec which one
>>>>> is correct, so I wanted to check with the wider community.
>>>>>
>>>>> C++ (and the integration tests) see sparse unions as:
>>>>> name
>>>>> count
>>>>> VALIDITY[]
>>>>> TYPE_ID[]
>>>>> children[]
>>>>>
>>>>> and Java as:
>>>>> name
>>>>> count
>>>>> TYPE[]
>>>>> children[]
>>>>>
>>>>> The precise names may only be important for JSON reading/writing in the
>>>>> integration tests, so I will ignore TYPE/TYPE_ID for now. However, the
>>>>> big difference is that Java doesn't have a validity buffer and C++ does.
>>>>> My understanding is that technically the validity buffer is redundant
>>>>> (0 type == NULL), so I can see why Java would omit it. My question is
>>>>> then: which language is 'correct'?
>>>>
>>>> Union type ids are logical, so 0 could very well be a valid type id.
>>>> You can't assume that type 0 means a null entry.
>>>>
>>>> Regards
>>>>
>>>> Antoine.
>>>>
>>>
>>
> 
