Sparse Union format

Ryan Murray Tue, 19 May 2020 04:45:18 -0700

Hey All,

While working on https://issues.apache.org/jira/browse/ARROW-1692 I noticed
that there is a difference between C++ and Java on the way Sparse Unions
are handled. I haven't seen in the format spec which the correct is so I
wanted to check with the wider community.


c++ (and the integration tests) see sparse unions as:
name
count
VALIDITY[]
TYPE_ID[]
children[]

and java as:
name
count
TYPE[]
children[]

The precise names may only be important for json reading/writing in the
integration tests so I will ignore TYPE/TYPE_ID for now. However, the big
difference is that Java doesn't have a validity buffer and c++ does. My
understanding is thta technically the validity buffer is redundant (0 type
== NULL) so I can see why Java would omit it. My question is then: which
language is 'correct'?

I suppose the actual language implementation is not entirely relevant here,
instead correct refers to what the canonical IPC schema for a sparse union
should be.

Best,
Ryan

Sparse Union format

Reply via email to