Hello all,

Does anybody else want to give an opinion on this?

Thank you

Antoine.


On Tue, 17 Nov 2020 12:28:06 +0100
Antoine Pitrou <anto...@python.org> wrote:
> Hello,
> 
> The format spec and the C++ implementation disagree on one point:
> 
> * The spec says that dense union offsets should be increasing:
> """The respective offsets for each child value array must be in order /
> increasing."""
> 
> (from https://arrow.apache.org/docs/format/Columnar.html#dense-union)
> 
> * The C++ implementation has long had some tests that used deliberately
> non-increasing (even descending) dense union offsets.
> 
> (see https://issues.apache.org/jira/browse/ARROW-10580)
> 
> I don't know what other implementations, especially Java, expect.
> 
> There are obviously two possible solutions:
> 
> 1) Fix the C++ implementation and its tests to conform to the format
> spec (which may break compatibility for code producing / consuming dense
> unions with non-increasing offsets)
> 
> 2) Relax the format spec to allow arbitrary offsets (which could make
> dense union more like a polymorphic dictionary).
> 
> If the first solution is chosen, then another question arises: must the
> offsets be strictly increasing?  Or can a given offset appear several
> times in a row?
> (the latter is currently exploited by the C++ implementation: when
> appending several nulls to a DenseUnionBuilder, only one child null slot
> is added and the same offset is appended multiple times)
> 
> Regards
> 
> Antoine.
> 
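
To make the question about solution 1 concrete, here is a minimal Python sketch of the check under discussion. It is not taken from any Arrow implementation: the function name check_dense_union_offsets and its strict flag are hypothetical, and type_ids and offsets are plain Python lists standing in for the union's type ids buffer and offsets buffer.

```python
def check_dense_union_offsets(type_ids, offsets, strict=False):
    """Return True if the offsets for each child are in increasing order.

    Hypothetical helper for illustration only.
    strict=False allows the same offset to repeat for a given child;
    strict=True requires each offset to be larger than the previous one
    seen for that child.
    """
    if len(type_ids) != len(offsets):
        raise ValueError("type_ids and offsets must have the same length")

    last_seen = {}  # child type id -> last offset seen for that child
    for type_id, offset in zip(type_ids, offsets):
        prev = last_seen.get(type_id)
        if prev is not None:
            if strict and offset <= prev:
                return False
            if not strict and offset < prev:
                return False
        last_seen[type_id] = offset
    return True


# Offsets increase within each child: valid under both readings.
interleaved = check_dense_union_offsets([0, 1, 0, 1], [0, 0, 1, 1], strict=True)

# Descending offsets for one child: invalid under both readings.
descending = check_dense_union_offsets([0, 0, 0], [2, 1, 0])

# The same offset repeated for one child: valid only if repeats are allowed.
repeated_loose = check_dense_union_offsets([0, 0, 0], [0, 0, 0])
repeated_strict = check_dense_union_offsets([0, 0, 0], [0, 0, 0], strict=True)
```

The last case mirrors the DenseUnionBuilder behaviour described in the quoted message, where several appended nulls share one child slot: it passes only if offsets are allowed to repeat.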