For an empty list, binary, or string array, what should the offsets buffer be:
an empty buffer, or a buffer containing a single zero? Or are both valid?

Here is some related information I found:

  1.  In the Apache Arrow columnar format specification:
link<https://arrow.apache.org/docs/format/Columnar.html#variable-size-binary-layout>

The offsets buffer contains length + 1 signed integers (either 32-bit or 
64-bit, depending on the logical type).
Generally the first slot in the offsets array is 0, and the last slot is the 
length of the values array.

  2.  A related issue in arrow-rs:
link<https://github.com/apache/arrow-rs/issues/1620>

We found that some test data in arrow-testing appears to have an empty offsets
buffer, though we are not 100% sure.

  3.  In arrow2 (Rust), the offsets buffer cannot be empty:

Link1<https://github.com/jorgecarleitao/arrow2/blob/8fb3b8d3f05cdc3d51f1314cfeb9bec39196789c/src/array/specification.rs#L101>

Link2<https://github.com/jorgecarleitao/arrow2/blob/main/src/io/ipc/read/array/binary.rs#L45>

  4.  In Arrow C++ (sorry, I am not familiar with the C++ implementation):

Link<https://github.com/apache/arrow/blob/c70426f73326b3852d1bd7c31d98be4743f3fcba/cpp/src/arrow/array/array_nested.cc#L111-L113>
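To make the two options concrete, here is a minimal Rust sketch of what a
lenient reader could look like if both forms were considered valid. The
helpers `validate_offsets` and `normalize_offsets` are hypothetical (they are
not arrow-rs or arrow2 APIs), and the sketch assumes 32-bit offsets
(String/Binary/List). It applies the spec rule of `length + 1` offsets, but
additionally tolerates an empty buffer when the array length is 0.

```rust
/// Hypothetical helper (not an arrow-rs or arrow2 API): checks an i32 offsets
/// buffer against the spec rule of `len + 1` entries, but also accepts an
/// empty buffer when the array itself is empty (lenient assumption).
fn validate_offsets(offsets: &[i32], len: usize, values_len: usize) -> Result<(), String> {
    // Lenient case (assumption): an empty offsets buffer is tolerated only
    // for an array of length 0.
    if offsets.is_empty() {
        return if len == 0 {
            Ok(())
        } else {
            Err(format!("empty offsets buffer for array of length {len}"))
        };
    }
    // Spec rule: the offsets buffer holds length + 1 entries.
    if offsets.len() != len + 1 {
        return Err(format!("expected {} offsets, found {}", len + 1, offsets.len()));
    }
    // Offsets are signed integers but must not be negative.
    if offsets[0] < 0 {
        return Err("first offset is negative".to_string());
    }
    // Offsets must be monotonically non-decreasing.
    if offsets.windows(2).any(|w| w[0] > w[1]) {
        return Err("offsets are not monotonically non-decreasing".to_string());
    }
    // The last offset must not point past the end of the values buffer.
    let last = offsets[len] as usize;
    if last > values_len {
        return Err(format!("last offset {last} exceeds values length {values_len}"));
    }
    Ok(())
}

/// Hypothetical helper: rewrites the empty form into the `[0]` form, so that
/// downstream code can always rely on `offsets.len() == len + 1`.
fn normalize_offsets(offsets: &[i32], len: usize) -> Vec<i32> {
    if offsets.is_empty() && len == 0 {
        vec![0]
    } else {
        offsets.to_vec()
    }
}

fn main() {
    // The two candidate encodings of an empty string/binary/list array.
    let empty: [i32; 0] = [];
    let single_zero = [0i32];

    println!("{:?}", validate_offsets(&empty, 0, 0)); // Ok(())
    println!("{:?}", validate_offsets(&single_zero, 0, 0)); // Ok(())
    println!("{:?}", normalize_offsets(&empty, 0)); // [0]
}
```

Under this lenient reading a consumer accepts both encodings and normalizes
to `[0]` internally, whereas a strict reader (such as the arrow2 check linked
above) would reject the empty form. Which behavior is correct is exactly the
question above.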

Looking forward to your opinions!


Regards,
Remzi
