Hi Jorge,
On 13/02/2021 at 04:56, Jorge Cardoso Leitão wrote:
>
> One solution is to assume an offset of zero when reading from IPC. But as
> far as I understand, in that case producers must themselves only share
> bitmap buffers that are aligned to byte (8-bit) boundaries. For example, an
> array with offset 3, length 12 and a (shared) validity buffer with
>
> 01101010, 01101010
>
> can't just write the above to the message; it must write the "new" below:
>
> new: (010){01101}, 0000[1101]
> old: {01101}010, 0[1101](010) # 12 + 3 = 15, unbracketed bits are ignored
>
> i.e. skip the first 3 bits from the first byte and shift all bits
> accordingly.
>
> Is this reasoning correct? Is this the intention?
This is right. You can see the implementation in the C++ IPC writer,
where bitmaps that are not byte-aligned are copied to a temporary
buffer:
https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/writer.cc#L84-L99
(Note that this code is a bit suboptimal: it could avoid copying when
the offset is a multiple of 8.)
The same must be done for the data buffer of boolean arrays, which is
also bit-packed:
https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/writer.cc#L301-L307
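
For illustration, the realignment can be sketched in Python as follows.
This is only a sketch and not the code of the C++ writer: the function
name realign_bitmap is made up for this example, and it assumes Arrow's
least-significant-bit-first numbering within each byte. It also takes
the shortcut mentioned above when the offset is a multiple of 8.

```python
def realign_bitmap(buf: bytes, offset: int, length: int) -> bytes:
    """Return a zero-offset bitmap holding `length` bits of `buf`,
    starting at bit `offset` (hypothetical helper, LSB-first bits)."""
    n_bytes = (length + 7) // 8
    if offset % 8 == 0:
        # Byte-aligned offset: no bit shifting needed, a byte slice suffices.
        start = offset // 8
        out = bytearray(buf[start:start + n_bytes])
    else:
        # Unaligned offset: copy bit by bit into a fresh, zeroed buffer.
        out = bytearray(n_bytes)
        for i in range(length):
            src = offset + i
            if (buf[src // 8] >> (src % 8)) & 1:
                out[i // 8] |= 1 << (i % 8)
    if length % 8:
        # Zero the padding bits of the last byte so the result is
        # deterministic (a choice of this sketch).
        out[-1] &= (1 << (length % 8)) - 1
    return bytes(out)


# The example from the quoted message: offset 3, length 12.
old = bytes([0b01101010, 0b01101010])
new = realign_bitmap(old, 3, 12)
print([format(b, "08b") for b in new])  # ['01001101', '00001101']
```

The printed bytes match the "new" line in the quoted example.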
Regards
Antoine.