Hi,

Apologies if I am rehashing something that has already been discussed or is documented elsewhere, but while reading the documentation of the run-end encoded layout [1] I noticed that the parent null count can be non-zero [2].

This is somewhat surprising to me for a couple of reasons:

- This is inconsistent with how it is handled for other nested types like dictionaries, structs, etc., where a null count is solely the number of nulls in the mask of that Array
- Codepaths that use null counts to infer validity mask properties such as presence, bit counts, etc. will no longer work
- This null count can only be recomputed in the context of the run-ends, implying codepaths that slice ArrayData or otherwise manipulate ArrayData directly must be run-length aware
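
To make the last point concrete, here is a minimal sketch in plain Python of what recomputing such a null count for a slice would involve. This is not Arrow's implementation: the function name `logical_null_count` and the list-based inputs are hypothetical, and I am assuming run ends are exclusive logical end indices with one validity flag per run.

```python
from bisect import bisect_right


def logical_null_count(run_ends, values_valid, offset=0, length=None):
    """Count logical nulls in a slice of a run-end encoded sequence.

    run_ends:     strictly increasing exclusive logical end index of each run
    values_valid: values_valid[i] is False when run i holds a null
    offset:       logical start of the slice
    length:       logical length of the slice (defaults to the remainder)
    """
    total = run_ends[-1] if run_ends else 0
    if length is None:
        length = total - offset
    stop = offset + length

    # The first run whose end is strictly greater than offset covers the slice start.
    i = bisect_right(run_ends, offset)
    start = offset
    nulls = 0
    while i < len(run_ends) and start < stop:
        end = min(run_ends[i], stop)  # clip the run to the slice
        if not values_valid[i]:
            nulls += end - start
        start = end
        i += 1
    return nulls


# Three runs: logical indices [0, 3) valid, [3, 5) null, [5, 9) valid.
run_ends = [3, 5, 9]
values_valid = [True, False, True]

full = logical_null_count(run_ends, values_valid)
sliced = logical_null_count(run_ends, values_valid, offset=4, length=3)
```

Even in this toy form, the count for a slice cannot be derived from the parent's own buffers or from the unsliced count alone; it needs both the run ends and the child's validity.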

This leads to a couple of questions:

- Is this a documentation mistake, or is the null count of RunEndEncoded ArrayData determined by its children?
- Can a RunEndEncoded ArrayData contain a null mask itself, independently of its runs, much like dictionary arrays can?

Any clarifications would be most welcome.

[1]: https://arrow.apache.org/docs/dev/format/Columnar.html#run-end-encoded-layout
[2]: https://github.com/apache/arrow/pull/13333/files#r1083470362
