Hi All,
I am not sure if it is intentional, but a common property of all Arrow
layouts is that the value at a given index is defined, even if for a
null slot it may be arbitrary. The one exception is the dictionary
layout, where the key in a null slot may be arbitrary and so need not be
a valid index into the values array, leaving the value at that slot
undefined.
This has been a repeated nuisance in the Rust implementation. So far
I've managed to find workarounds for most issues, but I'm unsure how to
handle StructArrays containing non-nullable, dictionary-encoded
children. As the children are non-nullable, they cannot contain a null
mask, but without a null mask the child dictionary array is ill-formed.
I'm not really sure how best to handle this.
One option might be to require that all dictionary keys, even those for
null slots, are a valid index into the child values array. As the child
values array can itself contain nulls, this is always possible.
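To make the option above concrete, here is a rough sketch in plain Rust.
It deliberately does not use the arrow-rs API: DictArray,
all_keys_in_bounds, normalize_null_keys and get are hypothetical names
invented for illustration, and pointing null-slot keys at a null entry
in the values is only one assumed way of satisfying the requirement (any
in-bounds key would do).

```rust
/// Hypothetical, simplified stand-in for a dictionary array.
/// This is NOT the arrow-rs DictionaryArray type.
struct DictArray {
    /// One key per slot; for a null slot the key may be arbitrary.
    keys: Vec<i32>,
    /// Validity of each key slot (true = valid); None means no null mask.
    key_validity: Option<Vec<bool>>,
    /// Dictionary values; a None entry models a null value.
    values: Vec<Option<String>>,
}

impl DictArray {
    /// True if every key, including those in null slots, is a valid
    /// index into `values`.
    fn all_keys_in_bounds(&self) -> bool {
        self.keys
            .iter()
            .all(|&k| k >= 0 && (k as usize) < self.values.len())
    }

    /// Rewrite the key of every null slot so that it points at a null
    /// entry in `values`, appending such an entry if none exists yet.
    /// Keys in valid slots are left untouched.
    fn normalize_null_keys(&mut self) {
        let validity = match &self.key_validity {
            Some(v) => v,
            None => return, // no null mask, so no null slots to rewrite
        };
        // Find an existing null value, or append one.
        let null_idx = match self.values.iter().position(|v| v.is_none()) {
            Some(i) => i,
            None => {
                self.values.push(None);
                self.values.len() - 1
            }
        };
        for (key, &valid) in self.keys.iter_mut().zip(validity.iter()) {
            if !valid {
                *key = null_idx as i32;
            }
        }
    }

    /// Look up slot `i` through its key alone, ignoring the null mask.
    /// Only safe to call once all keys are known to be in bounds.
    fn get(&self, i: usize) -> Option<&str> {
        self.values[self.keys[i] as usize].as_deref()
    }
}
```

With keys normalized like this, a lookup through the key alone is well
defined for every slot, which is the property the other layouts already
have.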
My questions are therefore:
* How are other implementations handling this case?
* Is requiring all dictionary keys to be a valid index into the child
values acceptable? We already do something similar for offsets.
* What is the motivation for dictionaries having two levels of
nullability, in both the keys and the values? UnionArray, by contrast,
only encodes nullability in its children.
Any help would be much appreciated.
Kind Regards,
Raphael Taylor-Davies