Hi All,

I am not sure if it is intentional, but a common property of all Arrow layouts is that the value at a given index is defined, even if for a null slot it may contain an arbitrary value. The one exception is the dictionary layout: the key in a null slot may contain an arbitrary value, which need not be a valid index into the dictionary values, and consequently the value at that index is undefined.
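To make the difference concrete, here is a toy sketch in Python. It is not Arrow code: plain lists stand in for buffers, and the helper names `primitive_get` and `dictionary_get_unchecked` are hypothetical, chosen only for illustration.

```python
# Toy model only: plain Python lists stand in for Arrow buffers.

def primitive_get(values, validity, i):
    """Read slot i of a primitive array; values[i] exists even when the slot is null."""
    raw = values[i]  # always in bounds, though arbitrary when the slot is null
    return raw if validity[i] else None


def dictionary_get_unchecked(keys, dict_values, i):
    """Look up slot i of a dictionary array WITHOUT consulting key validity."""
    return dict_values[keys[i]]  # raises IndexError if the key is out of range


# Primitive array [7, null, 9]: the null slot holds an arbitrary but readable value.
prim_values = [7, 12345, 9]
prim_validity = [True, False, True]

# Dictionary array ["a", null, "b"]: the key in the null slot is arbitrary (here 99).
dict_values = ["a", "b"]
keys = [0, 99, 1]
key_validity = [True, False, True]
```

In this model every slot of the primitive array can be read safely and masked afterwards, whereas reading slot 1 of the dictionary array without first checking `key_validity` fails, because 99 is not an index into `dict_values`.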

This has been a repeated nuisance in the Rust implementation. So far I've managed to find workarounds for most issues, but I'm unsure how to handle StructArrays containing non-nullable, dictionary-encoded children. As the children are non-nullable, they cannot contain a null mask, but without a null mask the child dictionary array is ill-formed. I'm not sure how best to handle this.

One option might be to require that all dictionary keys, even those for null slots, are a valid index into the child values array. As the child values array can itself contain nulls, this is always possible.
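A minimal sketch of this option, again as a toy Python model rather than Arrow code. The function `normalize_keys` is hypothetical, and pointing null slots at a null dictionary entry is only one way to satisfy the requirement; an implementation could equally point them at any existing valid index while keeping the key null mask.

```python
def normalize_keys(keys, key_validity, dict_values):
    """Return (new_keys, new_dict_values) in which every key is a valid index.

    Keys in null slots are redirected to a null entry in the dictionary values,
    which is appended if the dictionary does not already contain one.
    """
    new_values = list(dict_values)
    if all(key_validity):
        return list(keys), new_values  # nothing to rewrite

    if None in new_values:
        null_index = new_values.index(None)  # reuse an existing null entry
    else:
        null_index = len(new_values)
        new_values.append(None)  # works even for an empty dictionary

    new_keys = [k if valid else null_index for k, valid in zip(keys, key_validity)]
    return new_keys, new_values


# Dictionary array ["a", null, "b"] with an arbitrary key (99) in the null slot.
new_keys, new_values = normalize_keys([0, 99, 1], [True, False, True], ["a", "b"])
decoded = [new_values[k] for k in new_keys]
print(new_keys)    # [0, 2, 1]
print(new_values)  # ['a', 'b', None]
print(decoded)     # ['a', None, 'b']
```

After this rewrite the array can be decoded without a key null mask at all, which is the property a non-nullable child of a StructArray would need.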

My questions are therefore:

* How are other implementations handling this case?

* Is requiring all dictionary keys to be a valid index into the child values acceptable? We already do something similar for offsets.

* What is the motivation for dictionaries having two levels of nullability, in both the keys and the values? UnionArray, by contrast, only encodes nullability in its children.

Any help would be much appreciated.

Kind Regards,

Raphael Taylor-Davies
