I was looking into a related issue just last night: pandas seems to complain if there are _any_ nulls in the dictionary, so we were considering not allowing nulls in the dictionary values at all. It's a little tangled up at the moment, though, because we've already allowed them. Ref: https://github.com/apache/arrow-julia/issues/360

-Jacob

On Tue, Nov 29, 2022 at 8:06 AM Raphael Taylor-Davies <r.taylordav...@googlemail.com.invalid> wrote:

> Hi All,
>
> I am not sure if it is intentional, but a common property of all arrow
> layouts is that the value at a given index is defined, even if for a
> null it may contain an arbitrary value. This is true everywhere except
> for the dictionary layout, where the key in the null slot may contain an
> arbitrary value, and consequently the value of the index is undefined.
>
> This has been a repeated nuisance in the Rust implementation, but so far
> I've managed to find workarounds for most issues, however, I'm unsure
> how to handle StructArrays containing non-nullable, dictionary-encoded
> children. As the children are non-nullable, they cannot contain a null
> mask, but without a null mask the child dictionary array is ill-formed.
> I'm not really sure how best to handle this?
>
> One option might be to require that all dictionary keys, even those for
> null slots, are a valid index into the child values array. As the child
> values array can itself contain nulls, this is always possible.
>
> My questions are therefore:
>
> * How are other implementations handling this case?
>
> * Is requiring all dictionary keys to be a valid index into the child
> values acceptable? We already do something similar for offsets
>
> * What is the motivation for dictionaries having two levels of
> nullability, both in the keys and values. UnionArray by contrast only
> encodes nullability in its children
>
> Any help would be much appreciated
>
> Kind Regards,
>
> Raphael Taylor-Davies
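
For reference, the option raised in the quoted message (requiring every key, including those in null slots, to be a valid index into the values array) could look roughly like the sketch below. It is only an illustration using plain Python lists, not code from any Arrow implementation: the function name `normalize_null_keys` and the list-based representation of keys, validity mask, and values are assumptions made for the example. It also assumes that appending a null entry to the dictionary values is acceptable when none exists yet.

```python
from typing import List, Optional, Sequence, Tuple


def normalize_null_keys(
    keys: Sequence[int],
    validity: Sequence[bool],
    values: Sequence[Optional[str]],
) -> Tuple[List[int], List[Optional[str]]]:
    """Rewrite keys in null slots so that every key is a valid index.

    Hypothetical helper: `keys` may hold arbitrary integers wherever
    `validity` is False. Each such key is redirected to a null entry in
    the dictionary values, so the nullness survives even if the validity
    mask is later dropped.
    """
    if len(keys) != len(validity):
        raise ValueError("keys and validity must have the same length")

    new_values = list(values)

    # Reuse an existing null dictionary entry, or append one if needed.
    null_index = next(
        (i for i, v in enumerate(new_values) if v is None), None
    )
    if null_index is None:
        null_index = len(new_values)
        new_values.append(None)

    new_keys = []
    for key, is_valid in zip(keys, validity):
        if not is_valid:
            # Null slot: the original key is arbitrary, so replace it.
            new_keys.append(null_index)
        elif 0 <= key < len(new_values):
            new_keys.append(key)
        else:
            raise ValueError(f"valid slot has out-of-range key {key}")

    return new_keys, new_values


# The key 99 sits in a null slot and is out of range for the dictionary.
keys, values = normalize_null_keys(
    keys=[0, 99, 1],
    validity=[True, False, True],
    values=["a", "b"],
)
decoded = [values[k] for k in keys]  # every lookup is now in bounds
```

After this rewrite, decoding needs no validity mask at all, which is the property a non-nullable child of a StructArray would rely on. The cost is that the dictionary values may gain a null entry, which is exactly the case pandas appears to reject according to the reply above.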