I was looking into a related issue just last night: pandas seems to
complain if there are _any_ nulls in the dictionary, and we were
considering not allowing nulls in the dictionary values at all. It's a
little tangled up at the moment, though, because we've already allowed it.
Ref: https://github.com/apache/arrow-julia/issues/360

-Jacob

On Tue, Nov 29, 2022 at 8:06 AM Raphael Taylor-Davies
<r.taylordav...@googlemail.com.invalid> wrote:

> Hi All,
>
> I am not sure if it is intentional, but a common property of all Arrow
> layouts is that the value at a given index is defined, even if for a
> null it may contain an arbitrary value. This is true everywhere except
> for the dictionary layout, where the key in a null slot may contain an
> arbitrary value, and consequently the value at that index is undefined.
>
> This has been a repeated nuisance in the Rust implementation. So far
> I've managed to find workarounds for most issues, but I'm unsure
> how to handle StructArrays containing non-nullable, dictionary-encoded
> children. As the children are non-nullable, they cannot contain a null
> mask, but without a null mask the child dictionary array is ill-formed.
> I'm not really sure how best to handle this.
>
> One option might be to require that all dictionary keys, even those for
> null slots, are a valid index into the child values array. As the child
> values array can itself contain nulls, this is always possible.
>
> My questions are therefore:
>
> * How are other implementations handling this case?
>
> * Is requiring all dictionary keys to be a valid index into the child
> values acceptable? We already do something similar for offsets.
>
> * What is the motivation for dictionaries having two levels of
> nullability, in both the keys and the values? UnionArray, by contrast,
> only encodes nullability in its children.
>
> Any help would be much appreciated.
>
> Kind Regards,
>
> Raphael Taylor-Davies
>
>
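For illustration only, here is a rough sketch of the option proposed in the
quoted message: rewrite the key in every null slot so that it is a valid
index into the values, pointing it at a null entry in the values. It uses
plain Python lists rather than any real Arrow implementation, and the names
normalize_dictionary_keys and decode are hypothetical, not part of any Arrow
library.

```python
from typing import List, Optional, Tuple


def normalize_dictionary_keys(
    keys: List[int],
    validity: Optional[List[bool]],
    values: List[Optional[str]],
) -> Tuple[List[int], List[Optional[str]]]:
    """Return (keys, values) in which every key is a valid index.

    Hypothetical helper: keys in null slots are redirected to a null
    entry in `values`, so the result needs no separate null mask.
    """
    if validity is None:
        # No mask means every slot is treated as valid.
        validity = [True] * len(keys)

    values = list(values)  # copy, so the caller's list is not mutated
    null_index: Optional[int] = None
    out: List[int] = []

    for key, valid in zip(keys, validity):
        if valid:
            # Keys in valid slots must already be in bounds.
            if not 0 <= key < len(values):
                raise ValueError(f"key {key} out of bounds in a valid slot")
            out.append(key)
        else:
            # The key in a null slot is arbitrary, so ignore it and point
            # the slot at a null value, appending one if none exists yet.
            if null_index is None:
                if None in values:
                    null_index = values.index(None)
                else:
                    values.append(None)
                    null_index = len(values) - 1
            out.append(null_index)

    return out, values


def decode(keys: List[int], values: List[Optional[str]]) -> List[Optional[str]]:
    """Look up every key without consulting a null mask."""
    return [values[k] for k in keys]


# The second slot is null and holds an arbitrary, out-of-bounds key (7).
new_keys, new_values = normalize_dictionary_keys(
    [0, 7, 1], [True, False, True], ["a", "b"]
)
decoded = decode(new_keys, new_values)
```

After this rewrite every key can be bounds-checked and looked up without a
mask, which is the property a non-nullable struct child would need. The cost
is that nullness then lives only in the values, so the sketch sidesteps
rather than answers the question of why both levels exist.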
