Jefffrey commented on issue #9302: URL: https://github.com/apache/arrow-rs/issues/9302#issuecomment-3822306322
My understanding just from reading the spec would agree that having a child marked as non-nullable but with nullable data makes no sense, since the spec highlights that each child + struct null maps can be interpreted separately:

> A struct array has its own validity bitmap that is independent of its child arrays’ validity bitmaps. The validity bitmap for the struct array might indicate a null when one or more of its child arrays has a non-null value in its corresponding slot; or conversely, a child array might indicate a null in its validity bitmap while the struct array’s validity bitmap shows a non-null value.

- https://arrow.apache.org/docs/format/Columnar.html#struct-validity

And I think that understanding is what drove #3205 for me, but as highlighted in the followup #3244 there are some edge cases which ruin the fun for everyone.

> But you can see that in the cpp implementation the struct builder append default value for child arrays:

I wonder how this tracks with the reply in the mailing list thread where they state [C++ doesn't validate this either](https://github.com/apache/arrow/pull/46129)? (I'm not familiar with the C++ implementation so I'm just taking things at face value) How do they cater for the edge case regarding placeholder values (e.g. for dictionaries) 🤔

> You can't work on the child arrays in a separate context without the data of the parent

I feel this isn't exactly a concern; the child could always have non-null values where the parent validity bitmap states there are null values, so in that way if you operate on child arrays separate from parent data you'd already lose this context, disregarding the mismatch between how parents declare nullability of their children (correct me if I'm wrong).
- Another way of saying this is that the field nullability is not directly attached to the child; it's attached to the parent `StructArray`, so if you operate on the child array separate from the parent `StructArray` then you wouldn't have that field (and declared nullability) anymore
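
To make the point about independent validity bitmaps concrete, here is a rough sketch in plain Rust. It is a toy model and not the arrow-rs API: `ToyStruct`, `logical_child`, `detached_child`, `child_null_visible` and `example` are all names I made up for illustration, and the bitmaps are just `Vec<bool>` where `true` means valid.

```rust
/// Toy model of a struct column with a single i32 child.
/// NOT the arrow-rs API; every name here is hypothetical.
/// `true` means "valid" in both masks, mirroring Arrow validity bitmaps.
struct ToyStruct {
    struct_validity: Vec<bool>,
    child_validity: Vec<bool>,
    child_values: Vec<i32>,
}

impl ToyStruct {
    /// Child value as seen through the parent: null if either mask says null.
    fn logical_child(&self, i: usize) -> Option<i32> {
        if self.struct_validity[i] && self.child_validity[i] {
            Some(self.child_values[i])
        } else {
            None
        }
    }

    /// Child value looked at on its own, ignoring the parent mask.
    fn detached_child(&self, i: usize) -> Option<i32> {
        if self.child_validity[i] {
            Some(self.child_values[i])
        } else {
            None
        }
    }

    /// True if the child is null in some slot where the parent is valid,
    /// i.e. a child null that the parent mask does not cover.
    fn child_null_visible(&self) -> bool {
        self.struct_validity
            .iter()
            .zip(&self.child_validity)
            .any(|(&s, &c)| s && !c)
    }
}

/// Four slots covering every combination of parent/child validity.
fn example() -> ToyStruct {
    ToyStruct {
        struct_validity: vec![true, false, true, false],
        child_validity: vec![true, true, false, false],
        child_values: vec![10, 99, 0, 0],
    }
}

fn main() {
    let s = example();
    for i in 0..s.child_values.len() {
        println!(
            "slot {i}: detached = {:?}, through parent = {:?}",
            s.detached_child(i),
            s.logical_child(i)
        );
    }
    println!("child null visible through parent: {}", s.child_null_visible());
}
```

Slot 1 is the case described above: the child reads as a non-null `99` on its own but is null once the parent mask is applied, so the detached child has already lost context regardless of what nullability the parent's field declares.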
