Jefffrey commented on issue #9302:
URL: https://github.com/apache/arrow-rs/issues/9302#issuecomment-3822306322

   Just from reading the spec, I'd agree that having a child marked as 
non-nullable but containing null values makes no sense, since the spec 
highlights that the child and struct null maps can each be interpreted separately:
   
   > A struct array has its own validity bitmap that is independent of its 
child arrays’ validity bitmaps. The validity bitmap for the struct array might 
indicate a null when one or more of its child arrays has a non-null value in 
its corresponding slot; or conversely, a child array might indicate a null in 
its validity bitmap while the struct array’s validity bitmap shows a non-null 
value.
   
   - https://arrow.apache.org/docs/format/Columnar.html#struct-validity
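
   To make the quoted passage concrete, here is a rough sketch of the two cases it describes. The types below (`ChildColumn`, `StructColumn`, `logical_child_value`) are hypothetical stand-ins written for illustration, not the arrow-rs API, and validity is modelled as a plain `Vec<bool>` rather than a packed bitmap:

```rust
// Hypothetical, simplified model of Arrow validity; not the arrow-rs API.
// `true` means "valid", `false` means "null", one entry per slot.
struct ChildColumn {
    values: Vec<i32>,
    validity: Vec<bool>,
}

struct StructColumn {
    // The struct's own bitmap, stored independently of the child's.
    validity: Vec<bool>,
    child: ChildColumn,
}

impl StructColumn {
    /// Value seen by a reader that goes through the parent: a slot is
    /// non-null only if both the struct and the child mark it valid.
    fn logical_child_value(&self, i: usize) -> Option<i32> {
        if self.validity[i] && self.child.validity[i] {
            Some(self.child.values[i])
        } else {
            None
        }
    }
}

fn example() -> StructColumn {
    StructColumn {
        // Slot 1: the struct is null while the child still holds a value.
        // Slot 2: the struct is valid while the child is null.
        validity: vec![true, false, true],
        child: ChildColumn {
            values: vec![10, 20, 0],
            validity: vec![true, true, false],
        },
    }
}
```

   Reading slot 1 through the parent gives a null even though the child bitmap on its own says the slot is valid, which is the independence the spec describes.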
   
   And I think that understanding is what drove #3205 for me, but as 
highlighted in the followup #3244 there are some edge cases which ruin the fun 
for everyone.
   
   > But you can see that in the cpp implementation the struct builder append 
default value for child arrays:
   
   I wonder how this tracks with the reply in the mailing list thread where 
they state [C++ doesn't validate this 
either](https://github.com/apache/arrow/pull/46129)? (I'm not familiar with the 
C++ implementation so I'm just taking things at face value)
   
   How do they cater for the edge case regarding placeholder values (e.g. for 
dictionaries)? 🤔
   
   > You can't work on the child arrays in a separate context without the data 
of the parent
   
   I feel this isn't exactly a concern. The child can always hold non-null 
values in slots where the parent validity bitmap says null, so if you operate 
on child arrays separately from the parent data you already lose that context, 
regardless of any mismatch in how parents declare the nullability of their 
children (correct me if I'm wrong).
   
   - Another way of saying this is that the field nullability is not directly 
attached to the child; it's attached to the parent `StructArray`, so if you 
operate on the child array separately from the parent `StructArray` you no 
longer have that field (or its declared nullability)
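
   A small sketch of that point, again with hypothetical types rather than the arrow-rs API: the `nullable` flag sits on the parent's field list, so a detached child carries only its own validity. The `unmasked_child_nulls` helper is an assumption for illustration, not a check any implementation is known to perform:

```rust
// Hypothetical, simplified model; not the arrow-rs API.
#[derive(Clone, Debug, PartialEq)]
struct ChildColumn {
    validity: Vec<bool>, // true = valid; there is no `nullable` flag here
}

struct Field {
    name: String,
    nullable: bool, // declared on the parent's schema, not stored in the child
}

struct StructColumn {
    validity: Vec<bool>,
    fields: Vec<Field>,
    children: Vec<ChildColumn>, // children[i] is described by fields[i]
}

impl StructColumn {
    /// Hands out the child on its own; the field and the parent bitmap
    /// are left behind.
    fn detach_child(&self, i: usize) -> ChildColumn {
        self.children[i].clone()
    }

    /// Slots where child `i` is null although the struct slot is valid.
    /// Whether such slots are acceptable depends on `fields[i].nullable`,
    /// which only the parent can answer.
    fn unmasked_child_nulls(&self, i: usize) -> Vec<usize> {
        (0..self.validity.len())
            .filter(|&slot| self.validity[slot] && !self.children[i].validity[slot])
            .collect()
    }
}

fn sample() -> StructColumn {
    StructColumn {
        validity: vec![true, false, true],
        fields: vec![Field { name: "a".to_string(), nullable: false }],
        children: vec![ChildColumn { validity: vec![true, false, false] }],
    }
}
```

   With this sample, the null in slot 1 is hidden by the parent's own null, while the null in slot 2 is not. Neither fact, nor the `nullable: false` declaration, can be recovered from the detached child alone.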


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
