jacques-n commented on pull request #7290:
URL: https://github.com/apache/arrow/pull/7290#issuecomment-648507985


   > We can decide to stipulate that union types never have non-valid values at 
the Union cell level, only at the child cell level. But then a union value 
cannot be "made null" by changing the validity bitmap of the Union. 
   
   I believe a union can still express this with a null-typed child, no? I
think that is how we either modeled it, or planned to model it, on the Java side.
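   To illustrate the null-typed-child idea, here is a toy model of a sparse union with no top-level validity bitmap. It is a hypothetical sketch in plain Python, not the Arrow Java or pyarrow API. The names `Child`, `SparseUnion` and `value`, and the choice of type id 0 as the null-typed child, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, List, Optional

# Hypothetical toy model of a SPARSE union without a top-level bitmap:
# every child has the union's length, and type_ids[i] selects the child.
@dataclass
class Child:
    name: str
    values: List[Any]
    validity: List[bool]

@dataclass
class SparseUnion:
    type_ids: List[int]
    children: List[Child]

    def value(self, i: int) -> Optional[Any]:
        # Nullness comes only from the selected child's own bitmap.
        child = self.children[self.type_ids[i]]
        return child.values[i] if child.validity[i] else None

n = 4
# Assumed layout: type id 0 is a null-typed child (never valid).
null_child = Child("null", [None] * n, [False] * n)
ints = Child("int", [7, 0, 0, 0], [True, False, False, False])
strs = Child("str", ["", "", "b", ""], [False, False, True, False])
u = SparseUnion(type_ids=[1, 0, 2, 0], children=[null_child, ints, strs])
print([u.value(i) for i in range(n)])  # [7, None, 'b', None]
```

   A slot is "made null" by pointing its type id at the null-typed child rather than by clearing a parent bit.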
   
   > All nested types including union are composed from well-formed child 
arrays which may have null values.
   
   I'm in agreement on this. Decomposing would be complex.
   
   > In the case of union, a null at the top level would indicate that the type 
of the child is not known. This seems algebraically consistent to me.
   
   I think this is where the model breaks down: you would need to evaluate
two validity buffers, the parent's and the child's, to determine whether a
value is valid, and the two could disagree. Such an inconsistency would be
really confusing. As such, I think it would be better to avoid the top-level
validity buffer.
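   A minimal sketch of the "two validity buffers" problem, again using a hypothetical sparse-union-style toy model rather than any real Arrow API. The function names are illustrative assumptions. With a parent bitmap, both levels must be consulted and can contradict each other; without it, the selected child's bitmap is the only source of truth.

```python
from typing import List

# Hypothetical toy model: slot i selects child type_ids[i]; in a sparse
# layout the child is indexed by the same slot i.

def valid_with_parent(parent: List[bool], type_ids: List[int],
                      child_validity: List[List[bool]], i: int) -> bool:
    # With a top-level bitmap, a reader must consult BOTH levels.
    return parent[i] and child_validity[type_ids[i]][i]

def valid_child_only(type_ids: List[int],
                     child_validity: List[List[bool]], i: int) -> bool:
    # Without it, the selected child's bitmap alone decides validity.
    return child_validity[type_ids[i]][i]

def disagreements(parent: List[bool], type_ids: List[int],
                  child_validity: List[List[bool]]) -> List[int]:
    # Slots where the parent and the selected child contradict each other.
    return [i for i in range(len(type_ids))
            if parent[i] != child_validity[type_ids[i]][i]]

parent = [True, True, False]
type_ids = [0, 1, 0]
child_validity = [[True, True, True], [False, False, False]]
print(disagreements(parent, type_ids, child_validity))  # [1, 2]
```

   Slots 1 and 2 are exactly the "really weird" states: the layout permits them, so every reader has to decide what they mean.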
   
   > FTR I'm OK with dropping the top-level validity bitmap from Union, 
especially if it helps us move forward
   
   That would be my preference. It reduces the risk of inconsistency and
doesn't seem to lose any functionality (given that a null type can be used to
indicate a value that has none of the alternative types). I also think this
works well in the most common use of union types, e.g. two files where one has
fieldA with schemaA and the other has fieldA with schemaB. Compositing those
two doesn't require introspecting the individual children and AND'ing their
validity to build an additional validity buffer (or simply setting every bit
to true and then being inconsistent with the child arrays), and it allows a
fast bulk set of the type vector for each independent chunk.
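   The compositing case can be sketched as a dense-union-style toy model. This is hypothetical Python, not the Arrow Java `UnionVector` or any pyarrow call; `Column`, `DenseUnion` and `composite` are illustrative names, and real Arrow stores type ids and offsets in typed buffers rather than Python lists. Each chunk contributes one constant run of type ids, and children are reused as-is, so no top-level bitmap has to be derived.

```python
from dataclasses import dataclass
from typing import Any, List, Optional

@dataclass
class Column:
    values: List[Any]
    validity: List[bool]

# Hypothetical DENSE-union-style model: offsets[i] indexes into the child.
@dataclass
class DenseUnion:
    type_ids: List[int]
    offsets: List[int]
    children: List[Column]

    def value(self, i: int) -> Optional[Any]:
        child = self.children[self.type_ids[i]]
        j = self.offsets[i]
        return child.values[j] if child.validity[j] else None

def composite(chunks: List[Column]) -> DenseUnion:
    type_ids: List[int] = []
    offsets: List[int] = []
    for tid, chunk in enumerate(chunks):
        n = len(chunk.values)
        type_ids.extend([tid] * n)  # one bulk fill of the type vector per chunk
        offsets.extend(range(n))
    # Nulls stay in each child's own bitmap; nothing is AND'ed or copied.
    return DenseUnion(type_ids, offsets, list(chunks))

a = Column([1, 2, 3], [True, False, True])  # fieldA from file 1 (schemaA)
b = Column(["x", "y"], [True, True])        # fieldA from file 2 (schemaB)
u = composite([a, b])
print(u.type_ids)                                     # [0, 0, 0, 1, 1]
print([u.value(i) for i in range(len(u.type_ids))])  # [1, None, 3, 'x', 'y']
```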


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.