jorisvandenbossche commented on issue #41833:
URL: https://github.com/apache/arrow/issues/41833#issuecomment-2141586788
@timsaucer I see what you mean, but as far as I know, nothing in the Arrow
columnar format specification requires that those values are null.
After all, even for a primitive array with a null, we actually put some
"default" value in the null slot:
```python
>>> arr = pa.array([1, None, 3])
>>> arr
<pyarrow.lib.Int64Array object at 0x7f23782d5360>
[
1,
null,
3
]
# using nanoarrow to more easily view the actual buffers
>>> import nanoarrow as na
>>> na.array(arr).inspect()
<ArrowArray int64>
- length: 3
- offset: 0
- null_count: 1
- buffers[2]:
- validity <bool[1 b] 10100000>
- data <int64[24 b] 1 0 3>  # <-- in the actual data buffer, the null slot is filled with 0
- dictionary: NULL
- children[0]:
```
Similarly, in the nested struct case, those default values in the child
array are masked by the validity of the parent struct array.
I know it is not exactly the same, given that I am comparing a buffer with a
child array, but the principle is the same: nullness is determined by the
validity bitmap, and at that point the underlying value (whether a buffer slot
or a child array's slot) can be any value.
While you could argue that for this specific conversion of Python objects to
Arrow data we _could_ put a null in the child array as well (although that
would require allocating an additional validity bitmap in this small example),
other code should never assume this is the case, because you can easily create
a StructArray in a different way (e.g. directly from the child arrays and a
validity bitmap) that does not give this guarantee.