jorgecarleitao commented on a change in pull request #12599:
URL: https://github.com/apache/arrow/pull/12599#discussion_r824218929



##########
File path: docs/source/format/Columnar.rst
##########
@@ -208,17 +208,19 @@ right-to-left: ::
               0  0  1  0  1  0  1  1
 
 Arrays having a 0 null count may choose to not allocate the validity
-bitmap. Implementations may choose to always allocate one anyway as a
-matter of convenience, but this should be noted when memory is being
-shared.
+bitmap; how this is represented depends on the implementation (for
+example, a C++ implementation may represent such an "absent" validity
+bitmap using a NULL pointer). Implementations may choose to always allocate
+a validity bitmap anyway as a matter of convenience. Consumers of Arrow
+arrays should be ready to handle those two possibilities.
 
-Nested type arrays except for union types have their own validity bitmap and
-null count regardless of the null count and valid bits of their child arrays.
+Nested type arrays (except for union types as noted above) have their own
+top-level validity bitmap and null count, regardless of the null count and
+valid bits of their child arrays.
 
-Array slots which are null are not required to have a particular
-value; any "masked" memory can have any value and need not be zeroed,
-though implementations frequently choose to zero memory for null
-values.
+Array slots which are null are not required to have a particular value;
+any "masked" memory can have any value and need not be zeroed, though

Review comment:
   Sorry, I meant using the dereferenced pointer for anything. I was thinking
   in terms of leveraging SIMD over both null and non-null slots.
   
   An example (in Rust):
   
   ```rust
   // cargo miri run --example example
   
   // example.rs
   fn main() {
       let mut a = Vec::<u8>::with_capacity(4); // similar to aligned_alloc
   
       let mut dst = a.as_mut_ptr(); // `*mut u8`
   
       // create a [10, ?, 10, ?]
       for i in 0..4 {
           if i % 2 == 0 {
               unsafe { dst.write(10) }
           } else {
               // let's skip this branch to "save" a `.write`
           }
           dst = unsafe { dst.add(1) };
       }
       // This is unsound in Rust: we can't "expose" uninitialized memory.
       unsafe { a.set_len(4) }
   
       let data = a.as_slice();
   
       // This is UB: we read uninitialized memory. However, we would benefit
       // from being able to perform this op if, e.g., we wanted to perform
       // the +1 over all values in lanes.
       let _ = data[1] + 1;
   }
   ```
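
For contrast, here is a minimal sketch of a sound variant, assuming the implementation chooses to zero the memory behind null slots (which the format allows but does not require). The helper names `build_example`, `add_one_all_lanes` and `is_valid` are hypothetical, made up for this illustration. Because every byte is initialized, the `+1` can run over all slots without a per-slot branch, and the validity bitmap alone decides which results are meaningful:

```rust
/// Hypothetical helper: builds the values [10, 0, 10, 0] together with an
/// LSB-numbered validity bitmap marking slots 0 and 2 as valid.
fn build_example() -> (Vec<u8>, Vec<u8>) {
    // Zero-initialize both buffers, so "masked" slots hold a defined value.
    let mut values = vec![0u8; 4];
    let mut validity = vec![0u8; 1];
    for i in (0..4).step_by(2) {
        values[i] = 10;
        validity[i / 8] |= 1 << (i % 8);
    }
    (values, validity)
}

/// Hypothetical helper: adds 1 to every slot, null or not, with no branch
/// on validity. This is the kind of loop a compiler can vectorize.
fn add_one_all_lanes(values: &mut [u8]) {
    for v in values.iter_mut() {
        *v = v.wrapping_add(1);
    }
}

/// Hypothetical helper: reads bit `i` of the validity bitmap.
fn is_valid(validity: &[u8], i: usize) -> bool {
    ((validity[i / 8] >> (i % 8)) & 1) == 1
}

fn main() {
    let (mut values, validity) = build_example();
    add_one_all_lanes(&mut values);
    for (i, v) in values.iter().enumerate() {
        if is_valid(&validity, i) {
            println!("slot {}: {}", i, v);
        } else {
            println!("slot {}: null", i);
        }
    }
}
```

The trade-off is the extra write per null slot that the original snippet was trying to "save"; whether that cost matters depends on the workload.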





