[I] Explore decomposed branchless loop WKB parsing [sedona-db]

via GitHub Sat, 18 Oct 2025 10:46:23 -0700


paleolimbot opened a new issue, #201:
URL: https://github.com/apache/sedona-db/issues/201


   The `WkbHeader` will be a great optimization for a number of things we'd 
like to do. Another technique potentially worth exploring to optimize these 
very cheap functions is decomposing the parsing into a series of branchless 
loops. The compiler has more options to optimize a loop that contains 
absolutely zero `if`, `match`, or `||`/`&&` expressions (e.g., SIMD 
autovectorization). I've prototyped this in C++ before and found 
order-of-magnitude speed improvements for Binary arrays (probably less helpful 
for BinaryView arrays, where there is always an `if` statement to access the 
data).
   
   ```rust
   // `not_null_array` is assumed to hold `&[u8]` items with no nulls
   let mut n_valid_size = 0usize;
   for item in not_null_array.iter() {
     n_valid_size += usize::from(item.len() >= 5);
   }
   
   if n_valid_size != not_null_array.len() {
     return do_the_slow_version();
   }
   
   let mut n_little_endian = 0usize;
   for item in not_null_array.iter() {
     n_little_endian += usize::from(item[0] == 0x01);
   }
   
   if n_little_endian != not_null_array.len() {
     return do_the_slow_version();
   }
   
   let mut geometry_types: u32 = 0;
   let mut geometry_type_bytes = [0u8; 4];
   for item in not_null_array.iter() {
     geometry_type_bytes.copy_from_slice(&item[1..5]);
     // Little endian was verified above, so decode with from_le_bytes
     let geometry_type_id = u32::from_le_bytes(geometry_type_bytes);
     // Parenthesized because `<<` binds tighter than `&` in Rust
     geometry_types |= 1_u32 << (geometry_type_id & 0x07);
   }
   
   // Potentially do something faster if all we have are points
   ```
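
A self-contained sketch of the passes above, assuming the non-null items are available as a slice of byte slices (`&[&[u8]]`) rather than an Arrow array. The name `geometry_type_mask` and the `None` return standing in for `do_the_slow_version()` are hypothetical. The slice indexing may still carry bounds checks, so whether these loops actually vectorize would need to be confirmed with a benchmark or by reading the generated assembly.

```rust
/// Returns a bitmask with bit `t & 0x07` set for every geometry type id `t`
/// seen, or `None` if any item is shorter than the 5-byte WKB header or is
/// not little endian (the caller would then fall back to a general parser).
fn geometry_type_mask(items: &[&[u8]]) -> Option<u32> {
    // Pass 1: count items that are long enough to hold byte order + type.
    let mut n_valid_size = 0usize;
    for item in items {
        n_valid_size += usize::from(item.len() >= 5);
    }
    if n_valid_size != items.len() {
        return None;
    }

    // Pass 2: count items whose byte order flag is 0x01 (little endian).
    let mut n_little_endian = 0usize;
    for item in items {
        n_little_endian += usize::from(item[0] == 0x01);
    }
    if n_little_endian != items.len() {
        return None;
    }

    // Pass 3: OR together one bit per geometry type encountered.
    let mut geometry_types = 0u32;
    for item in items {
        let geometry_type_id = u32::from_le_bytes([item[1], item[2], item[3], item[4]]);
        geometry_types |= 1u32 << (geometry_type_id & 0x07);
    }
    Some(geometry_types)
}
```

With this encoding, a result of exactly `1 << 1` would mean every item is a point (geometry type 1), which is the case where a faster point-only path could be taken.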
   
   There comes a point where the multiple branchless loops become slower than a 
single loop with branching...I haven't experimented with this enough to know 
where that point is.
   
   For functions that only operate on points, there is also a cool optimization 
you can do for Binary (not BinaryView) arrays: because the data for arrays with 
zero nulls are all lined up consecutively in memory (i.e., one WKB item after 
another all in the same buffer), you can loop through X values like this (once 
you've validated the input using the above):
   
   ```rust
   let mut offset = first_x_offset;
   let mut x_bytes = [0u8; 8];
   for _ in 0..array.len() {
     x_bytes.copy_from_slice(&data_buffer[offset..(offset + 8)]);
     let x = f64::from_le_bytes(x_bytes);
     // ...use x here...
     // An XY point is 21 bytes: 1 (byte order) + 4 (type) + 8 (x) + 8 (y)
     offset += 21;
   }
   ```
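
As a rough illustration of the strided loop above, here is a sketch that computes the X bounds of a buffer of back-to-back little-endian XY points. It assumes the buffer starts at the first point's header and has already been validated. The names `x_bounds`, `POINT_WKB_SIZE`, and `X_OFFSET` are hypothetical, and `chunks_exact` is swapped in for the manual offset arithmetic.

```rust
/// Size of a little-endian WKB XY point: 1 (byte order) + 4 (type) + 8 (x) + 8 (y).
const POINT_WKB_SIZE: usize = 21;
/// Offset of the x ordinate within one point, right after the 5-byte header.
const X_OFFSET: usize = 5;

/// Returns (min_x, max_x) over consecutive WKB XY points in `data_buffer`.
/// Trailing bytes that do not form a whole point are ignored; an empty
/// buffer yields (INFINITY, NEG_INFINITY).
fn x_bounds(data_buffer: &[u8]) -> (f64, f64) {
    let mut min_x = f64::INFINITY;
    let mut max_x = f64::NEG_INFINITY;
    for point in data_buffer.chunks_exact(POINT_WKB_SIZE) {
        let mut x_bytes = [0u8; 8];
        x_bytes.copy_from_slice(&point[X_OFFSET..X_OFFSET + 8]);
        let x = f64::from_le_bytes(x_bytes);
        min_x = min_x.min(x);
        max_x = max_x.max(x);
    }
    (min_x, max_x)
}
```

For a sliced Arrow array the first point would not necessarily start at byte 0 of the data buffer, so a real implementation would need to start from the array's first value offset.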
   
   My initial explorations of this are here: 
https://github.com/geoarrow/geoarrow-c/blob/28eca0fea6f47c70113dc1719e7597e53bede461/dev/benchmarks/c/wkb_bounding_benchmark.cc#L408-L450
For the bounding operation at least, the initial numbers suggested that this 
approach was able to trigger SIMD.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
