[PR] perf: optimize struct field processing with field-major order [datafusion-comet]

via GitHub Tue, 20 Jan 2026 11:28:21 -0800


andygrove opened a new pull request, #3224:
URL: https://github.com/apache/datafusion-comet/pull/3224


   ## Summary
   
   Optimizes struct field processing in native shuffle by using field-major 
instead of row-major order. This moves type dispatch from O(rows × fields) to 
O(fields), eliminating per-row type matching overhead.
   
   **The problem:**
   Previously, for each row we iterated over all fields and called 
`append_field()`, which performed a type match for every field in every row. 
For a struct with N fields and M rows, that is N×M type matches, even though 
the field types never change between rows.
   
   ```rust
   // Old approach - row-major order
   for row in rows {                           // M rows
       for (idx, field) in fields.iter().enumerate() {  // N fields
           append_field(field.data_type(), ...);  // Type match happens here
       }
   }
   // Total: M × N type matches
   ```
   
   **The solution:**
   Field-major processing with two passes:
   1. First pass: Loop over rows, build struct validity
   2. Second pass: For each field, get typed builder once, then process all 
rows for that field
   
   ```rust
   // New approach - field-major order
   // Pass 1: Build struct validity
   for row in rows {
       struct_builder.append(is_valid);
   }
   
   // Pass 2: Process fields
   for (field_idx, field) in fields.iter().enumerate() {  // N fields
       match field.data_type() {                // Type match ONCE per field
           DataType::Int32 => {
               let builder = struct_builder.field_builder::<Int32Builder>(field_idx);
               for row in rows {                // M rows
                   builder.append_value(...);   // No type match
               }
           }
           // ... other types
       }
   }
   // Total: N type matches
   ```
   
   This reduces type dispatch from O(rows × fields) to O(fields).
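
To make the dispatch counts concrete, here is a minimal, self-contained sketch that counts how often the type `match` runs under each ordering. It is a toy model only: `DataType`, `Column`, `row_major`, and `field_major` are hypothetical stand-ins written for this illustration, not Comet's or Arrow's actual types, and struct validity is left out.

```rust
// Toy model (hypothetical names, not Comet's or Arrow's real API):
// counts type dispatches under row-major vs. field-major processing.

enum DataType {
    Int32,
    Float64,
}

/// A minimal typed column standing in for an Arrow builder.
enum Column {
    Int32(Vec<i32>),
    Float64(Vec<f64>),
}

impl Column {
    fn new(data_type: &DataType) -> Self {
        match data_type {
            DataType::Int32 => Column::Int32(Vec::new()),
            DataType::Float64 => Column::Float64(Vec::new()),
        }
    }

    fn len(&self) -> usize {
        match self {
            Column::Int32(v) => v.len(),
            Column::Float64(v) => v.len(),
        }
    }
}

/// Row-major: the type match runs once per (row, field) pair.
fn row_major(types: &[DataType], num_rows: usize) -> (Vec<Column>, usize) {
    let mut cols: Vec<Column> = types.iter().map(Column::new).collect();
    let mut dispatches = 0;
    for row in 0..num_rows {
        for (idx, data_type) in types.iter().enumerate() {
            dispatches += 1; // type match happens here, every time
            match (data_type, &mut cols[idx]) {
                (DataType::Int32, Column::Int32(v)) => v.push(row as i32),
                (DataType::Float64, Column::Float64(v)) => v.push(row as f64),
                _ => unreachable!("column type mismatch"),
            }
        }
    }
    (cols, dispatches)
}

/// Field-major: the type match runs once per field, then a tight typed loop.
fn field_major(types: &[DataType], num_rows: usize) -> (Vec<Column>, usize) {
    let mut cols: Vec<Column> = types.iter().map(Column::new).collect();
    let mut dispatches = 0;
    for col in cols.iter_mut() {
        dispatches += 1; // single type match for this field
        match col {
            Column::Int32(v) => v.extend((0..num_rows).map(|row| row as i32)),
            Column::Float64(v) => v.extend((0..num_rows).map(|row| row as f64)),
        }
    }
    (cols, dispatches)
}

fn main() {
    let types = [DataType::Int32, DataType::Float64, DataType::Int32];
    let (row_cols, row_count) = row_major(&types, 1000);
    let (field_cols, field_count) = field_major(&types, 1000);
    assert_eq!(row_cols[0].len(), field_cols[0].len());
    // prints "row-major: 3000, field-major: 3"
    println!("row-major: {row_count}, field-major: {field_count}");
}
```

Both functions produce columns of the same length; only the number of dispatches differs (M × N versus N).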
   
   For complex nested types (struct, list, map), the code falls back to the 
existing `append_field`, since those types have their own recursive processing 
logic.
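
The split between the fast path and the fallback can be sketched as a per-field decision made once, before any rows are touched. This is an illustrative toy: the `DataType` enum, the `Path` enum, and `choose_path` are hypothetical names invented for this sketch, and the real set of types handled by the fast path may differ.

```rust
// Toy model (hypothetical names): decide per field whether to use the
// typed field-major loop or the existing per-row fallback.

#[allow(dead_code)]
#[derive(Debug)]
enum DataType {
    Int32,
    Boolean,
    List(Box<DataType>),
    Struct(Vec<DataType>),
}

#[derive(Debug, PartialEq)]
enum Path {
    /// Get the typed builder once, then loop over all rows.
    FieldMajorFast,
    /// Nested type: keep calling the recursive per-row routine.
    PerRowFallback,
}

fn choose_path(data_type: &DataType) -> Path {
    match data_type {
        DataType::Int32 | DataType::Boolean => Path::FieldMajorFast,
        DataType::List(_) | DataType::Struct(_) => Path::PerRowFallback,
    }
}

fn main() {
    let fields = vec![
        DataType::Int32,
        DataType::List(Box::new(DataType::Int32)),
        DataType::Struct(vec![DataType::Boolean]),
    ];
    for (idx, data_type) in fields.iter().enumerate() {
        println!("field {idx}: {:?} -> {:?}", data_type, choose_path(data_type));
    }
}
```

Under this sketch, a struct made up mostly of primitive fields still gets most of the benefit, because only its nested fields take the per-row path.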
   
   ## Test plan
   
   - [x] All Rust tests pass (115 tests)
   - [x] Native shuffle tests pass (16 tests)
   - [x] Fuzz tests pass (120 tests)
   - [x] Clippy clean
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

