mbutrovich commented on PR #20452:
URL: https://github.com/apache/datafusion/pull/20452#issuecomment-3936165730

   Thanks for working on SMJ performance @andygrove!                            
                                                                                
                                                                                
                                                
                                                                                
                                                                                
                                                                                
                                         
   One suggestion: could `JoinComparator` be built on top of 
`arrow_ord::ord::make_comparator` rather than reimplementing type dispatch?     
                                                                                
                                                            
                                                                                
                                                                                
                                                                                
                                         
   ```rust                                                                      
                                                                                
                                                                                
                                         
   // already in DataFusion's dependency tree (arrow-ord)                       
                                                                                
                                                                                
                                         
   pub type DynComparator = Box<dyn Fn(usize, usize) -> Ordering + Send + Sync>;
   make_comparator(left: &dyn Array, right: &dyn Array, opts: SortOptions) -> 
Result<DynComparator>
   ```
   It already does everything this PR implements, and more:
   - Resolves type dispatch once at construction time
   - Bakes `SortOptions` into the closure, so the hot path is just `cmp(i, j)`
   - If neither array has nulls, returns a closure with no null checks at all — 
the current code and proposed `JoinComparator` both check nulls on every row
   - Covers more types than the handwritten match: `List`, `Struct`, `Map`, 
`RunEndEncoded`, `Dictionary`, `ByteView` with optimized prefix comparison
   
   For `compare()` it's a direct replacement. For `is_equal()` 
(null-equals-null semantics), you'd wrap it:
   
   ```rust
   let cmp = make_comparator(left_arr, right_arr, opts)?;
   let is_eq = move |i: usize, j: usize| -> bool {
       match (left_arr.is_null(i), right_arr.is_null(j)) {
           (true, true) => true,
           (true, false) | (false, true) => false,
           _ => cmp(i, j).is_eq(),
       }
   };
   ```
   Side note: `compare_join_arrays` in joins/utils.rs has the same handwritten 
match and is also used in`piecewise_merge_join` — could be cleaned up in the 
same pass or as a follow-on.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to