Dandandan commented on code in PR #1034: URL: https://github.com/apache/datafusion-comet/pull/1034#discussion_r1825003648
##########
native/core/src/execution/shuffle/row.rs:
##########
@@ -235,6 +250,143 @@ impl SparkUnsafeRow {
         }
     }
+    #[allow(clippy::needless_range_loop)]
+    pub fn get_rows_from_arrays(
+        schema: Vec<ArrowDataType>,
+        arrays: Vec<ArrayRef>,
+        num_rows: usize,
+        num_cols: usize,
+        addr: usize,
+    ) {
+        let mut row_start_addr: usize = addr;
+        for i in 0..num_rows {
+            let mut row = SparkUnsafeRow::new(&schema);
+            let row_size = SparkUnsafeRow::get_row_bitset_width(schema.len()) + 8 * num_cols;
+            unsafe {
+                row.point_to_slice(std::slice::from_raw_parts(
+                    row_start_addr as *const u8,
+                    row_size,
+                ));
+            }
+            row_start_addr += row_size;
+            for j in 0..num_cols {
+                let arr = arrays.get(j).unwrap();
+                let dt = &schema[j];
+                // assert_eq!(dt, arr.data_type());
+                match dt {

Review Comment:
   Some ideas you may want to try:

   1. I think you could move the dynamic dispatch outside of the loop, with either generics or a macro (you want to generate the full inner loop over rows once per data type). See https://arrow.apache.org/rust/arrow/array/trait.ArrayAccessor.html for an example of how to use `ArrayAccessor`. You probably want to create a new function, generic over a `T` bounded by `ArrayAccessor`, that contains the inner loop over all rows, so that the dispatch happens once per column rather than once per row.
   2. You could also skip `is_null(i)` for arrays with a `null_count` of 0, if you do that check outside of the inner loop and write/generate two versions of the loop. This avoids the per-row check and often also helps the compiler generate better code.
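To make the two ideas above concrete, here is a minimal, self-contained sketch. It deliberately does not use arrow: `ColumnAccessor`, `PrimitiveColumn`, `Column`, `write_column`, and `encode_rows` are hypothetical stand-ins (the trait only mimics the shape of `ArrayAccessor`), and the row layout (a null bitset rounded up to 8-byte words, followed by one 8-byte little-endian slot per column) is a simplified assumption modeled on the `row_size` computation in the diff, not the PR's actual code.

```rust
/// Hypothetical stand-in for arrow's `ArrayAccessor`; NOT the real trait.
trait ColumnAccessor {
    type Item;
    fn len(&self) -> usize;
    fn null_count(&self) -> usize;
    fn is_null(&self, i: usize) -> bool;
    fn value(&self, i: usize) -> Self::Item;
}

/// Toy nullable column (assumption: one validity flag per row).
struct PrimitiveColumn<T> {
    values: Vec<T>,
    valid: Vec<bool>,
}

impl<T: Copy> ColumnAccessor for PrimitiveColumn<T> {
    type Item = T;
    fn len(&self) -> usize { self.values.len() }
    fn null_count(&self) -> usize { self.valid.iter().filter(|v| !**v).count() }
    fn is_null(&self, i: usize) -> bool { !self.valid[i] }
    fn value(&self, i: usize) -> T { self.values[i] }
}

/// Toy replacement for matching on `ArrowDataType` (illustrative only).
enum Column {
    Int32(PrimitiveColumn<i32>),
    Int64(PrimitiveColumn<i64>),
}

/// Assumed bitset width in bytes: one bit per column, rounded up to 8-byte words.
fn bitset_width(num_cols: usize) -> usize {
    ((num_cols + 63) / 64) * 8
}

/// Generic inner loop over all rows of ONE column. It is monomorphized per
/// column type, so no type dispatch happens inside the row loop.
fn write_column<A>(col: &A, col_idx: usize, num_cols: usize, out: &mut [u8])
where
    A: ColumnAccessor,
    A::Item: Into<i64>,
{
    let bitset = bitset_width(num_cols);
    let row_size = bitset + 8 * num_cols;
    let field = bitset + 8 * col_idx;
    let rows = out.chunks_exact_mut(row_size).take(col.len()).enumerate();
    if col.null_count() == 0 {
        // Version 1: no nulls, so the per-row `is_null` check is skipped entirely.
        for (i, row) in rows {
            let v: i64 = col.value(i).into();
            row[field..field + 8].copy_from_slice(&v.to_le_bytes());
        }
    } else {
        // Version 2: check validity per row; a set bit marks a null field.
        for (i, row) in rows {
            if col.is_null(i) {
                row[col_idx / 8] |= 1u8 << (col_idx % 8);
            } else {
                let v: i64 = col.value(i).into();
                row[field..field + 8].copy_from_slice(&v.to_le_bytes());
            }
        }
    }
}

/// The type dispatch (`match`) runs once per column, outside the row loop.
fn encode_rows(cols: &[Column], num_rows: usize) -> Vec<u8> {
    let num_cols = cols.len();
    let row_size = bitset_width(num_cols) + 8 * num_cols;
    let mut out = vec![0u8; row_size * num_rows];
    for (j, col) in cols.iter().enumerate() {
        match col {
            Column::Int32(c) => write_column(c, j, num_cols, &mut out),
            Column::Int64(c) => write_column(c, j, num_cols, &mut out),
        }
    }
    out
}
```

Calling `encode_rows(&cols, num_rows)` with one `Column` per field fills a zeroed buffer column by column. One trade-off worth measuring: iterating columns in the outer loop turns the writes into strided accesses over the row-major buffer, so whether this beats the per-row `match` depends on row width and column count.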