Dandandan commented on code in PR #1034:
URL: https://github.com/apache/datafusion-comet/pull/1034#discussion_r1825003648


##########
native/core/src/execution/shuffle/row.rs:
##########
@@ -235,6 +250,143 @@ impl SparkUnsafeRow {
         }
     }
 
+    #[allow(clippy::needless_range_loop)]
+    pub fn get_rows_from_arrays(
+        schema: Vec<ArrowDataType>,
+        arrays: Vec<ArrayRef>,
+        num_rows: usize,
+        num_cols: usize,
+        addr: usize,
+    ) {
+        let mut row_start_addr: usize = addr;
+        for i in 0..num_rows {
+            let mut row = SparkUnsafeRow::new(&schema);
+            let row_size = SparkUnsafeRow::get_row_bitset_width(schema.len()) + 8 * num_cols;
+            unsafe {
+                row.point_to_slice(std::slice::from_raw_parts(
+                    row_start_addr as *const u8,
+                    row_size,
+                ));
+            }
+            row_start_addr += row_size;
+            for j in 0..num_cols {
+                let arr = arrays.get(j).unwrap();
+                let dt = &schema[j];
+                // assert_eq!(dt, arr.data_type());
+                match dt {

Review Comment:
   Some ideas you may want to try:
   1. I think you could move the dynamic dispatch outside of the loop, either with generics or a macro (you want to generate the full inner loop over rows per data type). See
   https://arrow.apache.org/rust/arrow/array/trait.ArrayAccessor.html
   for an example of how to use `ArrayAccessor`. You probably want to create a new function, generic over a `T` bounded by `ArrayAccessor`, for the inner loop over all rows, so the dispatch happens per column rather than per row.
   2. You could also avoid calling `is_null(i)` for arrays with a `null_count` of 0, if you do that check outside of the inner loop and write (or generate) two versions of the loop. This avoids the per-row check and often also helps the compiler generate better code.
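A minimal, dependency-free sketch of both ideas, under stated assumptions. This is not Comet's code. To stay self-contained it uses hypothetical stand-ins: `ColumnAccessor` replaces arrow's `ArrayAccessor` plus the `null_count`/`is_null` methods of `Array`. `NullableColumn` and `Column` replace a downcast array and `ArrayRef` plus `DataType`. `Row` replaces `SparkUnsafeRow`, using a simplified null-flag-plus-8-byte-slot layout rather than the real UnsafeRow memory format. The `match` on the data type runs once per column. The generic `write_column` is monomorphized into a separate row loop for each concrete type. The `null_count() == 0` check selects between two loop versions before the per-row loop starts.

```rust
/// Hypothetical stand-in for arrow's `ArrayAccessor` (`value(i)`) combined
/// with the `len`, `null_count` and `is_null` methods of arrow's `Array`.
trait ColumnAccessor {
    type Item;
    fn len(&self) -> usize;
    fn null_count(&self) -> usize;
    fn is_null(&self, i: usize) -> bool;
    fn value(&self, i: usize) -> Self::Item;
}

/// Hypothetical toy column: values plus an optional validity mask (`true` = valid).
struct NullableColumn<T> {
    values: Vec<T>,
    validity: Option<Vec<bool>>,
}

impl<T: Copy> ColumnAccessor for NullableColumn<T> {
    type Item = T;
    fn len(&self) -> usize {
        self.values.len()
    }
    fn null_count(&self) -> usize {
        self.validity
            .as_ref()
            .map_or(0, |mask| mask.iter().filter(|valid| !**valid).count())
    }
    fn is_null(&self, i: usize) -> bool {
        self.validity.as_ref().map_or(false, |mask| !mask[i])
    }
    fn value(&self, i: usize) -> T {
        self.values[i]
    }
}

/// Hypothetical type-erased column, playing the role of `ArrayRef` + `DataType`.
enum Column {
    Int64(NullableColumn<i64>),
    Boolean(NullableColumn<bool>),
}

/// Simplified row: one null flag and one 8-byte slot per field
/// (NOT the real `SparkUnsafeRow` layout).
#[derive(Debug, PartialEq)]
struct Row {
    nulls: Vec<bool>,
    slots: Vec<u64>,
}

impl Row {
    fn new(num_cols: usize) -> Self {
        Row { nulls: vec![false; num_cols], slots: vec![0; num_cols] }
    }
}

/// Writes one column into field `ordinal` of every row. Being generic over the
/// accessor, the compiler emits a specialized row loop per concrete type.
fn write_column<A, F>(col: &A, ordinal: usize, rows: &mut [Row], encode: F)
where
    A: ColumnAccessor,
    F: Fn(A::Item) -> u64,
{
    debug_assert_eq!(col.len(), rows.len());
    if col.null_count() == 0 {
        // No nulls: this loop has no per-row `is_null` branch at all.
        for (i, row) in rows.iter_mut().enumerate() {
            row.slots[ordinal] = encode(col.value(i));
        }
    } else {
        // Nullable version: check validity for every row.
        for (i, row) in rows.iter_mut().enumerate() {
            if col.is_null(i) {
                row.nulls[ordinal] = true;
            } else {
                row.slots[ordinal] = encode(col.value(i));
            }
        }
    }
}

/// Converts columns to rows, matching on the data type once per column
/// rather than once per (row, column) pair.
fn columns_to_rows(columns: &[Column], num_rows: usize) -> Vec<Row> {
    let mut rows: Vec<Row> = (0..num_rows).map(|_| Row::new(columns.len())).collect();
    for (j, column) in columns.iter().enumerate() {
        match column {
            // `as u64` reinterprets the i64 bits (two's complement).
            Column::Int64(c) => write_column(c, j, &mut rows, |v: i64| v as u64),
            Column::Boolean(c) => write_column(c, j, &mut rows, |v: bool| u64::from(v)),
        }
    }
    rows
}
```

With arrow itself, the bound on the inner-loop function would be `ArrayAccessor`, which is implemented for references such as `&PrimitiveArray<T>` and `&BooleanArray`. Each `ArrayRef` would be downcast once per column, before the row loop, instead of going through the hypothetical `Column` enum used here.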



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org