andygrove commented on PR #1034: URL: https://github.com/apache/datafusion-comet/pull/1034#issuecomment-2455839248
@parthchandra I experimented with iterating over columns first and then rows. I also modified the code to only create `SparkUnsafeRow` once instead of once per row. I now see similar performance between Spark and native.

```
OpenJDK 64-Bit Server VM 11.0.24+8-post-Ubuntu-1ubuntu322.04 on Linux 6.8.0-47-generic
AMD Ryzen 9 7950X3D 16-Core Processor
ColumnarToRowExec:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark Columnar To Row - integer                      57             62           4        183.9           5.4       1.0X
Comet Columnar To Row - integer                      57             73          17        183.8           5.4       1.0X
```

Code:

```rust
ArrowDataType::Int32 => {
    let array = arr
        .as_any()
        .downcast_ref::<Int32Array>()
        .expect("Error downcasting to Int32");

    let mut row = SparkUnsafeRow::new(&schema);
    let mut row_start_addr: usize = addr;

    if array.null_count() == 0 {
        for i in 0..num_rows {
            let row_size = SparkUnsafeRow::get_row_bitset_width(schema.len()) + 8 * num_cols;
            unsafe {
                row.point_to_slice(std::slice::from_raw_parts(
                    row_start_addr as *const u8,
                    row_size,
                ));
            }
            row.set_int(j, array.value(i));
            row_start_addr += row_size;
        }
    } else {
        for i in 0..num_rows {
            let row_size = SparkUnsafeRow::get_row_bitset_width(schema.len()) + 8 * num_cols;
            unsafe {
                row.point_to_slice(std::slice::from_raw_parts(
                    row_start_addr as *const u8,
                    row_size,
                ));
            }
            if array.is_null(i) {
                row.set_null_at(j);
            } else {
                row.set_int(j, array.value(i));
            }
            row_start_addr += row_size;
        }
    }
}
```

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
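For readers without the Comet sources at hand, the column-first loop described in the comment can be sketched without any Arrow or Comet dependency. This is a minimal illustration, not the PR's implementation: `Int32Column`, `RowWriter`, `bitset_width`, and `columns_to_rows` are hypothetical names, the writer operates on a safe byte buffer instead of a raw address, and the layout assumed is a null bitset of one 64-bit word per 64 fields followed by one 8-byte slot per field, written little-endian.

```rust
/// Hypothetical stand-in for a nullable Int32 column (not the Arrow API).
struct Int32Column {
    values: Vec<i32>,
    /// `None` means the column has no nulls; otherwise `true` marks a valid value.
    validity: Option<Vec<bool>>,
}

/// Width in bytes of the null bitset: one 64-bit word per 64 fields (assumed layout).
fn bitset_width(num_fields: usize) -> usize {
    ((num_fields + 63) / 64) * 8
}

/// Hypothetical reusable row view: created once, then re-pointed at each row.
struct RowWriter {
    bitset_width: usize,
    offset: usize,
}

impl RowWriter {
    fn new(num_fields: usize) -> Self {
        RowWriter { bitset_width: bitset_width(num_fields), offset: 0 }
    }

    /// Re-point the view at the row starting at `offset` in the buffer.
    fn point_to(&mut self, offset: usize) {
        self.offset = offset;
    }

    /// Write an i32 into the 8-byte slot of column `col` (little-endian assumed).
    fn set_int(&self, buf: &mut [u8], col: usize, value: i32) {
        let start = self.offset + self.bitset_width + 8 * col;
        buf[start..start + 4].copy_from_slice(&value.to_le_bytes());
    }

    /// Set the null bit for column `col` and zero its slot.
    fn set_null_at(&self, buf: &mut [u8], col: usize) {
        let byte = self.offset + (col / 64) * 8 + (col % 64) / 8;
        buf[byte] |= 1u8 << (col % 8);
        let start = self.offset + self.bitset_width + 8 * col;
        buf[start..start + 8].fill(0);
    }
}

/// Convert columns to fixed-width rows, iterating columns first, then rows.
fn columns_to_rows(columns: &[Int32Column], num_rows: usize) -> Vec<u8> {
    let num_cols = columns.len();
    // The row size is the same for every row, so compute it once.
    let row_size = bitset_width(num_cols) + 8 * num_cols;
    let mut buf = vec![0u8; row_size * num_rows];
    // One writer for the whole conversion instead of one per row.
    let mut row = RowWriter::new(num_cols);

    for (j, column) in columns.iter().enumerate() {
        match &column.validity {
            // Fast path: no per-value null check when the column has no nulls.
            None => {
                for i in 0..num_rows {
                    row.point_to(i * row_size);
                    row.set_int(&mut buf, j, column.values[i]);
                }
            }
            Some(valid) => {
                for i in 0..num_rows {
                    row.point_to(i * row_size);
                    if valid[i] {
                        row.set_int(&mut buf, j, column.values[i]);
                    } else {
                        row.set_null_at(&mut buf, j);
                    }
                }
            }
        }
    }
    buf
}
```

Calling `columns_to_rows` with two columns and three rows yields a 72-byte buffer (three rows of 8 bitset bytes plus two 8-byte slots). Unlike the snippet above, the sketch hoists the row-size computation out of the inner loop, since it does not change between rows.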
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org