andygrove commented on PR #1034:
URL: https://github.com/apache/datafusion-comet/pull/1034#issuecomment-2455839248

   @parthchandra I experimented with iterating over columns first and then 
rows. I also modified the code to only create `SparkUnsafeRow` once instead of 
once per row. I now see similar performance between Spark and native.
   
   ```
   OpenJDK 64-Bit Server VM 11.0.24+8-post-Ubuntu-1ubuntu322.04 on Linux 6.8.0-47-generic
   AMD Ryzen 9 7950X3D 16-Core Processor
   ColumnarToRowExec:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   Spark Columnar To Row - integer                      57             62           4        183.9           5.4       1.0X
   Comet Columnar To Row - integer                      57             73          17        183.8           5.4       1.0X
   ```
   
   Code:
   
   ```rust
                   ArrowDataType::Int32 => {
                       let array = arr
                           .as_any()
                           .downcast_ref::<Int32Array>()
                           .expect("Error downcasting to Int32");
   
                       let mut row = SparkUnsafeRow::new(&schema);
                       let mut row_start_addr: usize = addr;
   
                       if array.null_count() == 0 {
                           for i in 0..num_rows {
                               let row_size =
                                   SparkUnsafeRow::get_row_bitset_width(schema.len()) + 8 * num_cols;
                               unsafe {
                                   row.point_to_slice(std::slice::from_raw_parts(
                                       row_start_addr as *const u8,
                                       row_size,
                                   ));
                               }
                               row.set_int(j, array.value(i));
                               row_start_addr += row_size;
                           }
   
                       } else {
                           for i in 0..num_rows {
                               let row_size =
                                   SparkUnsafeRow::get_row_bitset_width(schema.len()) + 8 * num_cols;
                               unsafe {
                                   row.point_to_slice(std::slice::from_raw_parts(
                                       row_start_addr as *const u8,
                                       row_size,
                                   ));
                               }
                               if array.is_null(i) {
                                   row.set_null_at(j);
                               } else {
                                   row.set_int(j, array.value(i));
                               }
                               row_start_addr += row_size;
                           }
                       }
                   }
   ```
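
For readers who want to try the loop order outside of Comet, here is a minimal standalone sketch of the same idea: columns in the outer loop, rows in the inner loop, with the row size computed once. It is not the PR's code. `bitset_width` and `columns_to_rows` are hypothetical names, the columns are plain `Vec<Option<i32>>` instead of Arrow arrays, and the row layout (a null bitset rounded up to 8-byte words, followed by one 8-byte little-endian slot per field) is an assumption modelled on the fixed-width part of an unsafe row.

```rust
/// Width in bytes of the null bitset: one bit per field,
/// rounded up to whole 8-byte words (assumed layout).
fn bitset_width(num_fields: usize) -> usize {
    ((num_fields + 63) / 64) * 8
}

/// Write Int32 columns into a buffer of fixed-width rows,
/// visiting columns in the outer loop and rows in the inner loop.
fn columns_to_rows(columns: &[Vec<Option<i32>>], num_rows: usize) -> Vec<u8> {
    let num_cols = columns.len();
    let bitset = bitset_width(num_cols);
    // The row size is the same for every row, so compute it once.
    let row_size = bitset + 8 * num_cols;
    let mut buf = vec![0u8; row_size * num_rows];

    for (j, column) in columns.iter().enumerate() {
        // Offset of field j within a row (each field gets an 8-byte slot).
        let field_offset = bitset + 8 * j;
        for (i, value) in column.iter().take(num_rows).enumerate() {
            let row = &mut buf[i * row_size..(i + 1) * row_size];
            match value {
                // Store the value in the low 4 bytes of the slot.
                Some(v) => row[field_offset..field_offset + 4].copy_from_slice(&v.to_le_bytes()),
                // Set bit j of the null bitset (little-endian bit order assumed).
                None => row[j / 8] |= 1 << (j % 8),
            }
        }
    }
    buf
}
```

With two columns, `columns_to_rows(&[vec![Some(1), None], vec![Some(7), Some(-2)]], 2)` yields a 48-byte buffer: each row has an 8-byte bitset plus two 8-byte slots. Keeping one column's values hot in the inner loop is the access pattern the benchmark above exercises; whether it helps for wider schemas would need to be measured.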

