UBarney commented on PR #16889:
URL: https://github.com/apache/datafusion/pull/16889#issuecomment-3124214924

   > > I didn't expect this PR to have such a big performance improvement. Like 
#16443, I still don't understand why there is a performance improvement.
   > 
   > I’m aware of two key differences:
   > 
   > * Fewer redundant steps for `indices <--> batches`
   > * Always keeping the right batch in cache (the original implementation 
performs a left-chunk × right-row iteration)
   > 
   > However I was just hoping to cleanup the codebase a bit, I also didn’t 
expect it to be an easy 2X.
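
To make the second bullet concrete, here is a minimal sketch of the two iteration orders as I read them. It is not the DataFusion code: the function names, the `chunk` parameter, and the plain `i64` slices are all made up for illustration, and the real operator works on Arrow batches. Both orders produce the same set of matches; only the memory access pattern differs.

```rust
/// Join predicate from the benchmark query: (t1.value + t2.value) % 1000 = 0.
fn matches(l: i64, r: i64) -> bool {
    (l + r) % 1000 == 0
}

/// Hypothetical "left-chunk x right-row" order: for each chunk of left rows,
/// walk the right side row by row and compare that row against the chunk.
fn left_chunk_by_right_row(left: &[i64], right: &[i64], chunk: usize) -> Vec<(i64, i64)> {
    let mut out = Vec::new();
    for left_chunk in left.chunks(chunk) {
        for &r in right {
            for &l in left_chunk {
                if matches(l, r) {
                    out.push((l, r));
                }
            }
        }
    }
    out
}

/// Hypothetical "left-row x right-batch" order: for each left row, scan the
/// same right batch, so that batch is the data reused across iterations.
fn left_row_by_right_batch(left: &[i64], right_batch: &[i64]) -> Vec<(i64, i64)> {
    let mut out = Vec::new();
    for &l in left {
        for &r in right_batch {
            if matches(l, r) {
                out.push((l, r));
            }
        }
    }
    out
}

fn main() {
    // Scaled-down inputs; the benchmark uses range(10000) and range(200000).
    let left: Vec<i64> = (0..10).collect();
    let right: Vec<i64> = (0..2000).collect();

    let mut a = left_chunk_by_right_row(&left, &right, 4);
    let mut b = left_row_by_right_batch(&left, &right);
    a.sort();
    b.sort();

    // Same matches, different traversal order.
    assert_eq!(a, b);
    println!("{} matching pairs", a.len());
}
```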
   
   
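   I can't use `perf` to analyze cache misses in my Hyper-V VM: the hardware counters (`cycles`, `instructions`, `cache-references`, `cache-misses`) are all reported as `<not supported>`.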
   
   <details>
   
   
   ```
   sudo perf stat -e cycles,instructions,cache-references,cache-misses 
./target/release/1_left_row_join_right_batch -c '        SELECT *               
             FROM range(10000) AS t1
           JOIN range(200000) AS t2
           ON (t1.value + t2.value) % 1000 = 0;'
   
    Performance counter stats for './target/release/1_left_row_join_right_batch 
-c         SELECT *                            FROM range(10000) AS t1
           JOIN range(200000) AS t2
           ON (t1.value + t2.value) % 1000 = 0;':
   
      <not supported>      cycles
      <not supported>      instructions
      <not supported>      cache-references
      <not supported>      cache-misses
   
          0.662652693 seconds time elapsed
   
         11.216489000 seconds user
          0.082434000 seconds sys
   
   ```
   
   </details>
   
   
   However, using `/usr/bin/time -v`, I found that the previous version 
incurred far more `Minor (reclaiming a frame) page faults` (3,207,160 vs 5,133), 
much more system time (19.36s vs 0.06s), and a far larger maximum resident set 
size (23,530,920 kB vs 135,744 kB).
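
As a rough sanity check on those numbers, the sketch below converts the minor fault counts into bytes, assuming every fault maps exactly one 4096-byte page (the page size reported by `time`). The helper names are mine. The assumption clearly does not hold everywhere: the reported maximum RSS of the previous version is larger than this estimate, which would be consistent with some faults mapping more than one base page (transparent huge pages are one possible explanation), so treat the result as a lower bound.

```rust
/// Bytes mapped by `faults` minor page faults, assuming one base page per fault.
fn faulted_bytes(faults: u64, page_size: u64) -> u64 {
    faults * page_size
}

/// Convert a byte count to GiB for readability.
fn gib(bytes: u64) -> f64 {
    bytes as f64 / (1u64 << 30) as f64
}

fn main() {
    // Page size and fault counts are taken from the `time -v` output.
    const PAGE_SIZE: u64 = 4096;
    let previous = faulted_bytes(3_207_160, PAGE_SIZE);
    let current = faulted_bytes(5_133, PAGE_SIZE);

    println!("previous version: about {:.2} GiB faulted in", gib(previous));
    println!("new version:      about {:.4} GiB faulted in", gib(current));
    println!("fault ratio:      about {:.0}x", previous as f64 / current as f64);
}
```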
   
   
   
   <details>
   
   ```
    /usr/bin/time -v ./target/release/join_limit_join_batch_size -c '        
SELECT *                            FROM range(10000) AS t1
           JOIN range(200000) AS t2
           ON (t1.value + t2.value) % 1000 = 0;'
   
           Command being timed: "./target/release/join_limit_join_batch_size -c 
        SELECT *                            FROM range(10000) AS t1
           JOIN range(200000) AS t2
           ON (t1.value + t2.value) % 1000 = 0;"
           User time (seconds): 27.68
           System time (seconds): 19.36
           Percent of CPU this job got: 2058%
           Elapsed (wall clock) time (h:mm:ss or m:ss): 0:02.28
           Average shared text size (kbytes): 0
           Average unshared data size (kbytes): 0
           Average stack size (kbytes): 0
           Average total size (kbytes): 0
           Maximum resident set size (kbytes): 23530920
           Average resident set size (kbytes): 0
           Major (requiring I/O) page faults: 0
           Minor (reclaiming a frame) page faults: 3207160
           Voluntary context switches: 629
           Involuntary context switches: 1771
           Swaps: 0
           File system inputs: 0
           File system outputs: 0
           Socket messages sent: 0
           Socket messages received: 0
           Signals delivered: 0
           Page size (bytes): 4096
           Exit status: 0
   
    /usr/bin/time -v ./target/release/1_left_row_join_right_batch -c '        
SELECT *                            FROM range(10000) AS t1
           JOIN range(200000) AS t2
           ON (t1.value + t2.value) % 1000 = 0;'
   
           Command being timed: "./target/release/1_left_row_join_right_batch 
-c         SELECT *                            FROM range(10000) AS t1
           JOIN range(200000) AS t2
           ON (t1.value + t2.value) % 1000 = 0;"
           User time (seconds): 11.50
           System time (seconds): 0.06
           Percent of CPU this job got: 1896%
           Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.61
           Average shared text size (kbytes): 0
           Average unshared data size (kbytes): 0
           Average stack size (kbytes): 0
           Average total size (kbytes): 0
           Maximum resident set size (kbytes): 135744
           Average resident set size (kbytes): 0
           Major (requiring I/O) page faults: 0
           Minor (reclaiming a frame) page faults: 5133
           Voluntary context switches: 461
           Involuntary context switches: 574
           Swaps: 0
           File system inputs: 0
           File system outputs: 0
           Socket messages sent: 0
           Socket messages received: 0
           Signals delivered: 0
           Page size (bytes): 4096
           Exit status: 0
   ```
   
   </details>
   
   My speculation is that the previous version _**suffered from memory 
management overhead**_ because it had to allocate a large amount of memory: 
`perf` also showed that a single kernel function, `clear_page_erms` (which 
zeroes out a page of memory using a fast CPU instruction), was the top CPU 
consumer at 25.15% of samples.
   
   
   ```
   sudo perf report --no-children
   
   Samples: 215K of event 'cpu-clock:ppp', Event count (approx.): 53759750000
     Overhead  Command          Shared Object               Symbol
   +   25.15%  tokio-runtime-w  [kernel.kallsyms]           [k] clear_page_erms
   +    6.90%  tokio-runtime-w  join_limit_join_batch_size  [.] 0x0000000000df5b44
   +    6.31%  swapper          [kernel.kallsyms]           [k] pv_native_safe_halt
   +    4.37%  tokio-runtime-w  join_limit_join_batch_size  [.] 0x0000000002bd63e5
   +    4.32%  tokio-runtime-w  join_limit_join_batch_size  [.] 0x0000000002bd63d5
   +    3.81%  tokio-runtime-w  [kernel.kallsyms]           [k] _raw_spin_unlock_irqrestore
   +    3.10%  tokio-runtime-w  join_limit_join_batch_size  [.] 0x0000000002bd61e7
   +    2.99%  tokio-runtime-w  join_limit_join_batch_size  [.] 0x0000000000e088a4
   +    2.84%  tokio-runtime-w  join_limit_join_batch_size  [.] 0x0000000002bd61bc
   ```
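
To illustrate the mechanism I am speculating about (not the actual DataFusion code path), here is a small sketch that contrasts allocating a fresh zeroed buffer on every iteration with reusing one buffer. The function names, sizes, and the `fresh`/`reuse` argument are invented for the example. With a typical allocator, large fresh allocations are backed by new pages that the kernel has to zero on first touch, which is where a function like `clear_page_erms` would show up. Whether that happens depends on the allocator and kernel settings.

```rust
/// Allocate a new zeroed buffer on every iteration and touch each 4 KiB page.
/// Large allocations are typically backed by fresh pages that the kernel
/// zeroes on first touch (an assumption about the allocator, not a guarantee).
fn fresh_buffer_each_time(iterations: usize, len: usize) -> u64 {
    let mut checksum = 0u64;
    for i in 0..iterations {
        let mut buf = vec![0u8; len];
        for page in buf.chunks_mut(4096) {
            page[0] = i as u8;
            checksum += page[0] as u64;
        }
    }
    checksum
}

/// Allocate once and reuse the same buffer, so its pages are faulted in once.
fn reused_buffer(iterations: usize, len: usize) -> u64 {
    let mut checksum = 0u64;
    let mut buf = vec![0u8; len];
    for i in 0..iterations {
        for page in buf.chunks_mut(4096) {
            page[0] = i as u8;
            checksum += page[0] as u64;
        }
    }
    checksum
}

fn main() {
    // Pick the variant from the first CLI argument: "fresh" or "reuse" (default).
    let mode = std::env::args().nth(1).unwrap_or_else(|| "reuse".to_string());
    let (iterations, len) = (16, 64 << 20); // 16 rounds over a 64 MiB buffer

    let checksum = if mode == "fresh" {
        fresh_buffer_each_time(iterations, len)
    } else {
        reused_buffer(iterations, len)
    };
    println!("{}: checksum {}", mode, checksum);
}
```

Running each mode under `/usr/bin/time -v` should show whether the minor fault count and system time differ on a given machine. I would not read exact numbers from it into the join benchmark above.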


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

