cyb70289 commented on issue #6986:
URL: https://github.com/apache/arrow/pull/6986#issuecomment-616978252


   This change introduces severe branch mispredictions under certain conditions. See the perf logs below. I changed the benchmark code to run only the problematic test case.
   
   Without this patch
   ```bash
           807.415826      task-clock (msec)         #    0.979 CPUs utilized
                   83      context-switches          #    0.103 K/sec
                    0      cpu-migrations            #    0.000 K/sec
                  427      page-faults               #    0.529 K/sec
        2,285,801,407      cycles                    #    2.831 GHz                      (83.17%)
            2,313,785      stalled-cycles-frontend   #    0.10% frontend cycles idle     (83.16%)
          915,631,177      stalled-cycles-backend    #   40.06% backend cycles idle      (82.93%)
        9,997,208,858      instructions              #    4.37  insn per cycle
                                                     #    0.09  stalled cycles per insn  (83.66%)
        1,679,799,451      branches                  # 2080.464 M/sec                    (83.66%)
              106,599      branch-misses             #    0.01% of all branches          (83.41%)
   ```
   
   With this patch
   ```bash
           902.557236      task-clock (msec)         #    0.980 CPUs utilized
                   94      context-switches          #    0.104 K/sec
                    0      cpu-migrations            #    0.000 K/sec
                  427      page-faults               #    0.473 K/sec
        2,567,879,767      cycles                    #    2.845 GHz                      (83.17%)
           88,266,680      stalled-cycles-frontend   #    3.44% frontend cycles idle     (83.17%)
           20,826,862      stalled-cycles-backend    #    0.81% backend cycles idle      (83.03%)
        2,518,949,193      instructions              #    0.98  insn per cycle
                                                     #    0.04  stalled cycles per insn  (83.62%)
          847,459,928      branches                  #  938.954 M/sec                    (83.61%)
           75,187,208      branch-misses             #    8.87% of all branches          (83.39%)
   ```
   Absolute counts are not comparable between the two runs, because gtest runs a different number of loop iterations for each test.
   The point is that the branch-miss rate jumps from 0.01% to 8.87%. This causes more frontend stalls (the CPU waits for code to be fetched before it can execute it; 0.10% to 3.44% of cycles), and IPC (instructions per cycle) drops from 4.37 to 0.98.
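
For reference, the two ratios quoted above can be recomputed from the raw counters in the logs. The following is only a minimal sketch of that arithmetic: the struct and function names (`PerfCounters`, `Ipc`, `BranchMissPercent`, and so on) are made up for illustration and are not part of Arrow or perf. The counter values are copied from the two logs.

```cpp
#include <cstdint>
#include <cstdio>

// Raw event counts as printed by `perf stat` (hypothetical helper type).
struct PerfCounters {
  std::uint64_t cycles;
  std::uint64_t instructions;
  std::uint64_t branches;
  std::uint64_t branch_misses;
};

// Instructions retired per cycle ("insn per cycle" in the perf output).
inline double Ipc(const PerfCounters& c) {
  return static_cast<double>(c.instructions) / static_cast<double>(c.cycles);
}

// Mispredicted branches as a percentage of all branches.
inline double BranchMissPercent(const PerfCounters& c) {
  return 100.0 * static_cast<double>(c.branch_misses) /
         static_cast<double>(c.branches);
}

// Counters copied from the "Without this patch" log.
inline PerfCounters WithoutPatch() {
  return {2285801407ULL, 9997208858ULL, 1679799451ULL, 106599ULL};
}

// Counters copied from the "With this patch" log.
inline PerfCounters WithPatch() {
  return {2567879767ULL, 2518949193ULL, 847459928ULL, 75187208ULL};
}

// Prints both derived ratios for one run.
inline void PrintReport(const char* label, const PerfCounters& c) {
  std::printf("%-14s ipc=%.2f  branch-miss=%.2f%%\n", label, Ipc(c),
              BranchMissPercent(c));
}
```

Calling `PrintReport("without patch", WithoutPatch())` and `PrintReport("with patch", WithPatch())` from a `main` function prints the two ratios side by side. Because both are ratios, they remain comparable even though the absolute counts are not.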
   
   I haven't figured out which branch is mispredicted, or why. My Haswell CPU is too old to support branch tracing. My guess is [this line](https://github.com/apache/arrow/blob/5093b809d63ac8db99aec9caa7ad7e723f277c46/cpp/src/arrow/util/bit_util.cc#L285), but I have no concrete justification.

