Baunsgaard commented on pull request #1127:
URL: https://github.com/apache/systemds/pull/1127#issuecomment-748005848


   Comparing before and after (I tested this by dropping the transpose commit 
from the history), it looks like I might have done something wrong in the 
initial tests. The changes do not appear to have had any impact, but the 
comparison did make me notice that the run-to-run difference on the wide 
transpose is large: sometimes it takes 5 seconds, sometimes 2.5. I'm guessing 
it has to do with the two NUMA nodes?
   
   The full transpose micro benchmark:
   
   Alpha, after the change:
   ```code
   scripts/perftest/results/transpose-skinny-1.0.log
   Total elapsed time:          5.177 sec.
    1  r'             2.567      5
   Total elapsed time:          5.592 sec.
    1  r'             2.487      5
   Total elapsed time:          5.394 sec.
    2  r'             2.393      5
   Total elapsed time:          5.607 sec.
    1  r'             2.496      5
   Total elapsed time:          5.361 sec.
    1  r'             2.531      5
            195735.81 msec task-clock                #   31.188 CPUs utilized            ( +-  3.50% )
         595845281584      cycles                    #    3.044 GHz                      ( +-  3.34% )  (30.75%)
          67405027834      instructions              #    0.11  insn per cycle           ( +-  2.26% )  (38.51%)
   scripts/perftest/results/transpose-wide-1.0.log
   Total elapsed time:          4.870 sec.
    1  r'             2.439      5
   Total elapsed time:          5.466 sec.
    1  r'             2.418      5
   Total elapsed time:          5.381 sec.
    1  r'             2.393      5
   Total elapsed time:          5.257 sec.
    1  r'             2.343      5
   Total elapsed time:          4.880 sec.
    1  r'             2.453      5
            197370.59 msec task-clock                #   32.701 CPUs utilized            ( +-  6.74% )
         598434626116      cycles                    #    3.032 GHz                      ( +-  6.70% )  (30.76%)
          70128163005      instructions              #    0.12  insn per cycle           ( +-  1.65% )  (38.51%)
   scripts/perftest/results/transpose-full-1.0.log
   Total elapsed time:          3.736 sec.
    2  r'             1.343      5
   Total elapsed time:          3.858 sec.
    2  r'             1.326      5
   Total elapsed time:          3.500 sec.
    2  r'             1.299      5
   Total elapsed time:          3.894 sec.
    2  r'             1.305      5
   Total elapsed time:          3.526 sec.
    2  r'             1.304      5
            104490.76 msec task-clock                #   22.819 CPUs utilized            ( +-  1.56% )
         320478636150      cycles                    #    3.067 GHz                      ( +-  1.69% )  (30.80%)
          62146562879      instructions              #    0.19  insn per cycle           ( +-  1.59% )  (38.55%)
   scripts/perftest/results/transpose-skinny-0.1.log
   Total elapsed time:          2.701 sec.
    1  r'             1.437      5
   Total elapsed time:          2.659 sec.
    1  r'             1.141      5
   Total elapsed time:          3.174 sec.
    1  r'             1.761      5
   Total elapsed time:          2.705 sec.
    1  r'             1.103      5
   Total elapsed time:          3.112 sec.
    1  r'             1.472      5
            152922.25 msec task-clock                #   43.917 CPUs utilized            ( +-  5.32% )
         473697710114      cycles                    #    3.098 GHz                      ( +-  5.32% )  (31.11%)
          75871932728      instructions              #    0.16  insn per cycle           ( +-  2.13% )  (38.92%)
   scripts/perftest/results/transpose-wide-0.1.log
   Total elapsed time:          7.215 sec.
    1  r'             5.376      5
   Total elapsed time:          6.703 sec.
    1  r'             4.871      5
   Total elapsed time:          4.625 sec.
    1  r'             2.815      5
   Total elapsed time:          4.400 sec.
    1  r'             2.592      5
   Total elapsed time:          5.506 sec.
    1  r'             3.721      5
            214645.79 msec task-clock                #   33.943 CPUs utilized            ( +- 18.68% )
         658068071617      cycles                    #    3.066 GHz                      ( +- 18.75% )  (30.71%)
          78768925872      instructions              #    0.12  insn per cycle           ( +- 21.76% )  (38.42%)
   scripts/perftest/results/transpose-full-0.1.log
   Total elapsed time:          1.368 sec.
    1  r'             0.583      5
   Total elapsed time:          1.365 sec.
    1  r'             0.574      5
   Total elapsed time:          1.724 sec.
    1  r'             0.835      5
   Total elapsed time:          1.564 sec.
    1  r'             0.708      5
   Total elapsed time:          1.404 sec.
    1  r'             0.522      5
             79268.38 msec task-clock                #   38.130 CPUs utilized            ( +-  8.03% )
         239815721367      cycles                    #    3.025 GHz                      ( +-  7.83% )  (30.85%)
          32295607242      instructions              #    0.13  insn per cycle           ( +-  2.61% )  (38.69%)
   scripts/perftest/results/transpose-large.log
   Total elapsed time:          34.586 sec.
    1  r'            31.577      5
   Total elapsed time:          31.789 sec.
    1  r'            28.383      5
   Total elapsed time:          31.772 sec.
    1  r'            28.304      5
   Total elapsed time:          31.899 sec.
    1  r'            28.529      5
   Total elapsed time:          32.218 sec.
    1  r'            28.521      5
            220380.73 msec task-clock                #    6.530 CPUs utilized            ( +-  4.50% )
         702976272821      cycles                    #    3.190 GHz                      ( +-  4.32% )  (30.77%)
         341876674221      instructions              #    0.49  insn per cycle           ( +-  1.53% )  (38.51%)
   ```
   Alpha, before the change:
   
   ```code
   scripts/perftest/results/transpose-skinny-1.0.log
   Total elapsed time:          4.930 sec.
    1  r'             2.404      5
   Total elapsed time:          5.457 sec.
    2  r'             2.394      5
   Total elapsed time:          5.097 sec.
    1  r'             2.435      5
   Total elapsed time:          5.163 sec.
    1  r'             2.422      5
   Total elapsed time:          4.820 sec.
    1  r'             2.399      5
            168393.69 msec task-clock                #   28.362 CPUs utilized            ( +-  5.87% )
         509712089539      cycles                    #    3.027 GHz                      ( +-  5.78% )  (30.68%)
          64186798883      instructions              #    0.13  insn per cycle           ( +-  2.02% )  (38.44%)
   scripts/perftest/results/transpose-wide-1.0.log
   Total elapsed time:          5.288 sec.
    1  r'             2.408      5
   Total elapsed time:          4.944 sec.
    1  r'             2.386      5
   Total elapsed time:          5.192 sec.
    1  r'             2.440      5
   Total elapsed time:          4.996 sec.
    1  r'             2.410      5
   Total elapsed time:          5.010 sec.
    1  r'             2.450      5
            179656.42 msec task-clock                #   30.310 CPUs utilized            ( +-  4.60% )
         543794617678      cycles                    #    3.027 GHz                      ( +-  4.54% )  (30.82%)
          68647994631      instructions              #    0.13  insn per cycle           ( +-  1.66% )  (38.59%)
   scripts/perftest/results/transpose-full-1.0.log
   Total elapsed time:          4.217 sec.
    2  r'             1.321      5
   Total elapsed time:          3.806 sec.
    2  r'             1.304      5
   Total elapsed time:          3.456 sec.
    2  r'             0.864      5
   Total elapsed time:          4.261 sec.
    2  r'             1.303      5
   Total elapsed time:          3.254 sec.
    2  r'             0.853      5
            117925.13 msec task-clock                #   25.265 CPUs utilized            ( +-  7.35% )
         358782539233      cycles                    #    3.042 GHz                      ( +-  7.25% )  (30.63%)
          59148304387      instructions              #    0.16  insn per cycle           ( +-  1.26% )  (38.40%)
   scripts/perftest/results/transpose-skinny-0.1.log
   Total elapsed time:          3.027 sec.
    1  r'             1.638      5
   Total elapsed time:          3.016 sec.
    1  r'             1.583      5
   Total elapsed time:          2.768 sec.
    1  r'             1.461      5
   Total elapsed time:          3.227 sec.
    1  r'             1.709      5
   Total elapsed time:          2.434 sec.
    1  r'             1.421      5
            103834.93 msec task-clock                #   29.698 CPUs utilized            ( +-  6.79% )
         324467854857      cycles                    #    3.125 GHz                      ( +-  6.93% )  (30.72%)
          47190326093      instructions              #    0.15  insn per cycle           ( +-  1.21% )  (38.48%)
   scripts/perftest/results/transpose-wide-0.1.log
   Total elapsed time:          4.556 sec.
    1  r'             2.705      5
   Total elapsed time:          4.808 sec.
    1  r'             3.000      5
   Total elapsed time:          4.250 sec.
    1  r'             2.398      5
   Total elapsed time:          7.544 sec.
    1  r'             5.691      5
   Total elapsed time:          5.221 sec.
    1  r'             3.368      5
            179756.00 msec task-clock                #   30.373 CPUs utilized            ( +- 24.85% )
         548210698316      cycles                    #    3.050 GHz                      ( +- 24.92% )  (30.83%)
          71613380037      instructions              #    0.13  insn per cycle           ( +- 24.12% )  (38.56%)
   scripts/perftest/results/transpose-full-0.1.log
   Total elapsed time:          1.314 sec.
    1  r'             0.533      5
   Total elapsed time:          1.489 sec.
    1  r'             0.629      5
   Total elapsed time:          1.621 sec.
    1  r'             0.823      5
   Total elapsed time:          1.269 sec.
    1  r'             0.518      5
   Total elapsed time:          1.346 sec.
    1  r'             0.532      5
             69956.45 msec task-clock                #   35.093 CPUs utilized            ( +-  8.36% )
         212798679757      cycles                    #    3.042 GHz                      ( +-  8.18% )  (30.91%)
          33221409066      instructions              #    0.16  insn per cycle           ( +-  3.80% )  (38.69%)
   scripts/perftest/results/transpose-large.log
   Total elapsed time:          34.882 sec.
    1  r'            32.116      5
   Total elapsed time:          31.270 sec.
    1  r'            28.360      5
   Total elapsed time:          32.466 sec.
    1  r'            28.763      5
   Total elapsed time:          33.783 sec.
    1  r'            30.827      5
   Total elapsed time:          34.388 sec.
    1  r'            31.564      5
            226007.94 msec task-clock                #    6.518 CPUs utilized            ( +-  4.21% )
         719315156077      cycles                    #    3.183 GHz                      ( +-  4.15% )  (30.82%)
         350177806352      instructions              #    0.49  insn per cycle           ( +-  1.34% )  (38.56%)
   
   ```
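
One way to put a number on the run-to-run spread is to pull the `r'` timings out of the logs and compare the mean and sample standard deviation of the two runs. The following is a minimal sketch, not part of the perftest scripts: the helper names `transpose_times` and `summarize` are hypothetical, and the line layout the regex expects (rank, opcode, seconds, call count) is inferred from the listings above rather than from a documented format. The sample data is the `transpose-wide-0.1` section of each listing.

```python
import re
from statistics import mean, stdev

# Matches heavy-hitter lines such as " 1  r'             2.567      5".
# Assumed layout (inferred from the logs): rank, opcode, seconds, call count.
TRANSPOSE_LINE = re.compile(r"^\s*\d+\s+r'\s+([0-9.]+)\s+\d+\s*$", re.MULTILINE)


def transpose_times(log_text):
    """Return every r' time (in seconds) found in a log excerpt."""
    return [float(t) for t in TRANSPOSE_LINE.findall(log_text)]


def summarize(times):
    """Return (mean, sample standard deviation) of a list of timings."""
    return mean(times), stdev(times)


# transpose-wide-0.1, after the change (copied from the first listing).
after_log = """
Total elapsed time:          7.215 sec.
 1  r'             5.376      5
Total elapsed time:          6.703 sec.
 1  r'             4.871      5
Total elapsed time:          4.625 sec.
 1  r'             2.815      5
Total elapsed time:          4.400 sec.
 1  r'             2.592      5
Total elapsed time:          5.506 sec.
 1  r'             3.721      5
"""

# transpose-wide-0.1, before the change (copied from the second listing).
before_log = """
Total elapsed time:          4.556 sec.
 1  r'             2.705      5
Total elapsed time:          4.808 sec.
 1  r'             3.000      5
Total elapsed time:          4.250 sec.
 1  r'             2.398      5
Total elapsed time:          7.544 sec.
 1  r'             5.691      5
Total elapsed time:          5.221 sec.
 1  r'             3.368      5
"""

after_mean, after_sd = summarize(transpose_times(after_log))
before_mean, before_sd = summarize(transpose_times(before_log))

print(f"after : mean={after_mean:.3f}s sd={after_sd:.3f}s")
print(f"before: mean={before_mean:.3f}s sd={before_sd:.3f}s")
```

For this benchmark the gap between the two means is smaller than the standard deviation of either run, so with five repetitions each the numbers cannot separate the two commits. That is consistent with the observation that the change shows no measurable impact.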


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

