Baunsgaard opened a new pull request #1127:
URL: https://github.com/apache/systemds/pull/1127


   This PR adds a simple micro benchmark.
   This time the transpose of a matrix is measured.
   
   3 basic cases:
   
   "skinny" with 2.5 million rows and 50 cols
   "wide" with 50 rows and 2.5 million cols
   "full" with 20000 rows and 5000 cols
   
   Each case is run with both sparse and dense input.
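
To put the three shapes in perspective, here is a rough sizing sketch. It assumes that "wide" is the transpose of "skinny" (50 rows, 2.5 million cols), that values are 8-byte doubles, and that the `0.1` in the log file name `transpose-wide-0.1.log` is the sparsity of the sparse runs. None of these assumptions is confirmed by the PR text, and the helper name `describe` is made up for illustration.

```python
# Matrix shapes (rows, cols) as listed above.
CASES = {
    "skinny": (2_500_000, 50),
    "wide": (50, 2_500_000),  # assumption: the transpose of "skinny"
    "full": (20_000, 5_000),
}
SPARSITY = 0.1       # assumption: taken from the log name transpose-wide-0.1.log
BYTES_PER_CELL = 8   # assumption: double-precision values


def describe(rows, cols):
    """Return cell count, dense size in MB, and expected non-zeros when sparse."""
    cells = rows * cols
    return {
        "cells": cells,
        "dense_mb": cells * BYTES_PER_CELL / 1e6,
        "nnz_sparse": round(cells * SPARSITY),
    }


for name, (rows, cols) in CASES.items():
    info = describe(rows, cols)
    print(f"{name:>6}: {rows} x {cols} = {info['cells']} cells, "
          f"~{info['dense_mb']:.0f} MB dense, ~{info['nnz_sparse']} non-zeros sparse")
```

Under these assumptions "skinny" and "wide" hold the same number of cells, so any difference between them comes from the layout, not from the data volume.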
   
   On 3 different systems the time a transpose takes varies drastically, 
and currently not in the way one would expect:
   
   The one case that is currently very inefficient is a sparse wide matrix, 
which takes 4-5 times longer on many-core machines than on one with few cores.
   
   Alpha: 
   ```bash
   scripts/perftest/results/transpose-wide-0.1.log
   Total elapsed time:          22.405 sec.
    1  r'            20.578      5
   Total elapsed time:          24.200 sec.
    1  r'            22.324      5
   Total elapsed time:          27.786 sec.
    1  r'            25.900      5
   Total elapsed time:          25.258 sec.
    1  r'            23.406      5
   Total elapsed time:          22.654 sec.
    1  r'            20.861      5
           1532631.15 msec task-clock                #   61.009 CPUs utilized   
         ( +-  4.47% )
        4621538496317      cycles                    #    3.015 GHz             
         ( +-  4.35% )  (30.76%)
         659745166202      instructions              #    0.14  insn per cycle  
         ( +- 11.77% )  (38.45%)
   ```
   
   xps: 
   ```bash
   scripts/perftest/results/transpose-wide-0.1.log
   Total elapsed time:             6.269 sec.
    1  r'             4.907      5
   Total elapsed time:             6.356 sec.
    1  r'             5.004      5
   Total elapsed time:             6.482 sec.
    1  r'             5.108      5
   Total elapsed time:             6.511 sec.
    1  r'             5.022      5
   Total elapsed time:             6.428 sec.
    1  r'             5.061      5
            60.976,97 msec task-clock                #    8,790 CPUs utilized   
         ( +-  0,75% )
      187.499.949.636      cycles                    #    3,075 GHz             
         ( +-  0,29% )  (30,65%)
       92.529.928.550      instructions              #    0,49  insn per cycle  
         ( +-  0,36% )  (38,34%)
   ```
   
   tango:
   ```bash
   scripts/perftest/results/transpose-wide-0.1.log
   Total elapsed time:          23.525 sec.
    1  r'            21.169      5
   Total elapsed time:          21.440 sec.
    1  r'            19.291      5
   Total elapsed time:          23.235 sec.
    1  r'            21.014      5
   Total elapsed time:          23.051 sec.
    1  r'            21.037      5
   Total elapsed time:          22.883 sec.
    1  r'            20.551      5
            454278.39 msec task-clock                #   19.145 CPUs utilized   
         ( +-  1.71% )
        1203991163914      cycles                    #    2.650 GHz             
         ( +-  1.64% )  (33.34%)
         236261779038      instructions              #    0.20  insn per cycle  
    
   ```
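
The three logs above can be condensed into one comparison. The sketch below is only a summary of the numbers already shown; it assumes the second column of each `r'` line is the time in seconds spent in the transpose instruction, and it recomputes instructions per cycle from the `perf` counters. The function name `summarize` is made up for illustration.

```python
from statistics import mean

# Seconds reported on the r' (transpose) line of each of the 5 runs, copied from the logs.
R_PRIME_SEC = {
    "alpha": [20.578, 22.324, 25.900, 23.406, 20.861],
    "xps": [4.907, 5.004, 5.108, 5.022, 5.061],
    "tango": [21.169, 19.291, 21.014, 21.037, 20.551],
}

# (instructions, cycles) from the perf output of each machine.
PERF = {
    "alpha": (659_745_166_202, 4_621_538_496_317),
    "xps": (92_529_928_550, 187_499_949_636),
    "tango": (236_261_779_038, 1_203_991_163_914),
}


def summarize(baseline="xps"):
    """Mean r' time, slowdown relative to the baseline machine, and IPC per machine."""
    base = mean(R_PRIME_SEC[baseline])
    summary = {}
    for machine, times in R_PRIME_SEC.items():
        instructions, cycles = PERF[machine]
        summary[machine] = {
            "mean_sec": mean(times),
            "slowdown": mean(times) / base,
            "ipc": instructions / cycles,
        }
    return summary


for machine, row in summarize().items():
    print(f"{machine:>5}: mean r' {row['mean_sec']:.2f} s, "
          f"{row['slowdown']:.1f}x vs xps, {row['ipc']:.2f} insn per cycle")
```

With these numbers alpha and tango come out about 4.5 and 4.1 times slower than xps, in line with the 4-5 times figure, while also retiring far fewer instructions per cycle.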
   
   For reference, the reason I'm addressing this is that compression does a 
transpose at the beginning.
   On my airlines dataset this transpose takes 16 seconds on alpha, while on 
my laptop it takes 1 second.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

