Baunsgaard opened a new pull request #1127: URL: https://github.com/apache/systemds/pull/1127
This PR contains a simple addition to the micro benchmarks. This time the transpose of a matrix is measured.

3 basic cases:

- "skinny" with 2.5 million rows and 50 cols
- "wide" with 50 rows and 2.5 million cols
- "full" with 20000 rows and 5000 cols

Each case is run sparse and dense.

On 3 different systems the time a transpose takes varies drastically, and currently not as expected. The one case that is currently very inefficient is a sparse wide matrix, which takes 4-5 times longer on many-core machines than on a machine with few cores.

Alpha:

```bash
scripts/perftest/results/transpose-wide-0.1.log
Total elapsed time: 22.405 sec.
1 r' 20.578 5
Total elapsed time: 24.200 sec.
1 r' 22.324 5
Total elapsed time: 27.786 sec.
1 r' 25.900 5
Total elapsed time: 25.258 sec.
1 r' 23.406 5
Total elapsed time: 22.654 sec.
1 r' 20.861 5

   1532631.15 msec task-clock   # 61.009 CPUs utilized  ( +-  4.47% )
4621538496317      cycles       #  3.015 GHz            ( +-  4.35% )  (30.76%)
 659745166202      instructions #  0.14 insn per cycle  ( +- 11.77% )  (38.45%)
```

xps:

```bash
scripts/perftest/results/transpose-wide-0.1.log
Total elapsed time: 6.269 sec.
1 r' 4.907 5
Total elapsed time: 6.356 sec.
1 r' 5.004 5
Total elapsed time: 6.482 sec.
1 r' 5.108 5
Total elapsed time: 6.511 sec.
1 r' 5.022 5
Total elapsed time: 6.428 sec.
1 r' 5.061 5

      60.976,97 msec task-clock   # 8,790 CPUs utilized  ( +- 0,75% )
187.499.949.636      cycles       # 3,075 GHz            ( +- 0,29% )  (30,65%)
 92.529.928.550      instructions # 0,49 insn per cycle  ( +- 0,36% )  (38,34%)
```

tango:

```bash
scripts/perftest/results/transpose-wide-0.1.log
Total elapsed time: 23.525 sec.
1 r' 21.169 5
Total elapsed time: 21.440 sec.
1 r' 19.291 5
Total elapsed time: 23.235 sec.
1 r' 21.014 5
Total elapsed time: 23.051 sec.
1 r' 21.037 5
Total elapsed time: 22.883 sec.
1 r' 20.551 5

    454278.39 msec task-clock   # 19.145 CPUs utilized  ( +- 1.71% )
1203991163914      cycles       #  2.650 GHz            ( +- 1.64% )  (33.34%)
 236261779038      instructions #  0.20 insn per cycle
```

For reference, I am addressing this because compression performs a transpose at the beginning. On my airlines dataset, this transpose takes 16 seconds on alpha, while on my laptop it takes 1 second.
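To make the comparison between the three machines easier to check, the numbers from the logs above can be summarized with a small script. This is only a sketch for the arithmetic, not part of the benchmark scripts in this PR. It assumes that the third column of each `r'` row is the time in seconds reported for the transpose instruction, and it uses xps as the baseline. The names `transpose_seconds`, `perf_counters` and `summarize` are made up for this illustration.

```python
from statistics import mean

# Values copied from the r' rows of the transpose-wide-0.1.log outputs above.
# Assumption: the third column of each r' row is a time in seconds.
transpose_seconds = {
    "alpha": [20.578, 22.324, 25.900, 23.406, 20.861],
    "xps": [4.907, 5.004, 5.108, 5.022, 5.061],
    "tango": [21.169, 19.291, 21.014, 21.037, 20.551],
}

# (cycles, instructions) copied from the perf counter lines above.
perf_counters = {
    "alpha": (4621538496317, 659745166202),
    "xps": (187499949636, 92529928550),
    "tango": (1203991163914, 236261779038),
}


def summarize(baseline="xps"):
    """Return mean r' time, slowdown vs. the baseline machine, and IPC."""
    base = mean(transpose_seconds[baseline])
    rows = {}
    for machine, times in transpose_seconds.items():
        cycles, instructions = perf_counters[machine]
        rows[machine] = {
            "mean_s": mean(times),
            "slowdown": mean(times) / base,
            "ipc": instructions / cycles,  # instructions per cycle
        }
    return rows


if __name__ == "__main__":
    for machine, row in summarize().items():
        print(
            f"{machine:6s} mean={row['mean_s']:.3f}s "
            f"slowdown={row['slowdown']:.2f}x ipc={row['ipc']:.2f}"
        )
```

With the values above, alpha comes out at about 4.5 times and tango at about 4.1 times the xps time, which matches the 4-5 times stated in the description. The computed instructions per cycle also agree with the `insn per cycle` figures that perf printed.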