Baunsgaard commented on pull request #1127: URL: https://github.com/apache/systemds/pull/1127#issuecomment-748005848
When comparing before and after (I tested this by dropping the transpose commit from the history), it looks like I might have done something wrong in the initial tests.

That said, the changes do not appear to have had any impact, but they did make me notice that the difference between executions of the wide transpose is large. Sometimes it takes 5 seconds, sometimes 2.5. I'm guessing it has to do with the two NUMA nodes?

The full transpose micro benchmark:

After change, Alpha:

```
scripts/perftest/results/transpose-skinny-1.0.log
Total elapsed time: 5.177 sec.    1  r'  2.567  5
Total elapsed time: 5.592 sec.    1  r'  2.487  5
Total elapsed time: 5.394 sec.    2  r'  2.393  5
Total elapsed time: 5.607 sec.    1  r'  2.496  5
Total elapsed time: 5.361 sec.    1  r'  2.531  5
    195735.81 msec task-clock    #  31.188 CPUs utilized   ( +- 3.50% )
 595845281584      cycles        #   3.044 GHz             ( +- 3.34% ) (30.75%)
  67405027834      instructions  #   0.11 insn per cycle   ( +- 2.26% ) (38.51%)

scripts/perftest/results/transpose-wide-1.0.log
Total elapsed time: 4.870 sec.    1  r'  2.439  5
Total elapsed time: 5.466 sec.    1  r'  2.418  5
Total elapsed time: 5.381 sec.    1  r'  2.393  5
Total elapsed time: 5.257 sec.    1  r'  2.343  5
Total elapsed time: 4.880 sec.    1  r'  2.453  5
    197370.59 msec task-clock    #  32.701 CPUs utilized   ( +- 6.74% )
 598434626116      cycles        #   3.032 GHz             ( +- 6.70% ) (30.76%)
  70128163005      instructions  #   0.12 insn per cycle   ( +- 1.65% ) (38.51%)

scripts/perftest/results/transpose-full-1.0.log
Total elapsed time: 3.736 sec.    2  r'  1.343  5
Total elapsed time: 3.858 sec.    2  r'  1.326  5
Total elapsed time: 3.500 sec.    2  r'  1.299  5
Total elapsed time: 3.894 sec.    2  r'  1.305  5
Total elapsed time: 3.526 sec.    2  r'  1.304  5
    104490.76 msec task-clock    #  22.819 CPUs utilized   ( +- 1.56% )
 320478636150      cycles        #   3.067 GHz             ( +- 1.69% ) (30.80%)
  62146562879      instructions  #   0.19 insn per cycle   ( +- 1.59% ) (38.55%)

scripts/perftest/results/transpose-skinny-0.1.log
Total elapsed time: 2.701 sec.    1  r'  1.437  5
Total elapsed time: 2.659 sec.    1  r'  1.141  5
Total elapsed time: 3.174 sec.    1  r'  1.761  5
Total elapsed time: 2.705 sec.    1  r'  1.103  5
Total elapsed time: 3.112 sec.    1  r'  1.472  5
    152922.25 msec task-clock    #  43.917 CPUs utilized   ( +- 5.32% )
 473697710114      cycles        #   3.098 GHz             ( +- 5.32% ) (31.11%)
  75871932728      instructions  #   0.16 insn per cycle   ( +- 2.13% ) (38.92%)

scripts/perftest/results/transpose-wide-0.1.log
Total elapsed time: 7.215 sec.    1  r'  5.376  5
Total elapsed time: 6.703 sec.    1  r'  4.871  5
Total elapsed time: 4.625 sec.    1  r'  2.815  5
Total elapsed time: 4.400 sec.    1  r'  2.592  5
Total elapsed time: 5.506 sec.    1  r'  3.721  5
    214645.79 msec task-clock    #  33.943 CPUs utilized   ( +- 18.68% )
 658068071617      cycles        #   3.066 GHz             ( +- 18.75% ) (30.71%)
  78768925872      instructions  #   0.12 insn per cycle   ( +- 21.76% ) (38.42%)

scripts/perftest/results/transpose-full-0.1.log
Total elapsed time: 1.368 sec.    1  r'  0.583  5
Total elapsed time: 1.365 sec.    1  r'  0.574  5
Total elapsed time: 1.724 sec.    1  r'  0.835  5
Total elapsed time: 1.564 sec.    1  r'  0.708  5
Total elapsed time: 1.404 sec.    1  r'  0.522  5
     79268.38 msec task-clock    #  38.130 CPUs utilized   ( +- 8.03% )
 239815721367      cycles        #   3.025 GHz             ( +- 7.83% ) (30.85%)
  32295607242      instructions  #   0.13 insn per cycle   ( +- 2.61% ) (38.69%)

scripts/perftest/results/transpose-large.log
Total elapsed time: 34.586 sec.   1  r'  31.577  5
Total elapsed time: 31.789 sec.   1  r'  28.383  5
Total elapsed time: 31.772 sec.   1  r'  28.304  5
Total elapsed time: 31.899 sec.   1  r'  28.529  5
Total elapsed time: 32.218 sec.   1  r'  28.521  5
    220380.73 msec task-clock    #   6.530 CPUs utilized   ( +- 4.50% )
 702976272821      cycles        #   3.190 GHz             ( +- 4.32% ) (30.77%)
 341876674221      instructions  #   0.49 insn per cycle   ( +- 1.53% ) (38.51%)
```

Before, Alpha:

```
scripts/perftest/results/transpose-skinny-1.0.log
Total elapsed time: 4.930 sec.    1  r'  2.404  5
Total elapsed time: 5.457 sec.    2  r'  2.394  5
Total elapsed time: 5.097 sec.    1  r'  2.435  5
Total elapsed time: 5.163 sec.    1  r'  2.422  5
Total elapsed time: 4.820 sec.    1  r'  2.399  5
    168393.69 msec task-clock    #  28.362 CPUs utilized   ( +- 5.87% )
 509712089539      cycles        #   3.027 GHz             ( +- 5.78% ) (30.68%)
  64186798883      instructions  #   0.13 insn per cycle   ( +- 2.02% ) (38.44%)

scripts/perftest/results/transpose-wide-1.0.log
Total elapsed time: 5.288 sec.    1  r'  2.408  5
Total elapsed time: 4.944 sec.    1  r'  2.386  5
Total elapsed time: 5.192 sec.    1  r'  2.440  5
Total elapsed time: 4.996 sec.    1  r'  2.410  5
Total elapsed time: 5.010 sec.    1  r'  2.450  5
    179656.42 msec task-clock    #  30.310 CPUs utilized   ( +- 4.60% )
 543794617678      cycles        #   3.027 GHz             ( +- 4.54% ) (30.82%)
  68647994631      instructions  #   0.13 insn per cycle   ( +- 1.66% ) (38.59%)

scripts/perftest/results/transpose-full-1.0.log
Total elapsed time: 4.217 sec.    2  r'  1.321  5
Total elapsed time: 3.806 sec.    2  r'  1.304  5
Total elapsed time: 3.456 sec.    2  r'  0.864  5
Total elapsed time: 4.261 sec.    2  r'  1.303  5
Total elapsed time: 3.254 sec.    2  r'  0.853  5
    117925.13 msec task-clock    #  25.265 CPUs utilized   ( +- 7.35% )
 358782539233      cycles        #   3.042 GHz             ( +- 7.25% ) (30.63%)
  59148304387      instructions  #   0.16 insn per cycle   ( +- 1.26% ) (38.40%)

scripts/perftest/results/transpose-skinny-0.1.log
Total elapsed time: 3.027 sec.    1  r'  1.638  5
Total elapsed time: 3.016 sec.    1  r'  1.583  5
Total elapsed time: 2.768 sec.    1  r'  1.461  5
Total elapsed time: 3.227 sec.    1  r'  1.709  5
Total elapsed time: 2.434 sec.    1  r'  1.421  5
    103834.93 msec task-clock    #  29.698 CPUs utilized   ( +- 6.79% )
 324467854857      cycles        #   3.125 GHz             ( +- 6.93% ) (30.72%)
  47190326093      instructions  #   0.15 insn per cycle   ( +- 1.21% ) (38.48%)

scripts/perftest/results/transpose-wide-0.1.log
Total elapsed time: 4.556 sec.    1  r'  2.705  5
Total elapsed time: 4.808 sec.    1  r'  3.000  5
Total elapsed time: 4.250 sec.    1  r'  2.398  5
Total elapsed time: 7.544 sec.    1  r'  5.691  5
Total elapsed time: 5.221 sec.    1  r'  3.368  5
    179756.00 msec task-clock    #  30.373 CPUs utilized   ( +- 24.85% )
 548210698316      cycles        #   3.050 GHz             ( +- 24.92% ) (30.83%)
  71613380037      instructions  #   0.13 insn per cycle   ( +- 24.12% ) (38.56%)

scripts/perftest/results/transpose-full-0.1.log
Total elapsed time: 1.314 sec.    1  r'  0.533  5
Total elapsed time: 1.489 sec.    1  r'  0.629  5
Total elapsed time: 1.621 sec.    1  r'  0.823  5
Total elapsed time: 1.269 sec.    1  r'  0.518  5
Total elapsed time: 1.346 sec.    1  r'  0.532  5
     69956.45 msec task-clock    #  35.093 CPUs utilized   ( +- 8.36% )
 212798679757      cycles        #   3.042 GHz             ( +- 8.18% ) (30.91%)
  33221409066      instructions  #   0.16 insn per cycle   ( +- 3.80% ) (38.69%)

scripts/perftest/results/transpose-large.log
Total elapsed time: 34.882 sec.   1  r'  32.116  5
Total elapsed time: 31.270 sec.   1  r'  28.360  5
Total elapsed time: 32.466 sec.   1  r'  28.763  5
Total elapsed time: 33.783 sec.   1  r'  30.827  5
Total elapsed time: 34.388 sec.   1  r'  31.564  5
    226007.94 msec task-clock    #   6.518 CPUs utilized   ( +- 4.21% )
 719315156077      cycles        #   3.183 GHz             ( +- 4.15% ) (30.82%)
 350177806352      instructions  #   0.49 insn per cycle   ( +- 1.34% ) (38.56%)
```
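To put a number on the run-to-run spread in the wide transpose, here is a minimal sketch that summarizes the `r'` times from the two `transpose-wide-0.1` logs. The values are copied from the logs in this comment; the `summarize` helper and the dictionary layout are only illustrative and are not part of the perftest scripts.

```python
from statistics import mean, stdev

# r' (transpose) instruction times in seconds, copied from the
# transpose-wide-0.1 logs in this comment (5 runs each).
WIDE_01_TIMES = {
    "before": [2.705, 3.000, 2.398, 5.691, 3.368],
    "after": [5.376, 4.871, 2.815, 2.592, 3.721],
}


def summarize(times):
    """Return mean, sample standard deviation, min and max of the runs."""
    return {
        "mean": mean(times),
        "stdev": stdev(times),
        "min": min(times),
        "max": max(times),
    }


summary = {label: summarize(times) for label, times in WIDE_01_TIMES.items()}

for label, stats in summary.items():
    print(
        f"{label:>6}: mean={stats['mean']:.3f}s "
        f"stdev={stats['stdev']:.3f}s "
        f"range=[{stats['min']:.3f}, {stats['max']:.3f}]"
    )
```

With only five runs per configuration and a spread this large in both sets, the gap between the two means is small relative to the noise, which fits the reading that the change had no measurable impact on this case.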