Matthias Boehm created SYSTEMML-2398:
----------------------------------------
             Summary: Paramserv ASP aggregation overhead with update per epoch
                 Key: SYSTEMML-2398
                 URL: https://issues.apache.org/jira/browse/SYSTEMML-2398
             Project: SystemML
          Issue Type: Bug
            Reporter: Matthias Boehm


Here are the statistics for mnist60K, 2 epochs, 80 workers in ASP:

{code}
SystemML Statistics:
Total elapsed time:             449.548 sec.
Total compilation time:         1.995 sec.
Total execution time:           447.553 sec.
Number of compiled MR Jobs:     0.
Number of executed MR Jobs:     0.
Cache hits (Mem, WB, FS, HDFS): 970241/0/0/2.
Cache writes (WB, FS, HDFS):    55191/0/0.
Cache times (ACQr/m, RLS, EXP): 1.048/0.120/1.087/0.000 sec.
HOP DAGs recompiled (PRED, SB): 0/13582.
HOP DAGs recompile time:        24.473 sec.
Functions recompiled:           1.
Functions recompile time:       0.013 sec.
Paramserv func number of workers:             79.
Paramserv func total gradients compute time:  1714.962 secs.
Paramserv func total aggregation time:        428.499 secs.
Paramserv func model broadcasting time:       2.080 secs.
Paramserv func total batch slicing time:      0.0190000000 secs.
Total JIT compile time:         37.461 sec.
Total JVM GC count:             66.
Total JVM GC time:              7.098 sec.
Heavy hitter instructions:
  #  Instruction              Time(s)   Count
  1  conv2d_bias_add          719.111   13768
  2  paramserv                437.051       1
  3  relu_backward            210.414   20370
  4  ba+*                     180.001   40928
  5  conv2d_backward_filter   175.104   13580
  6  +*                       156.714   81480
  7  conv2d_backward_data     140.779    6790
  8  *                        123.502   95173
  9  -*                       104.058   54320
 10  -                         94.502   74985
{code}

As the statistics show, aggregation is a major bottleneck. This is unexpected given the coarse-grained update-per-epoch frequency. [~Guobao] could you please have a look and profile where this time is coming from?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
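To make the bottleneck concrete, a quick arithmetic check on the reported figures (a sketch; the constants below are copied from the statistics block above, and the derived percentages are my own calculation, not part of the log):

```python
# Sanity-check the aggregation overhead from the reported statistics.
total_exec_s = 447.553   # "Total execution time"
agg_time_s   = 428.499   # "Paramserv func total aggregation time"
grad_time_s  = 1714.962  # "total gradients compute time" (summed across workers)
workers      = 79        # "Paramserv func number of workers"

# Share of wall-clock execution time spent in aggregation.
agg_share = agg_time_s / total_exec_s

# Gradient compute is summed over all workers, so the per-worker
# average is much smaller than the serialized aggregation time.
grad_per_worker_s = grad_time_s / workers

print(f"aggregation share of execution time: {agg_share:.1%}")
print(f"avg gradient compute per worker:     {grad_per_worker_s:.1f} s")
```

Aggregation accounts for roughly 96% of execution time, while the average gradient compute per worker is only about 22 seconds, which is why the update-per-epoch ASP run is dominated by aggregation rather than by the workers themselves.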