Baunsgaard commented on PR #1940:
URL: https://github.com/apache/systemds/pull/1940#issuecomment-1796394057

   > Here is the stat output when running a single forward pass of the basic 
residual block:
   > 
   > SystemDS Statistics:
   > Total elapsed time: 14,218 sec.
   > Total compilation time: 1,144 sec.
   > Total execution time: 13,074 sec.
   > Cache hits (Mem/Li/WB/FS/HDFS): 97/0/0/0/0.
   > Cache writes (Li/WB/FS/HDFS): 3/33/0/0.
   > Cache times (ACQr/m, RLS, EXP): 0,000/0,001/0,006/0,000 sec.
   > HOP DAGs recompiled (PRED, SB): 0/7.
   > HOP DAGs recompile time: 0,037 sec.
   > Total JIT compile time: 3.086 sec.
   > Total JVM GC count: 2.
   > Total JVM GC time: 0.023 sec.
   > Heavy hitter instructions:
   >
   >   #  Instruction          Time(s)  Count
   >   1  basic_block_forward   12,861      1
   >   2  conv2d_bias_add        6,923      3
   >   3  forward                5,463      3
   >   4  uacvar                 2,532      3
   >   5  bias_multiply          1,174      6
   >   6  bias_add               0,940      6
   >   7  uacmean                0,640      3
   >   8  max                    0,315      2
   >   9  rand                   0,208      9
   >  10  +                      0,142     16
   > 
   > The input is of shape 32x64x120x100 (N = 32, C = 64, Hin = 120, Win = 100). Within the residual block the channel count is expanded to 128, which is realistic for a residual block. Three conv2d forward passes run sequentially, and they set the baseline in terms of runtime.
   
   This sounds painfully slow. Have you compared it to a single forward pass of a ResNet layer in PyTorch?
   I know that our conv2d_bias_add is slow. I do not recall whether we have a native version of it, but if we do, that should speed it up.
   If you pass a configuration file with the -conf flag when executing and set the sysds.native.blas parameter in it, maybe we can speed this up.
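For a like-for-like comparison with PyTorch, it can help to know roughly how much work one conv2d layer in this benchmark does. The sketch below is a back-of-the-envelope estimate only: the thread gives the input shape (32x64x120x100) and the expansion to 128 channels, but the 3x3 kernel, stride 1, padding 1, and 8-byte (FP64) values are assumptions, and the helper `conv2d_out_dim` is hypothetical. It also reads the commas in the statistics as decimal separators to compute the share of execution time spent in conv2d_bias_add.

```python
# Back-of-the-envelope cost of one conv2d layer in the quoted benchmark.
# Assumptions (NOT stated in the thread): 3x3 kernel, stride 1, padding 1,
# FP64 (8-byte) values.

def conv2d_out_dim(size: int, kernel: int, stride: int, pad: int) -> int:
    """Hypothetical helper: spatial output size of a 2-D convolution along one axis."""
    return (size + 2 * pad - kernel) // stride + 1

N, C_IN, H_IN, W_IN = 32, 64, 120, 100   # input shape from the thread
C_OUT = 128                              # channel count after expansion
K, STRIDE, PAD = 3, 1, 1                 # assumed kernel configuration
BYTES_PER_VALUE = 8                      # assumed double precision

input_elements = N * C_IN * H_IN * W_IN
input_mb = input_elements * BYTES_PER_VALUE / 1e6

H_OUT = conv2d_out_dim(H_IN, K, STRIDE, PAD)
W_OUT = conv2d_out_dim(W_IN, K, STRIDE, PAD)
conv_out_shape = (N, C_OUT, H_OUT, W_OUT)

# Each output value needs C_IN * K * K multiply-accumulates (MACs).
macs = N * C_OUT * H_OUT * W_OUT * C_IN * K * K
gflops = 2 * macs / 1e9  # one multiply plus one add per MAC

# Share of total execution time spent in conv2d_bias_add
# (6,923 s and 13,074 s from the statistics, commas read as decimal points).
conv_share = 6.923 / 13.074

print(f"input: {input_elements:,} values, ~{input_mb:.1f} MB")
print(f"conv output shape: {conv_out_shape}")
print(f"one conv layer: ~{gflops:.1f} GFLOP")
print(f"conv2d_bias_add share of execution time: {conv_share:.0%}")
```

Under these assumptions a single 64-to-128-channel layer is on the order of tens of GFLOP, so dividing that by the measured conv2d_bias_add time gives an achieved throughput that can be set against the same layer timed in PyTorch.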
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.