Baunsgaard commented on PR #1941:
URL: https://github.com/apache/systemds/pull/1941#issuecomment-1796292189

   > About the performance: my machine showed performance issues when testing
   > against PyTorch with very large inputs.
   >
   > Stress test: SystemDS: 340 seconds - PyTorch: 32 seconds
   >
   > The stress test consisted of about 300 forward passes with matrices of
   > about 10,000 x 10,000. This is likely a problem with my setup and not my
   > implementation, since the affine layer alone took 220 seconds with the same
   > inputs. The GCL consists of a simple affine part and a convolutional part,
   > with the convolutional part being far more complex. So the implementation
   > is likely quite fast, because the complex convolutional part accounts for
   > only about a third of the runtime (roughly 120 of 340 seconds).
   
   Can you show the '-stats' output of that run, so we can see where the time
   is being spent?
   It may also be worth profiling it with a profiler such as async-profiler:
   https://github.com/async-profiler/async-profiler
   
   If you are unsure how to use it, I can show you in the office.
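
For reference, a minimal sketch of how the two runs requested above could be launched. It assumes a standalone `SystemDS.jar` started via `java -jar`, a hypothetical script name `gcl_stress.dml`, and a placeholder path to the async-profiler agent library; none of these names come from the PR. The helper only builds the command line, so adjust the names and paths to your setup.

```python
import shlex


def build_systemds_cmd(script, jar="SystemDS.jar", profiler_lib=None,
                       profile_out="profile.html"):
    """Build a java command line that runs a DML script with -stats.

    All default names and paths are placeholders (assumptions), not values
    taken from the PR.
    """
    cmd = ["java"]
    if profiler_lib is not None:
        # Attach async-profiler as a JVM agent: start CPU sampling at JVM
        # startup and write the result to profile_out when the JVM exits.
        cmd.append(
            f"-agentpath:{profiler_lib}=start,event=cpu,file={profile_out}"
        )
    # -f selects the DML script; -stats prints runtime statistics
    # after the script has finished.
    cmd += ["-jar", jar, "-f", script, "-stats"]
    return cmd


if __name__ == "__main__":
    # Plain run that only collects the -stats summary.
    plain = build_systemds_cmd("gcl_stress.dml")
    # Same run with the profiler agent attached (path is a placeholder).
    profiled = build_systemds_cmd(
        "gcl_stress.dml",
        profiler_lib="/path/to/libasyncProfiler.so",
    )
    print(shlex.join(plain))
    print(shlex.join(profiled))
```

Passing either list to `subprocess.run` would execute it. The '-stats' summary appears at the end of the SystemDS output, and the profile is written once the JVM exits.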
