Baunsgaard commented on PR #1941: URL: https://github.com/apache/systemds/pull/1941#issuecomment-1796292189
> About the performance: My machine showed performance issues when testing against PyTorch with very large inputs.
>
> Stress test: SystemDS: 340 seconds, PyTorch: 32 seconds
>
> The stress test consisted of about 300 forward passes on roughly 10,000 x 10,000 matrices. This is likely a problem with my setup rather than my implementation, since the affine layer with the same inputs took 220 seconds. The GCL consists of a simple affine part and a convolutional part, and the convolutional part is much more complex. So the implementation is likely quite fast, because the complex convolutional part makes up only about a third of the runtime (roughly 120 of the 340 seconds).

Can you show the '-stats' output of running it, so we can see where the time is spent? We may also want to profile it with a profiler: https://github.com/async-profiler/async-profiler
If you are unsure how to use it, I can show you in the office.
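A quick back-of-the-envelope check of the stress-test numbers from the quoted comment. It assumes, as the comment implies, that the GCL runtime splits into the affine part (measured alone at 220 s) plus the convolutional part, so the convolution accounts for the remainder. The variable names are illustrative only.

```python
# Stress-test timings quoted in the PR discussion, in seconds
# (~300 forward passes on roughly 10,000 x 10,000 inputs).
systemds_gcl = 340.0     # full GCL layer in SystemDS
pytorch_gcl = 32.0       # same stress test in PyTorch
systemds_affine = 220.0  # affine layer in SystemDS with the same inputs

# Assumption: GCL time ~= affine time + convolutional time,
# so the convolutional part is whatever remains.
conv_part = systemds_gcl - systemds_affine
conv_share = conv_part / systemds_gcl
slowdown = systemds_gcl / pytorch_gcl

print(f"convolutional part: {conv_part:.0f} s ({conv_share:.0%} of total)")
print(f"SystemDS vs PyTorch slowdown: {slowdown:.2f}x")
```

Under that assumption the convolutional part takes about 120 s, roughly 35% of the total, and SystemDS is about 10.6 times slower than PyTorch on this test. This is why the '-stats' breakdown matters: the gap sits mostly in the affine part, not the convolution.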