JonahBalshai opened a new pull request, #2287: URL: https://github.com/apache/systemds/pull/2287
## Summary

This PR implements the [**Layer-wise Adaptive Rate Scaling (LARS)**](https://arxiv.org/pdf/1708.03888) optimizer in SystemDS. LARS is designed to improve convergence for large-batch training by scaling the learning rate of each layer based on the norms of its weights and gradients (a minimal sketch of the per-layer update rule is included at the end of this description). This is especially useful in deep learning applications where standard optimizers such as SGD struggle to scale to large batch sizes.

## Changes

- Added `lars.dml` and `lars_util.dml` to `scripts/nn/optim/`
- Modified all ResNet variants in `scripts/nn/networks/` to allow optimization with LARS
- Added `alexnet.dml` to `scripts/nn/networks/`
- Added a local response normalization layer `lrn.dml` to `scripts/nn/layers/`
- Extended the example scripts in `scripts/nn/examples/` with LARS examples

## Performance

We tested LARS on the MNIST dataset with varying batch sizes against other optimizers. The plots below show how our LARS implementation compares to the other optimizers as the batch size grows.

PLOTS HERE. With some explanations

## Challenges and Issues

- Due to the limited data loading capabilities of SystemDS and the constraints of our own hardware, we often ran into memory issues when loading larger batches, especially with datasets of larger images such as ImageNet.

We're happy to receive any feedback or suggestions to improve our implementation.
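For reviewers, here is a minimal sketch of the per-layer LARS update rule from the paper, written in DML and loosely following the `update` function style used by the other optimizers in `scripts/nn/optim/`. The function signature and the hyperparameter names (`trust_coef`, `weight_decay`, `mu`) are illustrative and do not necessarily match the ones in `lars.dml`:

```
# Illustrative sketch only; the actual signatures in lars.dml may differ.
update = function(matrix[double] X, matrix[double] dX, double lr, double mu,
                  double trust_coef, double weight_decay, matrix[double] v)
    return (matrix[double] X, matrix[double] v) {
  # Layer-wise norms of the weights and of the gradients
  w_norm = sqrt(sum(X^2))
  g_norm = sqrt(sum(dX^2))

  # Trust ratio: local learning rate for this layer,
  # falling back to 1.0 when either norm is zero.
  local_lr = 1.0
  if (w_norm > 0 & g_norm > 0) {
    local_lr = trust_coef * w_norm / (g_norm + weight_decay * w_norm)
  }

  # Momentum update with weight decay, scaled by the global and local LR
  v = mu*v + lr * local_lr * (dX + weight_decay*X)
  X = X - v
}
```

Usage would follow the same pattern as the other optimizers in `scripts/nn/optim/`: keep one velocity matrix `v` per parameter matrix and call `update` once per parameter per iteration.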