JonahBalshai opened a new pull request, #2287:
URL: https://github.com/apache/systemds/pull/2287

   ## Summary
   
   This PR implements the [**Layer-wise Adaptive Rate Scaling 
(LARS)**](https://arxiv.org/pdf/1708.03888) optimizer in SystemDS. LARS is 
designed to improve convergence for large-batch training by scaling the 
learning rate of each layer based on the norms of its weights and gradients. 
This is especially useful in deep learning applications where standard 
optimizers like SGD struggle to maintain accuracy as the batch size grows.
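
   For reference, below is a minimal DML sketch of the per-layer update that LARS performs. The function signature and parameter names (`trust`, `mu`, `lambda`) are illustrative only; the actual interface in `lars.dml` may differ.
   
   ```dml
   # Minimal sketch of one LARS step for a single layer.
   # Hypothetical signature; see scripts/nn/optim/lars.dml for the real API.
   update = function(matrix[double] W, matrix[double] dW, matrix[double] v,
                     double lr, double mu, double trust, double lambda)
       return (matrix[double] W, matrix[double] v) {
     w_norm = sqrt(sum(W ^ 2))   # layer-wise weight norm
     g_norm = sqrt(sum(dW ^ 2))  # layer-wise gradient norm
   
     # Local (layer-wise) learning rate: trust coefficient times norm ratio.
     local_lr = 1.0
     if (w_norm > 0 & g_norm > 0) {
       local_lr = trust * w_norm / (g_norm + lambda * w_norm)
     }
   
     # Momentum update with weight decay, scaled by global and local LR.
     v = mu * v + lr * local_lr * (dW + lambda * W)
     W = W - v
   }
   ```
   
   The trust-ratio term keeps each layer's effective step proportional to its weight norm, which is what allows much larger global learning rates at large batch sizes.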
   
   ## Changes
   
   - Added `lars.dml` and `lars_util.dml` to `scripts/nn/optim/`
   - Modified all ResNet variants in `scripts/nn/networks/` to support optimization with LARS
   - Added `alexnet.dml` to `scripts/nn/networks/`
   - Added a local response normalization layer `lrn.dml` to `scripts/nn/layers/` (see the sketch after this list)
   - Extended the example scripts in `scripts/nn/examples/` with LARS examples
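
   The local response normalization layer presumably follows the usual cross-channel scheme from the AlexNet paper; the forward computation sketched below uses the conventional parameter names `k`, `n`, `alpha`, `beta`, which may differ from the actual interface in `lrn.dml`:
   
   $$b^{i}_{x,y} = a^{i}_{x,y} \Big/ \Big(k + \alpha \sum_{j=\max(0,\,i-n/2)}^{\min(N-1,\,i+n/2)} \big(a^{j}_{x,y}\big)^{2}\Big)^{\beta}$$
   
   where $a^{i}_{x,y}$ is the activation of channel $i$ at position $(x, y)$, $N$ is the number of channels, and the sum runs over a window of $n$ neighboring channels.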
   
   ## Performance
   We tested LARS on the MNIST dataset with varying batch sizes against other 
optimizers. The plots below show how our LARS implementation compares to the 
other optimizers as the batch size increases.
   
   _(Plots with explanations to be added.)_
   
   ## Challenges and Issues
   - Due to the limited data-loading capabilities of SystemDS and the 
constraints of our own hardware, we frequently ran into memory issues when 
loading larger batches, especially with datasets of larger images such as ImageNet.
   
   We're happy to receive any feedback or suggestions to improve our 
implementation.

