MaximilianSchreff opened a new pull request, #2172: URL: https://github.com/apache/systemds/pull/2172
This PR introduces a multi-head attention layer as a built-in layer with forward and backward passes.

### Description
The multi-head attention layer is the base layer of almost all Transformer models, with many variations across different models. This implementation is in line with the basic BERT attention layer. The functionality is currently kept to a minimum, without features like attention masking, head masking, or cross-attention (a brief sketch of the computation appears at the end of this description).

### Testing
- A new testing module was implemented specifically for this layer, extending the automated testing base.
- Tests execute the forward/backward pass on given inputs and compare the outputs against expected outputs.
- The implementation is compared against the HuggingFace Transformers library implementation.

### Notes
This PR is the first in a series of PRs aiming to support the BERT model in SystemDS, and other Transformer models in the future.
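For orientation, below is a minimal NumPy sketch of the unmasked, BERT-style multi-head attention forward pass that this layer targets. It is illustrative only: the shapes, parameter names, and helper functions are assumptions for the sketch, not the actual SystemDS/DML implementation in this PR.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, W_q, W_k, W_v, W_o, num_heads):
    """X: (batch, seq_len, d_model); W_q/W_k/W_v/W_o: (d_model, d_model)."""
    batch, seq_len, d_model = X.shape
    d_head = d_model // num_heads

    def split_heads(T):
        # (batch, seq_len, d_model) -> (batch, num_heads, seq_len, d_head)
        return T.reshape(batch, seq_len, num_heads, d_head).transpose(0, 2, 1, 3)

    Q, K, V = split_heads(X @ W_q), split_heads(X @ W_k), split_heads(X @ W_v)

    # Scaled dot-product attention per head (no attention/head masking,
    # matching the minimal feature set described above).
    scores = Q @ K.transpose(0, 1, 3, 2) / np.sqrt(d_head)
    attn = softmax(scores, axis=-1)
    context = attn @ V  # (batch, num_heads, seq_len, d_head)

    # Merge heads back and apply the output projection.
    context = context.transpose(0, 2, 1, 3).reshape(batch, seq_len, d_model)
    return context @ W_o
```

Given the same weights and inputs, a reference implementation such as the HuggingFace BERT attention module should produce matching outputs, which is the basis of the comparison tests described above.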