MaximilianSchreff opened a new pull request, #2172: URL: https://github.com/apache/systemds/pull/2172
This PR introduces a multi-head attention layer as a built-in layer with forward and backward passes.

### Description
The multi-head attention layer is the base layer of almost all Transformer models, with many variations across different models. This implementation is in line with the basic BERT attention layer. The functionality is currently kept to a minimum, without features like attention masking, head masking, or cross-attention (a brief sketch of the computation appears at the end of this description).

### Testing
- A new testing module was implemented specifically for this layer, extending the automated testing base.
- Tests execute the forward/backward pass on given inputs and compare the outputs against expected outputs.
- The implementation is compared against the HuggingFace Transformers library implementation.

### Notes
This PR is the first in a series of PRs aiming to support the BERT model in SystemDS, and other Transformer models in the future.
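For orientation, below is a minimal NumPy sketch of the unmasked, BERT-style multi-head attention forward pass that this layer targets. It is illustrative only: the shapes, parameter names, and helper functions are assumptions for the sketch, not the actual SystemDS/DML implementation in this PR.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, W_q, W_k, W_v, W_o, num_heads):
    """X: (batch, seq_len, d_model); W_q/W_k/W_v/W_o: (d_model, d_model)."""
    batch, seq_len, d_model = X.shape
    d_head = d_model // num_heads

    def split_heads(T):
        # (batch, seq_len, d_model) -> (batch, num_heads, seq_len, d_head)
        return T.reshape(batch, seq_len, num_heads, d_head).transpose(0, 2, 1, 3)

    Q, K, V = split_heads(X @ W_q), split_heads(X @ W_k), split_heads(X @ W_v)

    # Scaled dot-product attention per head (no attention/head masking,
    # matching the minimal feature set described above).
    scores = Q @ K.transpose(0, 1, 3, 2) / np.sqrt(d_head)
    attn = softmax(scores, axis=-1)
    context = attn @ V  # (batch, num_heads, seq_len, d_head)

    # Merge heads back and apply the output projection.
    context = context.transpose(0, 2, 1, 3).reshape(batch, seq_len, d_model)
    return context @ W_o
```

Given the same weights and inputs, a reference implementation such as the HuggingFace BERT attention module should produce matching outputs, which is the basis of the comparison tests described above.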