[agi] Why do Transformers have layers of Attention Heads?

D D Tue, 15 Dec 2020 22:52:59 -0800

If I understood him correctly, Yannic Kilcher suggested in one of his YouTube
videos that the middle layers, not the final layers, may learn the
highest-level features. The last layer has to output actual words, not
abstractions. Interesting.


Also see this paper:
BERT Rediscovers the Classical NLP Pipeline

They look at what is being learned in the different layers. How many
layers do you need? That still seems to be an open question.
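
To make "layers of attention heads" a bit more concrete, here is a minimal
sketch in plain Python of a stack of multi-head self-attention layers that
records each head's attention weights per layer. It is only an illustration:
the weights are random rather than trained, and the function names
(attention_head, run_stack) are made up for this example. The MLP blocks,
layer norm and output projection of a real Transformer are left out.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(mat, vec):
    # mat is a list of rows; returns the matrix-vector product.
    return [sum(w * v for w, v in zip(row, vec)) for row in mat]


def rand_matrix(rng, rows, cols):
    # Random (untrained) projection; stands in for learned weights.
    scale = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def attention_head(xs, wq, wk, wv):
    # Scaled dot-product self-attention for one head.
    qs = [matvec(wq, x) for x in xs]
    ks = [matvec(wk, x) for x in xs]
    vs = [matvec(wv, x) for x in xs]
    d = len(qs[0])
    weights = [
        softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in ks])
        for q in qs
    ]
    outs = [
        [sum(w * v[j] for w, v in zip(row, vs)) for j in range(d)]
        for row in weights
    ]
    return outs, weights


def run_stack(xs, n_layers, n_heads, seed=0):
    # Assumes the model width is divisible by n_heads.
    rng = random.Random(seed)
    d_model = len(xs[0])
    d_head = d_model // n_heads
    attn_maps = []  # attn_maps[layer][head][query][key]
    for _ in range(n_layers):
        head_outs, layer_maps = [], []
        for _ in range(n_heads):
            wq, wk, wv = (rand_matrix(rng, d_head, d_model) for _ in range(3))
            out, weights = attention_head(xs, wq, wk, wv)
            head_outs.append(out)
            layer_maps.append(weights)
        # Concatenate the heads and add the residual connection.
        xs = [
            [x_i + c for x_i, c in zip(x, sum((h[t] for h in head_outs), []))]
            for t, x in enumerate(xs)
        ]
        attn_maps.append(layer_maps)
    return xs, attn_maps


tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
final, attn_maps = run_stack(tokens, n_layers=2, n_heads=2)
```

Each row of attn_maps[layer][head] is a probability distribution over the
input positions, so with a trained model one could compare what the heads
in early, middle and late layers attend to.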

On Saturday, August 1, 2020, stefan.reich.maker.of.eye via AGI <
[email protected]> wrote:

> Not enough attention heads activated here...

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Ta21b3b47e26f50e7-M1302fbda8f2afe07514bfec6
Delivery options: https://agi.topicbox.com/groups/agi/subscription
