If I understood him right, Yannic Kilcher said in one of his YouTube videos that maybe the middle layers, not the final layers, learn the highest-level features, since the last layer must output actual words, not abstractions. Interesting.
Also see this paper: "BERT Rediscovers the Classical NLP Pipeline". They look at what the learned attention heads are looking at in the different layers. How many layers do you need? Still an open question?

On Saturday, August 1, 2020, stefan.reich.maker.of.eye via AGI wrote:
> Not enough attention heads activated here...

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/Ta21b3b47e26f50e7-M1302fbda8f2afe07514bfec6
