If I understood him right, Yannic Kilcher said in one of his YouTube videos
that the middle layers, not the final layers, may be the ones that learn the
highest-level features. The last layer has to output actual words, not
abstractions. Interesting.

Also see this paper:
"BERT Rediscovers the Classical NLP Pipeline" (Tenney, Das, and Pavlick, 2019)

They look at what the model has learned in the different layers (as far as
I recall, by probing the layer representations rather than the attention
heads themselves). How many layers you need is still an open question, I
think.
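
A rough sketch of one way to put a number on where in the stack a task is
handled: learn one softmax weight per layer for a probe, then take the
weighted average layer index. As far as I recall the paper reports a similar
"center of gravity" statistic, but the function names and the logits below
are made up for illustration and are not results from the paper.

```python
import math


def softmax(logits):
    """Turn unnormalized per-layer scores into weights that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def center_of_gravity(layer_logits):
    """Expected layer index under the softmax mixing weights.

    layer_logits[0] stands for the embedding layer,
    layer_logits[i] for encoder layer i.
    """
    weights = softmax(layer_logits)
    return sum(i * w for i, w in enumerate(weights))


# Hypothetical mixing logits for a 4-layer toy model (index 0 = embeddings).
# Illustrative values only, not measurements from any real model.
toy_logits = {
    "low_level_task": [0.0, 2.0, 1.0, 0.0, -1.0],
    "high_level_task": [-1.0, 0.0, 1.0, 2.0, 0.0],
}

for task, logits in toy_logits.items():
    print(task, round(center_of_gravity(logits), 2))
```

A higher value means the probe leans on later layers. If the middle layers
really carry the most abstract features, you would expect the value for an
abstract task to land short of the last layer.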

On Saturday, August 1, 2020, stefan.reich.maker.of.eye via AGI <
[email protected]> wrote:

> Not enough attention heads activated here...

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/Ta21b3b47e26f50e7-M1302fbda8f2afe07514bfec6
Delivery options: https://agi.topicbox.com/groups/agi/subscription
