Recently I realized what attention heads are. There are 12.

But there are 12 layers of them, hence 144.

Does the next layer "build on" or "use" what the previous layer of attention heads produced? Or are there 144 independent attention heads? Which is it?
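A minimal sketch of how the 144 heads are wired, assuming a GPT-2-small / BERT-base-style configuration (12 layers × 12 heads). It deliberately leaves out LayerNorm, the MLP blocks, and causal masking, and it uses a toy width of 24 instead of the real 768. All names (`Head`, `Layer`, `run`) are hypothetical. What it tries to show: within one layer, all 12 heads read the same input in parallel; across layers, each layer reads the residual stream that the earlier layers wrote into. So the heads are stacked rather than independent, and later heads can build on what earlier heads produced.

```python
import math
import random

N_LAYERS = 12              # assumption: GPT-2-small / BERT-base sized
N_HEADS = 12               # heads per layer
D_MODEL = 24               # toy width (real models use 768)
D_HEAD = D_MODEL // N_HEADS


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    # a is n x k, b is k x m; returns n x m
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


class Head:
    """One attention head: its own query/key/value projections."""

    def __init__(self, rng):
        self.wq = rand_matrix(D_MODEL, D_HEAD, rng)
        self.wk = rand_matrix(D_MODEL, D_HEAD, rng)
        self.wv = rand_matrix(D_MODEL, D_HEAD, rng)

    def __call__(self, x):
        q, k, v = matmul(x, self.wq), matmul(x, self.wk), matmul(x, self.wv)
        n = len(x)
        # scaled dot-product scores between every pair of positions
        scores = [[sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(D_HEAD)
                   for j in range(n)] for i in range(n)]
        weights = [softmax(row) for row in scores]
        return matmul(weights, v)  # n x D_HEAD


class Layer:
    """12 heads run in parallel on the same input, then get merged."""

    def __init__(self, rng):
        self.heads = [Head(rng) for _ in range(N_HEADS)]
        self.wo = rand_matrix(D_MODEL, D_MODEL, rng)

    def __call__(self, x):
        outs = [head(x) for head in self.heads]  # every head sees the same x
        concat = [sum((o[i] for o in outs), []) for i in range(len(x))]
        proj = matmul(concat, self.wo)
        # add the attention output back onto the residual stream
        return [[a + b for a, b in zip(xr, pr)] for xr, pr in zip(x, proj)]


def run(x, layers):
    for layer in layers:
        x = layer(x)  # layer l+1 reads what layers 0..l have written
    return x


rng = random.Random(0)
layers = [Layer(rng) for _ in range(N_LAYERS)]
total_heads = sum(len(layer.heads) for layer in layers)
print(total_heads)  # → 144

x0 = [[rng.gauss(0.0, 1.0) for _ in range(D_MODEL)] for _ in range(3)]  # 3 tokens
out = run(x0, layers)
```

In this sketch the answer to "which is it?" is "both": there are 144 heads, each with its own weights, but the heads in layer 2 only ever see the output of layer 1, so their inputs already contain whatever the earlier heads computed.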

And what do the later layers of the 12 do? Do they attend to wider, longer-range features or tasks?
------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Ta21b3b47e26f50e7-M33e8e2a6e9065bf5903ad42e