Recently I realized what attention heads are. There are 12 of them per layer, and there are 12 layers, hence 144 in total.
Does the next layer build on, or use, what the previous attention-head layer produced? Or are there 144 independent attention heads? Which is it? And what do the later of the 12 layers do? Do they pay attention to wider, longer-range features or tasks?
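
For reference, here is a minimal NumPy sketch of how a standard transformer stack arranges 12 layers of 12 heads. It assumes the usual design: the heads within a layer run in parallel on the same input, and each layer takes the previous layer's output as its input. The toy sizes, random weights, and function names (attention_layer, run_stack) are illustrative assumptions rather than any particular model's code, and layer normalization, the MLP blocks, and causal masking are left out for brevity.

```python
import numpy as np

# Toy sizes, assumed for illustration; real models use a larger D_MODEL.
N_LAYERS, N_HEADS, D_MODEL = 12, 12, 96
D_HEAD = D_MODEL // N_HEADS  # dimensions handled by each head

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_layer(x, w_q, w_k, w_v, w_o):
    """One layer: N_HEADS heads run in parallel on the same input x."""
    seq_len = x.shape[0]

    def split(t):
        # (seq, d_model) -> (heads, seq, d_head)
        return t.reshape(seq_len, N_HEADS, D_HEAD).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(D_HEAD)  # (heads, seq, seq)
    weights = softmax(scores)                            # each row sums to 1
    heads = weights @ v                                  # (heads, seq, d_head)
    merged = heads.transpose(1, 0, 2).reshape(seq_len, D_MODEL)
    # Residual connection: the layer adds its result to its input.
    return x + merged @ w_o, weights

def run_stack(x, params):
    """Apply the layers in sequence; layer i+1 reads the output of layer i."""
    all_weights = []
    for layer_params in params:
        x, weights = attention_layer(x, *layer_params)
        all_weights.append(weights)
    return x, np.stack(all_weights)  # (layers, heads, seq, seq)

def init_params(rng):
    # Random weights stand in for trained ones.
    return [
        tuple(rng.normal(0.0, 0.02, (D_MODEL, D_MODEL)) for _ in range(4))
        for _ in range(N_LAYERS)
    ]

rng = np.random.default_rng(0)
params = init_params(rng)
x0 = rng.normal(size=(5, D_MODEL))   # 5 token vectors
out, attn = run_stack(x0, params)
print(attn.shape)                     # (12, 12, 5, 5)
print(attn.shape[0] * attn.shape[1])  # 144
```

In this sketch there are 144 separate attention patterns, but they are not independent: the 12 heads in a layer all read the same input, and that input is the output of the layer before. The sketch says nothing about what the later layers end up attending to in a trained model; that is an empirical question.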
