To clarify the above:
In transformers and graph NNs, the embeddings in attention heads and on graph
edges, respectively, represent the relative lateral positions of connected
items. The main question is where those embeddings come from. In a fully
unsupervised scheme they must be learned, not hand-coded. If that learning is
to be distinct from generic backprop, the only alternative I see is
connectivity clustering. Otherwise transformers become indistinguishable from
an MLP.
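To make "connectivity clustering" concrete, here is a minimal toy sketch, under my own assumptions about what the term means (the post does not define it): items are grouped by the similarity of their connection patterns, i.e. rows of an adjacency matrix, with no backprop involved. The greedy threshold-based grouping and the cosine-similarity measure are illustrative choices, not a claimed implementation.

```python
import numpy as np

def connectivity_clusters(adj, threshold=0.4):
    """Greedily group nodes whose connectivity profiles (rows of the
    adjacency matrix) have cosine similarity above `threshold`.
    A toy reading of "connectivity clustering"; no gradients involved."""
    n = adj.shape[0]
    labels = [-1] * n
    next_label = 0
    for i in range(n):
        if labels[i] != -1:
            continue                      # already assigned to a cluster
        labels[i] = next_label
        for j in range(i + 1, n):
            if labels[j] != -1:
                continue
            num = float(adj[i] @ adj[j])  # overlap of connection patterns
            den = np.linalg.norm(adj[i]) * np.linalg.norm(adj[j])
            sim = num / den if den else 0.0
            if sim > threshold:
                labels[j] = next_label
        next_label += 1
    return labels

# Two groups of nodes with distinct connectivity profiles.
adj = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)
print(connectivity_clusters(adj))  # → [0, 0, 0, 1, 1, 1]
```

The point of the sketch is only that cluster identity here emerges from connectivity alone, which is the kind of signal that could in principle seed relative-position embeddings without generic backprop.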
------------------------------------------
Artificial General Intelligence List: AGI
Permalink:
https://agi.topicbox.com/groups/agi/T8366cc740ec68376-Mf44fda55738a32dc456822bb
Delivery options: https://agi.topicbox.com/groups/agi/subscription