I'd be really excited if even just one of you can advance the AGI design I'm working on. I've seen a lot of ANN variants (GANs, LSTMs, autoencoders, etc.), and they seem to rely on things like residual connections, layer norm, convolution windows, and many stacked feedforward networks, while my design sticks to a single collection of hierarchies. Of course, you can get the same result in similar ways, or by breaking it down into multiple tools with math tricks. But I'm looking for a more explainable architecture that unifies everything into the same general network; the math tricks can come later. That's why I say in my work that to predict the next word we, for example, look at the last context (like GPT-2 does), activate multiple similar phrase nodes in the hierarchy, and see what they all entail; they are all little judges/hierarchies. I don't hear many people saying this, just RNN this, GAN that, with no actual straightforward theory.
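To make the "similar phrase nodes as little judges" idea concrete, here is a minimal sketch under my own assumptions, not a finished design. It assumes contexts are fixed-length word tuples, that "similar" means sharing the same word at the same position, and that each matching stored context votes for the words that followed it, weighted by its similarity. The names `build_memory`, `similarity`, and `predict_next`, and the toy corpus, are hypothetical and only there for illustration.

```python
from collections import Counter, defaultdict


def build_memory(text, n=3):
    """Map each n-word context to a Counter of the words that followed it."""
    words = text.lower().split()
    memory = defaultdict(Counter)
    for i in range(len(words) - n):
        context = tuple(words[i:i + n])
        memory[context][words[i + n]] += 1
    return memory


def similarity(a, b):
    """Fraction of positions where two equal-length contexts hold the same word."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def predict_next(memory, context, threshold=0.5):
    """Let every sufficiently similar stored context vote on the next word."""
    context = tuple(w.lower() for w in context)
    votes = Counter()
    for stored, followers in memory.items():
        score = similarity(stored, context)
        if score >= threshold:
            # Each similar context acts as a "judge": its vote counts
            # in proportion to how closely it matches the query.
            for word, count in followers.items():
                votes[word] += score * count
    return votes.most_common()


corpus = "the cat sat on the mat . the dog sat on the rug . the cat ate the fish ."
memory = build_memory(corpus, n=3)

# "bird sat on" never occurs in the corpus, but "cat sat on" and
# "dog sat on" are similar enough to vote, and both were followed by "the".
ranking = predict_next(memory, ["bird", "sat", "on"])
print(ranking[0][0])  # the
```

The point of the sketch is only that an unseen context can still get a prediction, because several partially matching contexts are activated and their continuations are pooled. A real system would need a much better similarity measure than exact position matching.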
Transformers have clearly outperformed LSTMs across most language tasks (check out OpenAI's models, BERT, etc.), and the "Attention Is All You Need" paper says it, well, in the title; it was written by Google researchers. Transformers process a sequence in parallel and can run much faster than RNNs. And you don't need the recurrence or the LSTM schema, which is confusing. I've read many articles on Transformers; they describe a long process with many components, and after reading them all there is no explanation anywhere of how it actually works. I'm the only one on Earth saying how GPT-2 works. There is some explanation if you look at Word2Vec or the Hutter Prize algorithms like PPM, but no one "knows" how GPT-2 works.

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/Tf06e133ecd7df7c9-Me5ac574816e45339eef22453
Delivery options: https://agi.topicbox.com/groups/agi/subscription
