I'd be really excited if even just one of you could advance the AGI design I'm working on. I've seen a lot of ANN variants (GANs, LSTMs, autoencoders, and so on), and they tend to add things like residual connections, layer norm, convolution windows, and many stacked feedforward networks, while my design sticks to a single collection of hierarchies. Of course you can get the same result in similar ways, or by breaking it down into multiple tools with math tricks. But I'm looking for a more explainable architecture that unifies everything into the same general network; the math tricks can wait. That's why I say in my work that to predict the next word, we look at the recent context (like GPT-2 does), activate multiple similar phrase nodes in the hierarchy, and see what entails them all; each node acts as a little judge in the hierarchy. I don't hear many people saying this, just "RNN this, GAN that", with no actual straightforward theory.
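
To make that concrete, here's a minimal toy sketch in Python (my own hypothetical rendering, not a real implementation; the class, the Jaccard similarity measure, and all names are assumptions for illustration). It stores phrases together with the word that followed them, then lets every stored phrase that is similar enough to the current context vote on what comes next:

```python
from collections import Counter, defaultdict

def similarity(a, b):
    """Crude Jaccard overlap between two word tuples; a stand-in for
    whatever learned similarity the hierarchy would actually use."""
    return len(set(a) & set(b)) / max(len(set(a) | set(b)), 1)

class PhraseHierarchy:
    def __init__(self, phrase_len=3):
        self.phrase_len = phrase_len
        # phrase -> counts of the words that "entailed" (followed) it
        self.next_words = defaultdict(Counter)

    def train(self, words):
        n = self.phrase_len
        for i in range(len(words) - n):
            phrase = tuple(words[i:i + n])
            self.next_words[phrase][words[i + n]] += 1

    def predict(self, context, threshold=0.5):
        context = tuple(context[-self.phrase_len:])
        votes = Counter()
        # every sufficiently similar phrase node acts as a little judge
        for phrase, followers in self.next_words.items():
            w = similarity(context, phrase)
            if w >= threshold:
                for word, count in followers.items():
                    votes[word] += w * count
        return votes.most_common(3)

model = PhraseHierarchy()
model.train("the cat sat on the mat and the cat sat on the rug".split())
# unseen context still matches similar stored phrases:
print(model.predict("the dog sat on".split()))
# -> [('the', 1.0), ('mat', 0.5), ('rug', 0.5)]
```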

Transformers have proven tangibly better than LSTMs across the board (see OpenAI's models, BERT, etc.), and the "Attention Is All You Need" paper, written by Google researchers, says it right in the title. Transformers are parallel and can process sequences much faster than RNNs, and you don't need recurrence or the LSTM schema, which is confusing anyway.
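
The parallelism point fits in a few lines. Here is a minimal NumPy sketch of single-head scaled dot-product attention, assuming Q = K = V for brevity (a real Transformer learns separate projections for each): one matrix product scores every position against every other position at once, where an RNN must walk the sequence step by step.

```python
import numpy as np

def self_attention(X):
    """X: (seq_len, d_model) token embeddings. Returns one attended
    output vector per position, with no sequential loop."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)            # all pairwise scores in parallel
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax per row
    return weights @ X                        # every output position at once

X = np.random.randn(5, 8)        # 5 tokens, 8-dim embeddings
print(self_attention(X).shape)   # (5, 8)
```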

I've read many articles on Transformers; they describe a long pipeline with many moving parts, yet after reading them all I find no explanation of how it actually works, anywhere. I seem to be the only one on Earth saying how GPT-2 works. There is some explanation if you look at Word2Vec or the Hutter Prize algorithms like PPM, but no one "knows" how GPT-2 works.
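
For comparison, here's a toy flavor of the PPM idea (a simplified backoff sketch of my own, not a faithful PPM implementation; real PPM mixes orders with proper escape probabilities): try the longest matching context first, and fall back to shorter contexts when the long one has never been seen.

```python
from collections import Counter, defaultdict

def build_contexts(text, max_order=3):
    """One frequency table per context order: context string -> Counter
    of the characters that followed it."""
    tables = [defaultdict(Counter) for _ in range(max_order + 1)]
    for order in range(max_order + 1):
        for i in range(order, len(text)):
            tables[order][text[i - order:i]][text[i]] += 1
    return tables

def predict_next(tables, history, max_order=3):
    for order in range(max_order, -1, -1):    # longest context first
        ctx = history[-order:] if order else ""
        followers = tables[order].get(ctx)
        if followers:                         # back off if never seen
            return followers.most_common(1)[0]
    return None

tables = build_contexts("the theory of the thing")
print(predict_next(tables, "the th"))
# -> ('e', 2): the order-3 context " th" was followed by 'e' twice
```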
