I forget where I left off, but what I'm doing now is trying to dig deeper into 
minGPT, and I still have something left to try with my original algorithm - 
hopefully I can make it work better.

And yes, both Lossless Compression and Perplexity are good metrics. Perplexity 
may simply be faster, but it isn't perfect at preventing cheating. I think you 
can't cheat much, though, as long as you make sure your code is small compared 
to the number of bytes you are predicting (otherwise you could just store all 
1M correct letters inside the code, even in compressed form, e.g. a 200KB code 
file). Of course, instead of checking that the output matches, as in Lossless 
Compression, you need to check that the Perplexity score function isn't, for 
example, adding or dropping points, tampering with the set of predicted letters 
or the probability values, and that the byte or word being predicted really is 
the right one, and so on.
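To make that concrete, here is a minimal sketch of what the Perplexity score 
boils down to (my own illustration in Python, not any particular benchmark's 
harness - the function name and the checks in the comments are just my 
assumptions):

import math

def perplexity(probs_for_true_bytes):
    """Perplexity from the probability the model assigned to each correct byte.

    probs_for_true_bytes: one entry per predicted position, namely
    p(actual next byte | context). An honest harness has to verify that each
    entry really is the probability the model gave to the true next byte,
    that it came from a distribution summing to 1, and that the model's code
    is small relative to the number of bytes being predicted (so the answers
    can't simply be stored in the program).
    """
    n = len(probs_for_true_bytes)
    avg_neg_log = -sum(math.log(p) for p in probs_for_true_bytes) / n
    return math.exp(avg_neg_log)

# A model that assigns 0.5 to every correct byte scores perplexity 2.0:
print(perplexity([0.5, 0.5, 0.5, 0.5]))  # -> 2.0

The anti-cheating check is exactly the part outside the math: confirming each 
probability really belongs to the real next byte, and that the code plus model 
is tiny compared to the data it predicts.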

-------------------------------------------

So, Perplexity may be the better metric for most of the AGI use case, if I'm 
correct. As for PPM versus Transformer, I'm not sure; people seem to make 
Transformers look much harder than they need to be, so I'm still trying to 
figure out how they work - which is itself an obstacle to working on them! 
Also, if mine works as well, that might make working on AGI simpler and maybe 
even train faster. Now, to be clear, I understand most of GPT, but not *why* it 
needs embeddings, etc. - I need all of these functions answered. Why GELU? Why 
the rest? Does it improve the representation learning space for backprop? How 
so? It sounds like embeddings allow the input to use related words, so that 
when the model sees dog>?, it can remember cat>meowed and predict dog>[meowed].
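That last point is the part I think I do get - here is a tiny sketch with 
made-up numbers (not minGPT's actual embedding matrix) of why similar 
embeddings let cat>meowed carry over to dog>?:

import numpy as np

# Toy vectors, chosen by hand: "dog" and "cat" are placed close together,
# "car" far away. Whatever the model learned for "cat > meowed" transfers
# to "dog > ?" because the two inputs look nearly the same downstream.
emb = {
    "cat": np.array([0.9, 0.1, 0.0]),
    "dog": np.array([0.8, 0.2, 0.1]),   # near "cat"
    "car": np.array([0.0, 0.1, 0.9]),   # far from both
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(emb["dog"], emb["cat"]))   # high, ~0.98: cat knowledge carries over
print(cosine(emb["dog"], emb["car"]))   # low, ~0.15: car knowledge does not

In a trained GPT those vectors are learned by backprop rather than hand-picked, 
but the effect is the same: related words end up nearby, so predictions 
generalize across them.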