I forget where I left off, but what I'm doing now is digging deeper into minGPT, and I still have one more thing to try with my original algorithm - hopefully I can make it work better.
And yes, both lossless compression and perplexity are good metrics. Perplexity may simply be faster, but it isn't as airtight against cheating. Still, I think you can't cheat much as long as you make sure your code isn't large compared to the number of bytes you are predicting - otherwise you could store all 1M correct letters in the code itself, even in compressed form, e.g. as a 200KB code file. The difference is that instead of checking that the output matches, as in lossless compression, with perplexity you have to check that the scoring function itself isn't rigged - e.g. that it isn't adding extra points, that it isn't corrupting the set of predicted letters or their probabilities, that the byte or word being predicted really is the right one, and so on. There's a small sketch of such a scorer below.

So perplexity may be the better metric for most of the AGI use case, if I'm right about that.

As for PPM versus Transformers, I'm not sure. People seem to make Transformers look much harder than they must be, so I'm still trying to figure out how they work - which is itself an obstacle to working on this. Also, if mine works as well, that may make working on AGI simpler and maybe even train faster. To be clear, I understand most of GPT, but not *why* it needs embeddings, etc. - I need all of these pieces explained. Why GELU? Why the rest? Do they improve the representation space for backprop, and how? It sounds like embeddings let the input draw on related words, so that when the model sees dog>?, it can remember cat>meowed and predict dog>[meowed]. Two more small sketches of these two ideas (embeddings and GELU) follow at the end.
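
Here is roughly what I mean by checking the scorer - a minimal sketch, assuming a hypothetical predict_probs(context) function that returns 256 probabilities for the next byte. The total bit count it adds up is also roughly what an arithmetic coder driven by the same model would output, which is why perplexity and lossless compression end up measuring the same thing, apart from the size of the code/decompressor itself.

    import math

    def bits_and_perplexity(data: bytes, predict_probs):
        """Score a byte-level predictor on `data`.

        predict_probs(context) is assumed (hypothetically) to return a list
        of 256 probabilities for the next byte given the bytes seen so far.
        Returns (total_bits, per-byte perplexity).
        """
        total_bits = 0.0
        for i in range(len(data)):
            probs = predict_probs(data[:i])
            # Sanity checks against a rigged scorer: the output must be a
            # real probability distribution, and the target must be the
            # actual next byte of the test stream.
            assert len(probs) == 256
            assert abs(sum(probs) - 1.0) < 1e-6
            p = max(probs[data[i]], 1e-12)   # probability given to the true next byte
            total_bits += -math.log2(p)      # code length under arithmetic coding
        perplexity = 2 ** (total_bits / len(data))
        return total_bits, perplexity

    # A uniform predictor costs 8 bits per byte, so the "compressed" size
    # equals the raw size and the perplexity is 256.
    uniform = lambda ctx: [1.0 / 256] * 256
    bits, ppl = bits_and_perplexity(b"hello world", uniform)
    print(bits / 8, "bytes,", ppl, "perplexity")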
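
On embeddings: here is a toy sketch (hand-picked 2-d vectors, nothing GPT actually learns) of why nearby embeddings let the model transfer what it saw for cat>meowed to dog>?. Any smooth function of the embedding - here just a fixed random linear layer - has to give similar next-word distributions to words whose vectors are close.

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)

    # Toy vocabulary and hand-set 2-d embeddings (illustrative values only):
    # "cat" and "dog" are close, "rock" is far away.
    vocab = {"cat": 0, "dog": 1, "rock": 2, "meowed": 3}
    emb = torch.tensor([[0.9, 0.1],    # cat
                        [0.8, 0.2],    # dog
                        [0.0, 1.0],    # rock
                        [0.5, 0.5]])   # meowed

    # Any smooth map from embedding to next-word logits (here a fixed
    # random linear layer) gives similar outputs for similar inputs.
    W = torch.randn(2, len(vocab))

    def next_word_dist(word):
        return F.softmax(emb[vocab[word]] @ W, dim=-1)

    print(F.cosine_similarity(emb[0], emb[1], dim=0))  # cat vs dog: high
    print(next_word_dist("cat"))
    print(next_word_dist("dog"))   # close to the "cat" distribution
    print(next_word_dist("rock"))  # free to be very different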
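
And on GELU, I can at least write down what it computes - the exact form is x * Phi(x), with Phi the standard normal CDF - plus the usual (and hedged) story for why it might be nicer for backprop than ReLU: it is smooth, and it doesn't cut slightly negative inputs to exactly zero.

    import math

    def relu(x):
        return max(0.0, x)

    def gelu(x):
        # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
        return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    # Unlike ReLU, GELU is smooth and still passes a small (scaled) value
    # through for slightly negative inputs instead of clipping it to zero,
    # which is one commonly cited reason it behaves better under backprop.
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(x, relu(x), round(gelu(x), 4))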
