GPT-2, BERT, and ELMo are amazing. First I want to ask: has anyone here studied how they work and (sorta) knows how to replicate them in a data compressor? Matt? Anyone!? How do they work? You should know if you're on top of your game...
My refined analysis is probably very close to what they say, so I'm not really guessing. GPT-2 uses word/word-part positions (decided by Byte Pair Encoding) and their word2vec-style embeddings (though these may be contextual embeddings like ELMo's, e.g. in "stick this" vs. "stick in the", 'stick' relates to twig or to move depending on context). With these two things, it takes the last hundred words or so and activates certain stored "match phrases", from which it can then GRAB the final word at the end: having seen "the cat [ran]", it continues the story "this dog _ran_". The matching just works with similar words and word rearrangement! Like in data compressors, there is some normalization etc. etc. to make it work nicely. The best lossless text compressors (not in quality though, GPT-2 rules there) are bit predictors, so it makes sense that it grabs the final item to predict.

GPT-2 also seems to blunt frequent words using its Values so they have less weight, and the words are compared to each other for similarity, clarifying/sorting the last 100 words (data compressors sort too) so they can match better. It is possibly also voting on the predictions using the context words, though this may be the same thing as predicting the next bit, with no candidate cherry-picking.

To predict even better, it needs commonsense reasoning. Here's what that means: for "the cure to cancer is ", you need to peer into the context and see which words are related to cancer, e.g. rashes, barfing, pain, etc. Or take "the elephant was sick and could not enter the building, ": to predict better you need to look before the context, after it, and around its inner parts (was it sick, or too big?), and you need other matches too, like "if they run too fast then they will have sore legs", so you can answer that maybe it was running to a building.
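Here is a toy sketch of the matching-and-grabbing intuition above: take the recent context, find similar stored phrases (allowing related words and rearrangement), blunt the frequent words, and grab the matched phrase's final word as the prediction. To be clear, this only illustrates the idea in the post, not GPT-2's actual transformer mechanics; the phrase store and the hand-coded `related` table below are stand-ins for real training data and real word2vec-style embeddings.

```python
# Toy "match phrases, grab the final word" predictor (illustration only).
from collections import Counter
import math

phrases = [           # stand-in for previously seen "training data"
    "the cat ran",
    "the cat slept",
    "a man walked",
]

# Stand-in for embedding similarity: which words count as "related".
related = {
    "dog": {"dog", "cat"},          # dog ~ cat (both animals)
    "this": {"this", "the", "a"},   # similar little determiner words
}

word_counts = Counter(w for p in phrases for w in p.split())
total = sum(word_counts.values())

def weight(word):
    # Inverse-frequency weight: frequent words like "the" are blunted.
    return math.log(total / word_counts.get(word, 1))

def similar(a, b):
    return b in related.get(a, {a})

def predict_next(context):
    # Score each stored phrase by weighted overlap with the context,
    # ignoring word order, then grab the best phrase's final word.
    ctx = context.split()
    best_word, best_score = None, -1.0
    for p in phrases:
        words = p.split()
        score = sum(weight(w) for w in words[:-1]
                    if any(similar(c, w) for c in ctx))
        if score > best_score:
            best_score, best_word = score, words[-1]
    return best_word

print(predict_next("this dog"))  # -> ran  (via "the cat ran")
```

So "this dog" grabs "ran" from the stored "the cat ran", reproducing the post's example, because "dog" matches "cat" through the similarity table while "the" contributes little weight.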
>> So, instead of just Grabbing what occurs next most likely, here we are
>> abstractly tearing it apart / together, grabbing other predictions and
>> translation matches, and actually making *preparation discoveries
>> (a background check)* before making the silly poor guess a newbie on this
>> topic would make. Ah. It's like seeing "the ladder broke in half" and
>> predicting "the ladder broke in half because the man stepped on it too
>> hard", but before you can do that you don't know that humans use or step
>> on ladders, so you *should* discover first and upsize your artificial data
>> on this question (like on-the-fly advanced word2vec, and on-the-fly plain
>> word2vec too): ladder = scaffold. Then you see the sentence "we bought a
>> scaffold to use when we build a house, we needed something to get to the
>> top of the house", and then you see "to get to the top of the house we
>> walked up the scaffold".
------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T0607d3f3f3678b2f-M97f40a12f80e803d62f63f97
Delivery options: https://agi.topicbox.com/groups/agi/subscription
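The "discover first, then predict" idea quoted above can be sketched as query expansion: before guessing, rewrite the query with discovered word equivalences (ladder = scaffold) so that background sentences about scaffolds become usable matches. The `equivalences` table here is hand-coded; in the proposal it would be learned on the fly, word2vec-style, from fresh data.

```python
# Sketch of "discover ladder = scaffold, then match" (illustration only).
corpus = [
    "we bought a scaffold to use when we build a house",
    "to get to the top of the house we walked up the scaffold",
]

# Stand-in for on-the-fly discovered equivalences.
equivalences = {"ladder": "scaffold"}

def expand(words):
    # Rewrite query words to their discovered equivalents.
    return [equivalences.get(w, w) for w in words]

def background_matches(query):
    # Return corpus sentences sharing a word with the expanded query.
    q = set(expand(query.lower().split()))
    return [s for s in corpus if q & set(s.split())]

for sentence in background_matches("the ladder broke in half"):
    print(sentence)
```

Without the expansion step, "ladder" matches nothing in the corpus; with it, both scaffold sentences surface as background material before the guess is made.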
