GPT-2, BERT, and ELMo are amazing.

First I want to ask: has anyone here studied how they work and roughly knows 
how to replicate it in a data compressor? Matt? Anyone!? How does it work? You 
should know if you are on top of your game...

My refined analysis is probably very close to what they describe, so I'm not 
really guessing. GPT-2 works on words and word parts (decided by Byte Pair 
Encoding), using their positions and their word2vec-style embeddings (though 
these may be contextual embeddings like ELMo's, e.g. "stick this" vs. "stick 
in the" - "stick" relates to twig or to move depending on context). With these 
two things, it takes the last hundred words or so and activates certain stored 
"match phrases", from which it can then GRAB the final word at the end - "the 
cat [ran]" - to use for the story "this dog _ran_". The matching just works 
with similar words and word rearrangement! As in data compressors, there is 
some normalization etc. to make it work nicely. The best lossless text 
compressors (not in output quality though, GPT-2 rules there) are bit 
predictors, so it makes sense that it grabs the final item to predict. GPT-2 
also seems to blunt frequent words using its values so they carry less weight; 
the words are compared to each other for similarity, clarifying/sorting (data 
compressors sort) the last 100 words so they can match better; and it is 
possibly voting on the predictions using the context words, though this may be 
the same thing as predicting the next bit, with no choosing among candidates.
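The "compare for similarity, blunt frequent words, then grab the continuation" idea above can be sketched in a few lines. This is my own toy illustration (made-up 3-d vectors, not GPT-2's actual weights): each context word gets an embedding, similar words score higher via a dot product, and the softmax normalization spreads the weight before the mix votes on the next word.

```python
# Toy sketch of similarity-weighted "grabbing" over a short context.
# All vectors here are hypothetical, chosen so "cat" and "dog" sit close
# together the way word2vec-style embeddings would place them.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

emb = {
    "the": [0.1, 0.0, 0.0],
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.1],
    "ran": [0.1, 0.1, 0.9],
}

def attend(query_word, context_words):
    """Weight each context word by similarity to the query, then mix
    their vectors -- the 'grab' step described in the text."""
    q = emb[query_word]
    scores = softmax([dot(q, emb[w]) for w in context_words])
    mixed = [0.0] * len(q)
    for w, s in zip(context_words, scores):
        for i, v in enumerate(emb[w]):
            mixed[i] += s * v
    return scores, mixed

scores, mixed = attend("dog", ["the", "cat", "ran"])
# "cat" gets the highest weight because its vector is closest to "dog",
# so whatever followed "cat" in stored text dominates the prediction.
```

The softmax is also where the "blunting" lives: because the weights must sum to 1, no single frequent word can dominate the mix outright.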

To predict even better, it needs commonsense reasoning. Here's what that means: 
for "the cure to cancer is ", you need to peer into the context and see which 
words are related to cancer, e.g. rashes, barfing, pain, etc. Or take "the 
elephant was sick and could not enter the building, ": to predict better you 
need to look before the context, after it, and around its inner parts, e.g. 
was it sick, or too big? And you need other matches too, like "if they run too 
fast then they will have sore legs", so you can answer that maybe it was 
running to a building.
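The "peer into the context and see what words are related" step can be sketched as a simple similarity ranking. Again this is my own illustration with invented vectors: rank candidate words by cosine similarity to a topic word, so predictions near "cancer" are biased toward related symptoms rather than unrelated nouns.

```python
# Toy sketch: find context words related to a topic via embedding
# similarity. The vectors are hypothetical; illness words are placed
# close together, "ladder" far away.
import math

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den

vecs = {
    "cancer":  [0.9, 0.8, 0.1],
    "pain":    [0.8, 0.9, 0.2],
    "barfing": [0.7, 0.8, 0.3],
    "ladder":  [0.1, 0.2, 0.9],
}

def related(topic, candidates):
    """Sort candidates by similarity to the topic word, most related first."""
    return sorted(candidates,
                  key=lambda w: cosine(vecs[topic], vecs[w]),
                  reverse=True)

print(related("cancer", ["ladder", "pain", "barfing"]))
# the illness words rank above the unrelated "ladder"
```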
>> So, instead of just grabbing what occurs next most likely, here we are 
>> abstractly tearing the context apart and putting it back together, grabbing 
>> other predictions and translation matches, and actually making *preparation 
>> discoveries (background checks)* before making the silly poor guess a newbie 
>> on this topic would make. Ah. It's like seeing "the ladder broke in half" 
>> and predicting "the ladder broke in half because the man stepped on it too 
>> hard" - but before you can do that, you don't know that humans use or step 
>> on ladders, so you *should* discover first and upsize your artificial data 
>> on this question (like on-the-fly advanced word2vec, and on-the-fly word2vec 
>> too): learn that ladder = scaffold, then see the sentence "we bought a 
>> scaffold to use when we build a house, we needed something to get to the top 
>> of the house", and then see "to get to the top of the house we walked up the 
>> scaffold".
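The ladder = scaffold discovery step can be sketched as synonym substitution at match time. This is my own simplified illustration (the synonym table and stored phrase are made up): if no stored phrase matches the input directly, swap words for known near-synonyms and retry, so "ladder" can borrow what was learned about scaffolds.

```python
# Toy sketch of on-the-fly substitution: a failed phrase lookup is
# retried after swapping words for discovered equivalents.
synonyms = {"ladder": "scaffold"}  # hypothetical learned equivalence

# Stored phrase -> continuation, as if mined from earlier text.
memory = {
    ("walked", "up", "the", "scaffold"): "to get to the top",
}

def predict(phrase):
    """Look up the phrase directly, then retry with synonym substitution."""
    key = tuple(phrase)
    if key in memory:
        return memory[key]
    swapped = tuple(synonyms.get(w, w) for w in phrase)
    return memory.get(swapped)

print(predict(["walked", "up", "the", "ladder"]))  # matches via "scaffold"
```

The design point is that the substitution happens at prediction time, not training time, which is what makes it "on the fly": the model upsizes its usable data for this one question without re-learning anything.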
------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T0607d3f3f3678b2f-M97f40a12f80e803d62f63f97
Delivery options: https://agi.topicbox.com/groups/agi/subscription
