So if word2vec uses, say, 1000 dimensions, then each word has 1000 coordinates, 
and with, say, a 2000-word vocabulary, how else would you store the structure? 
Can you explain? I see a visualization here: https://hazyresearch.github.io/hyperE/ 
but I don't see how a node can be in 45 different places at the same time...
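To make the storage question concrete, here is a minimal sketch (my own illustration, not hyperE's actual code) of the usual layout: the whole vocabulary is one |vocab| x |dims| matrix, one row of coordinates per word. The toy vocabulary, the random values, and the `vector` helper are all assumptions for the example.

```python
import numpy as np

# Toy vocab; the question's example would have ~2000 rows instead of 3.
vocab = ["king", "queen", "apple"]
word_to_row = {w: i for i, w in enumerate(vocab)}

dims = 1000  # assumed dimensionality from the question
# One matrix holds everything: row i is word i's 1000 coordinates.
embeddings = np.random.randn(len(vocab), dims).astype(np.float32)

def vector(word):
    """Look up a word's coordinates by its row index."""
    return embeddings[word_to_row[word]]

print(vector("king").shape)  # (1000,)
```

So there is no separate "structure" to store: the geometry lives entirely in the coordinates, and relations between words are recovered by measuring distances or angles between rows.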

If GPT-2 predicts the next token from left-side context only, and BERT uses 
bidirectional context, does that make BERT twice as good? Is there more to why 
you would want to use BERT, or is that the only reason?
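The left-only vs bidirectional difference the question points at can be sketched as an attention-mask difference (an illustration, not the actual model code): GPT-2 masks out future positions, while BERT lets every token attend to every other token.

```python
import numpy as np

seq_len = 4

# Causal (GPT-2-style): position i may attend only to positions <= i,
# so the model never sees tokens to its right.
causal = np.tril(np.ones((seq_len, seq_len), dtype=bool))

# Bidirectional (BERT-style): every position may attend everywhere.
bidirectional = np.ones((seq_len, seq_len), dtype=bool)

print(causal.astype(int))
print(bidirectional.astype(int))
```

Note the trade-off this sketch hides: the causal mask is what lets GPT-2 generate text token by token, while BERT's full mask suits tasks that read a complete sentence at once, so "twice as good" is not really the right comparison.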
------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Tcfe7cc93841eec23-M0cd0696f7828e9c9e05b1c20