You could have BERT 'refine'/'edit' a paper: first generate the whole text with a non-bidirectional (left-to-right) model, and then, once the full paper exists, refine it recursively by masking a target token and re-predicting it from the context on both sides of it. That way BERT can truly exploit the extra context while generating data. Of course, if you were to generate word by word like GPT-2 does (A... AB... ABC...), I'm unsure how you could beat GPT-2, since there's no context on the right side yet... And if you already have two sequences, you can do translation better, but then that is just doing what GPT-2 does; still, I think GPT-2 scores word to word in one direction only, so maybe next-word prediction could be improved. So there are three things of importance here.
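Here is a minimal sketch of that refinement loop, assuming Hugging Face's `transformers` library; the model name, the greedy argmax replacement, and the fixed number of passes are just illustrative choices, not a prescribed method:

    import torch
    from transformers import BertTokenizer, BertForMaskedLM

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForMaskedLM.from_pretrained("bert-base-uncased")
    model.eval()

    def refine(text: str, passes: int = 2) -> str:
        """Recursively re-predict each token using context on both sides."""
        ids = tokenizer(text, return_tensors="pt")["input_ids"]
        for _ in range(passes):
            # Visit each position, skipping the [CLS]/[SEP] tokens at the ends.
            for pos in range(1, ids.size(1) - 1):
                masked = ids.clone()
                masked[0, pos] = tokenizer.mask_token_id
                with torch.no_grad():
                    logits = model(masked).logits
                # Replace the token with BERT's best guess given both sides.
                ids[0, pos] = logits[0, pos].argmax()
        return tokenizer.decode(ids[0], skip_special_tokens=True)

    # E.g., a left-to-right model's draft with a repeated word gets cleaned up:
    print(refine("the cat sat on the the mat"))

Each pass sweeps over every token once, so later fixes can benefit from earlier ones; in practice you might stop when a pass changes nothing.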