If you do what GPT-2 does, it can give you plans, but a human must act them 
out. The text is just data; if you add images, that's just more data involved 
in self-attention. That'd result in a visual+text GPT-3 that simply has more 
context.
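
To make the "just more context" point concrete, here is a minimal sketch in NumPy. It is my own toy, not GPT-2 or GPT-3 code: the embeddings are random stand-ins, the sizes (5 text tokens, 3 image patches, width 8) are made up, and the names `self_attention`, `text_tokens` and `image_tokens` are hypothetical. It only shows that a single causal self-attention layer treats image tokens and text tokens the same way once they are in one sequence.

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Single-head causal self-attention over a sequence of token vectors."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Causal mask: position i may only attend to positions <= i.
    n = x.shape[0]
    mask = np.tril(np.ones((n, n), dtype=bool))
    scores = np.where(mask, scores, -np.inf)
    # Numerically stable softmax over each row.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
d = 8                                    # assumed embedding width
text_tokens = rng.normal(size=(5, d))    # stand-in text embeddings
image_tokens = rng.normal(size=(3, d))   # stand-in image-patch embeddings

# Adding images just makes the sequence longer: 3 + 5 = 8 tokens.
context = np.concatenate([image_tokens, text_tokens], axis=0)

wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))
out, weights = self_attention(context, wq, wk, wv)
```

The last row of `weights` is the final text token's attention, and it spreads over the image patches as well as the earlier text. Nothing in the layer knows which tokens came from which modality.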

Did you see MuseNet? It predicts the next note for each instrument, using all 
the previous notes from every instrument.
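
A toy illustration of that idea, assuming the simplest possible setup: all instruments are merged into one time-ordered sequence, so whatever predicts the next note sees every instrument's earlier notes. This is not MuseNet's actual encoding; `Note`, `interleave`, `context_for` and the two example tracks are hypothetical names and data I made up for the sketch.

```python
from collections import namedtuple

# One event in the merged sequence (hypothetical representation).
Note = namedtuple("Note", "time instrument pitch")

def interleave(tracks):
    """Merge per-instrument (time, pitch) lists into one time-ordered sequence."""
    events = [
        Note(time, name, pitch)
        for name, notes in tracks.items()
        for time, pitch in notes
    ]
    # Sort by time, then instrument name, so the order is deterministic.
    events.sort(key=lambda n: (n.time, n.instrument))
    return events

def context_for(events, index):
    """Everything a next-note predictor would see before position `index`."""
    return events[:index]

tracks = {
    "piano": [(0, 60), (1, 64)],   # (time step, MIDI pitch), made-up values
    "bass": [(0, 36), (2, 43)],
}
events = interleave(tracks)

# Context for predicting the bass note at time 2: both piano notes
# and the earlier bass note are in it.
ctx = context_for(events, 3)
```

Here `ctx` holds notes from both instruments, which is the point: the prediction for one instrument is conditioned on all of them.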