AI is about, and needs, patterns. Otherwise you have to use brute force whenever
you want to invent a solution, e.g. cars, TVs, pills, etc. So we NEED patterns:
prior experiences.

Text and image data both have patterns. For example, you can let the machine
discover (using text) and tell you: "I love food, I have a tongue, I smell; the
hamster may steal your food, because hamsters also love food, have tongues,
and smell." Text can tell you which molecule structures are similar to each
other (semantics), or what they "usually" evolve into, e.g. dog>bark more often
than dog>sleep (syntax).
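A minimal sketch of the two text signals above, assuming a tiny made-up corpus (the sentences and the helpers `next_word_counts`, `jaccard` and `context` are hypothetical illustrations, not anyone's actual system). Counting which word follows which gives the dog>bark vs dog>sleep "syntax" signal. Comparing the sets of words that follow two words gives a crude "semantic" similarity, so cat and hamster come out similar because they share contexts.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus, made up for illustration.
corpus = [
    "the dog barks at night",
    "the dog barks loudly",
    "the dog sleeps",
    "the cat eats food",
    "the hamster eats food",
    "the cat smells food",
    "the hamster smells food",
]

def next_word_counts(sentences):
    """Bigram table: follows[a][b] = how often word b comes right after word a."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for a, b in zip(words, words[1:]):
            follows[a][b] += 1
    return follows

def jaccard(x, y):
    """Overlap of two sets: 1.0 = identical contexts, 0.0 = nothing shared."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0

follows = next_word_counts(corpus)

def context(word):
    """The set of words seen right after `word` (a crude semantic profile)."""
    return set(follows[word])

# "Syntax": what usually comes next after "dog".
print(follows["dog"].most_common())  # [('barks', 2), ('sleeps', 1)]

# "Semantics": words followed by the same words are treated as similar.
print(jaccard(context("cat"), context("hamster")))  # 1.0
print(jaccard(context("cat"), context("dog")))      # 0.0
```

Real systems use far richer contexts than one following word, but the idea is the same: shared contexts suggest shared properties.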

So, what kind of patterns can images contain that text can't? What, that cats
are similar to dogs? Or that cats lick their paws? Or that cats have ticks on
the upper left portion of their abdomen? Text can do all that. Maybe vision is
the same and only gives you better accuracy? Keep reading.

Matt will know the following better than the rest of you, I bet. In the Hutter
Prize contest you can see many algorithms and their histories: you add or
improve mechanisms (or data) and voila, the prediction gets better. For the
Hutter Prize the dataset is fixed: we know adding data is one way to improve
prediction accuracy, so the goal is to make contestants get better at finding
the patterns in the fixed dataset. Someone may add a new mechanism, then next
year improve it or add another new one. But all of these, even just adding
more data, only improve prediction accuracy further. My point is, be it text or
images, you can add more data, add new, never-before-seen pattern-finding
mechanisms, or improve the ones you found, and all you get is extra prediction
accuracy. We already *saw that* in text, and in images. Both are grounded. I'm
not saying images don't have more data; they could. Apparently much of "image
data" is useless or noise, though I'm not sure.
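To make the Hutter Prize point concrete: contestants are scored on how small they can losslessly compress a fixed Wikipedia file, and a predictor that assigns probability p to the symbol that actually comes next can encode it in about -log2(p) bits, so better prediction means a smaller file. The sketch below is a toy under stated assumptions: a made-up repetitive string and two hypothetical models (uniform guessing vs. an adaptive order-1 character model with add-one smoothing). It is not any contestant's actual compressor, and it assumes the decoder already knows the alphabet. It shows both knobs from the paragraph: a better mechanism lowers the bit count, and more data lowers the bits per character.

```python
import math
from collections import Counter, defaultdict

def uniform_bits(text):
    """Code length (bits) if every symbol is guessed as equally likely: no patterns used."""
    return len(text) * math.log2(len(set(text)))

def order1_bits(text):
    """Ideal code length (bits) of an adaptive order-1 model: each character is
    predicted from the previous one, and counts are updated after each step, so a
    decoder could replay exactly the same predictions."""
    alphabet_size = len(set(text))  # assumption: decoder knows the alphabet up front
    counts = defaultdict(Counter)
    bits, prev = 0.0, None
    for ch in text:
        seen = counts[prev]
        # Add-one (Laplace) smoothing so unseen characters still get p > 0.
        p = (seen[ch] + 1) / (sum(seen.values()) + alphabet_size)
        bits -= math.log2(p)
        seen[ch] += 1
        prev = ch
    return bits

text = "the dog barks. " * 50
print(f"uniform: {uniform_bits(text):.0f} bits, order-1: {order1_bits(text):.0f} bits")

# More data: bits per character keep falling as the model sees more repetitions.
for reps in (10, 100):
    t = "the dog barks. " * reps
    print(reps, round(order1_bits(t) / len(t), 3))
```

Real Hutter Prize entries mix many such models (the "mechanisms" above), but each one plugs into the same bits-from-probabilities accounting.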

Now, you may be thinking: we know both text and images have this prediction
accuracy thing going on, which can reveal new discoveries and help you. But
does the prediction accuracy in text "transfer" to images? It must. As said,
text can tell you bees may be dangerous even though you only know ants are!
Image data would tell you just that too. This also means you could use the same
"algorithm" for text on images. OK, but what if (assuming we found all text
pattern-finding mechanisms, improved them to the max, and fed in all our text
data) text is missing key aspects of images that we ignored or got wrong, e.g.
fine details of objects? Call that "Lack". Or what if vision has much more
data/patterns, or, because of its video structure, its prediction accuracy
rises faster? Call that "Structure". Well, for issue 1 (Lack) we can feed the
algorithm images instead of text, but issue 2 (Structure) may require some
adjustment of the pattern-finding mechanisms to cope with video structure,
image structure, and persistence structure (video doesn't just flash dog, ate,
food one after another; in many cases it shows the dog the whole time).
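A sketch of issue 1 vs issue 2, under loudly stated assumptions: a made-up 8x8 image with 4 gray levels whose rows are all identical, read in raster order (row by row). This is roughly how OpenAI's iGPT turns an image into a 1D sequence after shrinking its color palette. The same token-agnostic counting model works on words or pixels (`adaptive_bits` is a hypothetical helper, not a library function). Using only the previous token as context, as a plain text model would, misses the vertical pattern. Adding an image-specific context, the pixel directly above, needs fewer bits. That is the kind of "mechanism adjustment" the Structure issue is about.

```python
import math
from collections import Counter, defaultdict

def adaptive_bits(tokens, contexts, alphabet_size):
    """Ideal code length (bits) of an adaptive model that predicts tokens[i]
    from contexts[i], learning counts online with add-one smoothing.
    Tokens can be words, characters or pixel values: the model doesn't care."""
    counts = defaultdict(Counter)
    bits = 0.0
    for tok, ctx in zip(tokens, contexts):
        seen = counts[ctx]
        p = (seen[tok] + 1) / (sum(seen.values()) + alphabet_size)
        bits -= math.log2(p)
        seen[tok] += 1
    return bits

# Hypothetical 8x8 image, 4 gray levels (0-3), every row identical:
# strong vertical structure, weaker left-to-right structure.
row = [0, 2, 1, 3, 3, 0, 1, 2]
image = [row[:] for _ in range(8)]
width = len(row)

# Flatten in raster order, the way a text model would "read" an image.
pixels = [v for r in image for v in r]

# Context A: the previous token in the sequence (what a plain text model uses).
left_ctx = [None] + pixels[:-1]
# Context B: the pixel directly above (an image-specific mechanism).
above_ctx = [None if i < width else pixels[i - width] for i in range(len(pixels))]

print("previous-token context:", round(adaptive_bits(pixels, left_ctx, 4), 1), "bits")
print("pixel-above context:   ", round(adaptive_bits(pixels, above_ctx, 4), 1), "bits")
```

Nothing here needed new data; only the context (the pattern-finding mechanism) changed.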

So in conclusion, a text AGI can be fed image data and get all its benefits,
BUT it is possible (well, maybe) that our now "video AGI text-algorithm" is
somewhat less accurate than a true video AGI, and may even need a slightly
different set of pattern-finding mechanisms. So the question is not whether
text = vision; many of the same predictions exist in both, and for the same
reasons. Our question is: how does video-structure pattern finding work, and
does it even exist (i.e. is it more powerful at finding patterns)?

One way to think about that question (not just how the structure works, but
why it's more powerful, or whether it exists) is: how can video hold more
"patterns/data"? If you have a video walking through a home with cats etc., or
one explaining the cure for cancer (yes, visionGPT-2), what do we have here?
The cat persists across frames (well, at least a few) and is accompanied by
other objects in the same frame space. A context (a frame) that is part of a
bigger context (the video)? So we have a sequence cat>ran>slowed>sat>slept,
and at each frame there are accompanying sequences, like the parts in music.
As we watch a video, our eye is mostly paying attention to a certain area (be
it a word on your screen or the whole page of writing; you can't read a whole
page at once, btw, you must read word by word, but seeing the book's page is
itself a "word", yes: "page"). We still see the other sequences a bit too, as
said. And it's not just extra context or words; it's connected right in time.
Also, when we watch the video we may see 3D shape or reflections on the cat,
which can tell us the cat is shiny or probably near a flashlight, even from a
single frame. Hmm, I think seeing "shiny" or "flashlight" or "behind an
object" is just another "word", or view, of something related.
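A toy sketch of the "cat persists across frames" point, assuming a made-up 1D "video": each frame is a row of 10 pixels with a 3-pixel "cat" moving one pixel right per frame (`make_video` and `shift_accuracy` are hypothetical helpers). Just guessing "the next frame equals the previous one" already gets most pixels right because the cat persists. A crude motion-aware guess (the previous frame shifted by one pixel) gets all of them in this toy case.

```python
def make_video(width, cat_len, n_frames):
    """Hypothetical toy video: 0 = background, 1 = cat; the cat moves right 1 pixel per frame."""
    frames = []
    for t in range(n_frames):
        frame = [0] * width
        for x in range(t, min(t + cat_len, width)):
            frame[x] = 1
        frames.append(frame)
    return frames

def shift_accuracy(frames, dx):
    """Fraction of pixels predicted correctly when each frame is guessed to be the
    previous frame shifted right by dx pixels (dx=0 is pure persistence)."""
    hits = total = 0
    for prev, cur in zip(frames, frames[1:]):
        guess = [0] * dx + prev[:len(prev) - dx]
        hits += sum(g == c for g, c in zip(guess, cur))
        total += len(cur)
    return hits / total

video = make_video(width=10, cat_len=3, n_frames=5)
print(shift_accuracy(video, dx=0))  # 0.8: only the cat's head and tail pixels change
print(shift_accuracy(video, dx=1))  # 1.0: persistence plus motion explains every pixel
```

Real video is far messier, but this is the sense in which frames are "connected right in time": each frame is mostly predictable from the one before it.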

So, conclusion? I think they are very similar. OpenAI's iGPT knew to put
reflections under the birds: https://openai.com/blog/image-gpt/

---------------------------
---------------------------

Another thing I just wrote:

The thing about navigating or chair manufacturing or golfing is that these
are narrow things to become good at; text or vision is general and can
describe any story / anything.
We model the universe, and have the goal of survival.
We use all knowledge... however, we specialize in a domain such as stem cells
or AGI to collect data from those "experiments".
Only rarely do we explore, e.g. get bored and read the news during an exam lol.
Ultimately, text and vision are general and can describe anything, and all
humans ultimately/cumulatively focus on some specific domain (on average) in
our physics, e.g. the computing industry (DNA, brains, and our devices;
computing/mutating data is a "thing").
So in some sense, all humans talk or work for the computing industry, be it
our genes or the new "Microsoft".
Our "survival" uses all info, but some more than others...
It is computing data, and it cares about... computing data :)
------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T0c5c4825f9d718f4-M9daf91c4f2be13e6bae151e8