Re: Complexity of vision (was Re: [agi] Utilizing kickstarter.com?)

Mike Tintner Thu, 04 Apr 2013 16:03:42 -0700

You're assuming that vision is mainly about object recognition. That's still mindblowingly difficult, but relatively easy compared to the main task of vision.

The main task is not to recognize the objects in a scene - it is to recognize how those objects *connect*, including "object mechanics." Who or what is doing what to who or what? Did this guy fall because that guy punched him or because he stumbled? Is the chair supporting him, or is he squatting just over it? Is the book lying on the box, or stuck to it? Are the flowers bending because of a wind or what? Is that a mess, or an orderly array of papers?

"Object mechanics" includes how objects keep moving. Where will that moving object end up? Will he walk straight into the guy ahead? Will that ball hit the window, or will he have time to catch it first?


And so on and on.

The main task of vision/common sense/consciousness is to understand the *object connectivity* and *mechanics* of the agent's world. For living agents, the relevant objects and mechanics of the world to be analysed increase in complexity with the complexity of the agent's body and mind, and therefore its capacity to interact with the world. (So start your AGI at worm or simpler level, not human level.)

P.S. One should add that present scientists and technologists are *extremely* ill-equipped to deal with human or animal vision from any AGI perspective. They are totally conditioned to think and look analytically in considering any visual scene. They are conditioned by the sentential/propositional form of logic, maths and language. They think in terms of CAT ... SAT ... MAT. They don't even realise that the reality being referred to here is not a set of building blocks, but a movie of an object/animal moving continuously through a complex scene. They don't have the artistic/synthetic sensibility which looks at scenes as wholes. They will have to acquire it. When humans look at scenes, we look as both analytic scientists and synthetic artists.

-----Original Message-----
From: Matt Mahoney
Sent: Thursday, April 04, 2013 11:25 PM
To: AGI
Subject: Complexity of vision (was Re: [agi] Utilizing kickstarter.com?)

On Wed, Apr 3, 2013 at 12:11 PM, Ben Goertzel <[email protected]> wrote:

By using more efficient algorithms than the human brain does ...
How do you know that such algorithms exist? How do you calculate the complexity?


What matters is the average case complexity, relative to the
probability distributions characterizing the actual environments and
goals relevant to the AGI system...

There is no good math for calculating this kind of complexity...

So, we are relying in significant part on intuition here....


Turing's intuition was that computers were already fast enough to
solve AI. This was before vacuum tube computers like ENIAC, so I
presume he meant mechanical relays.

Anyway, I would like opinions on the computational complexity of human
vision. Specifically, how would you optimize Google's cat face
recognizer and bring it up to human level?
http://128.84.158.119/abs/1112.6209v3

Their current implementation is a 9 layer neural network with 10^9
connections. It was trained on 10^7 256x256 grayscale images for 3
days on 16,000 CPU cores. It is 15.8% accurate on ImageNet, 70% better
than any other system. Presumably, humans would be able to recognize
most of the images.

Google's system recognizes only still images in isolation. To bring it
to human level, it would have to model motion, color, and stereoscopic
depth perception. It would have a fovea and model saccades, for
example, scanning important visual features such as corners, faces,
words, and moving objects. It would have to be integrated with other
senses to aid recognition. For example, when you turn your head, the
model should predict how the image will change and extract features
from the residual errors. Vision makes heavy use of context. For
example, you can more easily recognize a co-worker at work than at the
store.

By adulthood we see the equivalent of 10^10 images at a frame rate of
around 10 per second. Each frame has 10^8 pixels, although to be fair,
this is reduced to 10^6 low-level features by the retina. A single
processor running at 10^10 OPS could easily do this. It is harder to
estimate the number of higher level features processed by the (much
larger) visual cortex, such as lines, edges, and movement, and then
going up the hierarchy, corners, letters, words, faces, and familiar
objects. The number of top level features would be at least as large
as our vocabulary, about 10^5, although it is probably much higher or
else we could adequately use words to convey pictures.
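
As a rough check on the retinal figures above, here is a minimal Python sketch of the arithmetic. It only restates the order-of-magnitude numbers from the paragraph; the constant and function names are illustrative, and the "years of viewing" figure assumes the 10^10 frames are seen back to back at 10 per second.

```python
# Back-of-envelope check of the retinal throughput figures quoted above.
# All constants are the order-of-magnitude estimates from the post;
# the names are illustrative, not from any library.
FRAMES_PER_SECOND = 10        # approximate visual frame rate
PIXELS_PER_FRAME = 10**8      # raw pixels per frame
FEATURES_PER_FRAME = 10**6    # low-level features after the retina
PROCESSOR_OPS = 10**10        # operations per second of one processor
LIFETIME_FRAMES = 10**10      # frames seen by adulthood

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def retina_budget():
    """Return (pixels/s, ops available per pixel, compression, viewing years)."""
    pixels_per_second = FRAMES_PER_SECOND * PIXELS_PER_FRAME
    # Operations one processor can spend on each incoming pixel.
    ops_per_pixel = PROCESSOR_OPS / pixels_per_second
    # Reduction from raw pixels to low-level features.
    compression = PIXELS_PER_FRAME // FEATURES_PER_FRAME
    # Assumption: frames are viewed continuously at the stated frame rate.
    viewing_years = LIFETIME_FRAMES / FRAMES_PER_SECOND / SECONDS_PER_YEAR
    return pixels_per_second, ops_per_pixel, compression, viewing_years


if __name__ == "__main__":
    pps, ops, ratio, years = retina_budget()
    print(f"pixels per second:  {pps:.0e}")
    print(f"ops per pixel:      {ops:.0f}")
    print(f"pixel:feature ratio {ratio}:1")
    print(f"viewing time:       {years:.1f} years")
```

On these numbers a 10^10 OPS processor has a budget of about 10 operations per pixel for the 100:1 reduction, and 10^10 frames at 10 per second works out to roughly three decades of viewing.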

Google's system is trained on 10^11 bits. The optic nerve transmits
10^16 bits by adulthood, or 10^5 times as much. Coincidentally, our
brain has 10^5 times as many synapses (10^14) as Google's model. We
don't need 10^5 times as many processors because the computation is
spread out over decades, rather than 3 days. I estimate 10^6 cores at
10^9 to 10^10 OPS each.
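
The scaling ratios in this paragraph can be sketched in Python as below. The data and size ratios come straight from the figures above. The core estimate is only one hypothetical reading: it assumes the work grows linearly with the amount of data alone and is spread over the time needed to see 10^10 frames at 10 per second. The post does not state how the 10^6 figure was derived, so this is a sanity check, not a reconstruction.

```python
# Scaling ratios between Google's network and the human visual system,
# using the order-of-magnitude figures from the post.
GOOGLE_TRAINING_BITS = 10**11
OPTIC_NERVE_BITS = 10**16
GOOGLE_CONNECTIONS = 10**9
BRAIN_SYNAPSES = 10**14
GOOGLE_CORES = 16_000
GOOGLE_TRAINING_SECONDS = 3 * 24 * 3600   # 3 days
HUMAN_TRAINING_SECONDS = 10**10 // 10     # 10^10 frames at 10 per second


def scaling_ratios():
    """Return (data ratio, model size ratio) of brain versus Google's model."""
    data_ratio = OPTIC_NERVE_BITS // GOOGLE_TRAINING_BITS
    size_ratio = BRAIN_SYNAPSES // GOOGLE_CONNECTIONS
    return data_ratio, size_ratio


def cores_if_linear_in_data():
    """Cores needed under the ASSUMPTION that work scales with data only.

    This linear model is hypothetical; the post does not give its derivation.
    """
    data_ratio, _ = scaling_ratios()
    # How much longer the human "training run" lasts than Google's 3 days.
    time_stretch = HUMAN_TRAINING_SECONDS / GOOGLE_TRAINING_SECONDS
    return GOOGLE_CORES * data_ratio / time_stretch


if __name__ == "__main__":
    data_ratio, size_ratio = scaling_ratios()
    print(f"data ratio: {data_ratio:.0e}")
    print(f"size ratio: {size_ratio:.0e}")
    print(f"cores (linear-in-data assumption): {cores_if_linear_in_data():.1e}")
```

Under that assumption the result is a few times 10^5 cores, within an order of magnitude of the 10^6 estimated above. If the work also scaled with the number of connections, the figure would be far larger.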

Is it possible to solve the problem with less hardware? How?

--
-- Matt Mahoney, [email protected]


-------------------------------------------
AGI
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/6952829-59a2eca5

Modify Your Subscription: https://www.listbox.com/member/?&;
Powered by Listbox: http://www.listbox.com


