RE: Complexity of vision (was Re: [agi] Utilizing kickstarter.com?)

Piaget Modeler Thu, 04 Apr 2013 23:13:21 -0700

JPEG 7.0 offers object and motion detection built in.  So some of the work is 
already done for you.
~PM

Date: Thu, 4 Apr 2013 20:42:45 -0400
Subject: Re: Complexity of vision (was Re: [agi] Utilizing kickstarter.com?)
From: [email protected]
To: [email protected]

On Thu, Apr 4, 2013 at 7:02 PM, Mike Tintner <[email protected]> wrote:

The main task ...is to recognize how those objects *connect*, including "object 
mechanics." Who or what is doing what to who or what? ... Is the book lying on 
the box, or stuck to it? Are the flowers bending because of a wind or what? Is 
that a mess, or an orderly array of papers?

 "Object mechanics" includes how objects keep moving. Where will that moving 
object end up? Will he walk straight into the guy ahead? Will that ball hit the 
window, or will he have time to catch it first?
...

 The main task of vision/common sense/consciousness is to understand the 
*object connectivity* and *mechanics* of the agent's world. For living agents, 
the relevant objects and mechanics of the world to be analysed increase in 
complexity with the complexity of the agent's body and mind, and therefore with 
its capacity to interact with the world. (So start your AGI at worm level or 
simpler, not human level.)
------------------------

If you start your AGI project with a well-defined agenda of intentions in 
mind, it might be easier to get the project going. (This is important because 
the best way (the only way?) to write a complicated program is to start with 
something simple and then keep improving it.)

With a narrow agenda you might be able to write an effective AGI program that 
is capable of meeting the agenda. Then, as you add to the agenda, if your basic 
concept was truly general and scalable, you might be able to repeatedly improve 
on the program enough to show genuine advancement. However, if your idea is 
inadequate, you will get stuck pretty quickly.

Our trouble defining the mechanisms of understanding shows that the agenda we 
want to design is too broad and ill-defined. So instead of asking how our 
program can gain understanding, we should probably ask how our program could 
deal with this kind of problem or that kind of problem, where the mechanisms 
needed to deal with the problem are general and challenging but not 
undefinable. Then keep going and see what happens. If your basic ideas are 
good, you will keep making progress.

Jim Bromer

 On Thu, Apr 4, 2013 at 7:02 PM, Mike Tintner <[email protected]> wrote:

You're assuming that vision is mainly about object recognition. That's still 
mind-blowingly difficult, but relatively easy compared to the main task of 
vision.

The main task is not to recognize the objects in a scene - it is to recognize 
how those objects *connect*, including "object mechanics." Who or what is doing 
what to who or what? Did this guy fall because that guy punched him or because 
he stumbled? Is the chair supporting him, or is he squatting just over it? Is 
the book lying on the box, or stuck to it? Are the flowers bending because of a 
wind or what? Is that a mess, or an orderly array of papers?

"Object mechanics" includes how objects keep moving. Where will that moving 
object end up? Will he walk straight into the guy ahead? Will that ball hit the 
window, or will he have time to catch it first?

And so on and on.

The main task of vision/common sense/consciousness is to understand the *object 
connectivity* and *mechanics* of the agent's world. For living agents, the 
relevant objects and mechanics of the world to be analysed increase in 
complexity with the complexity of the agent's body and mind, and therefore with 
its capacity to interact with the world. (So start your AGI at worm level or 
simpler, not human level.)

P.S. One should add that present-day scientists and technologists are 
*extremely* ill-equipped to deal with human or animal vision from any AGI 
perspective. They are totally conditioned to think and look analytically when 
considering any visual scene. They are conditioned by the 
sentential/propositional form of logic, maths and language. They think in terms 
of CAT ... SAT ... MAT. They don't even realise that the reality being referred 
to here is not a set of building blocks, but a movie of an object or animal 
moving continuously through a complex scene. They don't have the 
artistic/synthetic sensibility that looks at scenes as wholes. They will have to 
acquire it. When humans look at scenes, we look as both analytic scientists and 
synthetic artists.
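Tintner's "object mechanics" question ("Will that ball hit the window, or will he have time to catch it first?") can be made concrete with the crudest possible physical model. The following is a minimal sketch, assuming straight-line, constant-velocity motion along one axis with no gravity or drag, and a catcher who runs straight to a point on the ball's path. The function names, parameters and numbers are hypothetical illustrations, not anything proposed in the thread.

```python
def time_to_reach(position, velocity, target):
    """Time (s) for an object moving at constant velocity to reach `target`.

    Positions are in metres along one axis, velocity in m/s. Returns None if
    the object is stationary or moving away from the target.
    """
    if velocity == 0:
        return None
    t = (target - position) / velocity
    return t if t >= 0 else None


def ball_outcome(ball_pos, ball_velocity, window_pos, catch_pos,
                 catcher_distance, catcher_speed):
    """Predict whether a ball is caught, hits the window, or misses it.

    Assumes straight-line, constant-velocity motion (no gravity, no drag)
    and that the catcher runs straight to `catch_pos` on the ball's path.
    """
    t_window = time_to_reach(ball_pos, ball_velocity, window_pos)
    if t_window is None:
        return "misses window"  # ball is not heading toward the window
    t_ball_at_catch = time_to_reach(ball_pos, ball_velocity, catch_pos)
    t_catcher = catcher_distance / catcher_speed
    # Caught only if the catch point lies before the window and the
    # catcher gets there no later than the ball does.
    if (t_ball_at_catch is not None and t_ball_at_catch <= t_window
            and t_catcher <= t_ball_at_catch):
        return "caught"
    return "hits window"


# Ball thrown from 0 m at 10 m/s toward a window 20 m away; the catch point
# is 15 m along the path.
print(ball_outcome(0, 10, 20, 15, catcher_distance=5, catcher_speed=5))   # caught
print(ball_outcome(0, 10, 20, 15, catcher_distance=10, catcher_speed=5))  # hits window
```

Even this toy shows why the question is harder than recognition: the answer depends on predicted trajectories of two objects, not on labelling either of them.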

-----Original Message-----
From: Matt Mahoney
Sent: Thursday, April 04, 2013 11:25 PM
To: AGI
Subject: Complexity of vision (was Re: [agi] Utilizing kickstarter.com?)

On Wed, Apr 3, 2013 at 12:11 PM, Ben Goertzel <[email protected]> wrote:

By using more efficient algorithms than the human brain does ...

How do you know that such algorithms exist? How do you calculate the complexity?

What matters is the average-case complexity, relative to the
probability distributions characterizing the actual environments and
goals relevant to the AGI system...

There is no good math for calculating this kind of complexity...

So, we are relying in significant part on intuition here....
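The notion of average-case complexity "relative to the probability distributions characterizing the actual environments and goals" can be written in its standard textbook form. This is a sketch, assuming a distribution $D$ over problem instances $x$ and a running time $T_A(x)$ for an algorithm $A$; these symbols are introduced here for illustration and do not appear in the thread.

```latex
\bar{T}_A(D) \;=\; \mathbb{E}_{x \sim D}\!\left[\,T_A(x)\,\right] \;=\; \sum_{x} D(x)\, T_A(x)
```

On this reading, an algorithm could beat the brain on the environments an AGI actually faces even with no better worst-case complexity, because instances with large $T_A(x)$ may carry little probability mass under $D$. The catch is that $D$ for real environments is not known, so the sum cannot actually be evaluated.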

Turing's intuition was that computers were already fast enough to
solve AI. This was before vacuum tube computers like ENIAC, so I
presume he meant mechanical relays.

Anyway, I would like opinions on the computational complexity of human
vision. Specifically, how would you optimize Google's cat face
recognizer and bring it up to human level?

http://128.84.158.119/abs/1112.6209v3

Their current implementation is a 9 layer neural network with 10^9
connections. It was trained on 10^7 256x256 grayscale images for 3
days on 16,000 CPU cores. It is 15.8% accurate on ImageNet, 70% better
than any other system. Presumably, humans would be able to recognize
most of the images.

Google's system recognizes only still images in isolation. To bring it
to human level, it would have to model motion, color, and stereoscopic
depth perception. It would need a fovea and would have to model
saccades, for example scanning important visual features such as
corners, faces, words, and moving objects. It would have to be
integrated with other senses to aid recognition. For example, when you
turn your head, the model should predict how the image will change and
extract features from the residual errors. Vision makes heavy use of
context. For example, you can more easily recognize a co-worker at work
than at the store.
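The idea that, when you turn your head, the model should predict how the image will change and extract features from the residual errors can be illustrated with a toy. This is a minimal sketch, assuming the head turn is known exactly and shifts the whole image a whole number of pixels to the left, and that a frame is a small 2D list of intensities. The function names and the wrap-around treatment of the image edge are hypothetical simplifications, not a model of real saccades or of Google's system.

```python
def predict_after_turn(frame, shift):
    """Predict the next frame after a head turn that shifts the image left.

    Every row is shifted left by `shift` pixels. Pixels that scroll off the
    left edge wrap around to the right edge (a toy simplification: a real
    eye would see new content coming into view instead).
    """
    predicted = []
    for row in frame:
        s = shift % len(row) if row else 0
        predicted.append(row[s:] + row[:s])
    return predicted


def residual(actual, predicted):
    """Per-pixel prediction error between the observed and predicted frames."""
    return [[a - p for a, p in zip(arow, prow)]
            for arow, prow in zip(actual, predicted)]


def surprising_pixels(error):
    """(row, column) positions where the prediction was wrong."""
    return [(r, c) for r, row in enumerate(error)
            for c, value in enumerate(row) if value != 0]


frame = [[1, 2, 3, 4],
         [5, 6, 7, 8]]
predicted = predict_after_turn(frame, 1)    # [[2, 3, 4, 1], [6, 7, 8, 5]]
# Suppose something moved during the turn, changing one pixel.
actual = [[2, 3, 9, 1],
          [6, 7, 8, 5]]
error = residual(actual, predicted)         # [[0, 0, 5, 0], [0, 0, 0, 0]]
print(surprising_pixels(error))             # [(0, 2)]
```

The point of the design is that everything explained by self-motion cancels out, so only the unexplained change (here, one pixel) is passed on as a candidate feature.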

By adulthood we see the equivalent of 10^10 images at a frame rate of
around 10 per second. Each frame has 10^8 pixels, although to be fair,
this is reduced to 10^6 low-level features by the retina. A single
processor running at 10^10 OPS could easily do this. It is harder to
estimate the number of higher level features processed by the (much
larger) visual cortex, such as lines, edges, and movement, and then
going up the hierarchy, corners, letters, words, faces, and familiar
objects. The number of top level features would be at least as large
as our vocabulary, about 10^5, although it is probably much higher or
else we could adequately use words to convey pictures.
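The retinal-level arithmetic in the paragraph above can be checked in a few lines. This sketch simply restates the quoted figures (10^10 frames, 10 frames per second, 10^6 retinal features per frame, a 10^10 OPS processor). The derived quantities are computed here and are not stated in the original post, and the years figure assumes uninterrupted viewing with no allowance for sleep.

```python
# Figures quoted in the post.
FRAMES_BY_ADULTHOOD = 10**10
FRAMES_PER_SECOND = 10
FEATURES_PER_FRAME = 10**6   # after retinal reduction from ~10^8 pixels
PROCESSOR_OPS = 10**10       # operations per second

SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Derived quantities (assumption: continuous viewing, no sleep).
viewing_seconds = FRAMES_BY_ADULTHOOD / FRAMES_PER_SECOND      # 1e9 s
viewing_years = viewing_seconds / SECONDS_PER_YEAR             # about 31.7 years
features_per_second = FRAMES_PER_SECOND * FEATURES_PER_FRAME   # 1e7 per second
ops_per_feature = PROCESSOR_OPS / features_per_second          # 1000 operations

print(f"{viewing_years:.1f} years, {ops_per_feature:.0f} ops per feature")
```

About 32 years of viewing is consistent with "by adulthood", and a 10^10 OPS processor leaves roughly 1,000 operations per retinal feature, which is the sense in which it "could easily do this".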

Google's system is trained on 10^11 bits. The optic nerve transmits
10^16 bits by adulthood, or 10^5 times as much. Coincidentally, our
brain has 10^5 times as many synapses (10^14) as Google's model. We
don't need 10^5 times as many processors because the computation is
spread out over decades, rather than 3 days. I estimate 10^6 cores at
10^9 to 10^10 OPS each.
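The scaling argument can be reproduced the same way. The post does not spell out the exact scaling rule, so this sketch assumes one reading of it: scale the 16,000 cores by the 10^5 size ratio, then divide by how much longer than 3 days a human has to do the work. It uses only figures quoted in the thread, plus one assumption made here: that "by adulthood" means about 30 years.

```python
import math

# Figures quoted in the post.
GOOGLE_TRAINING_BITS = 1e11
OPTIC_NERVE_BITS = 1e16
GOOGLE_CONNECTIONS = 1e9
BRAIN_SYNAPSES = 1e14
GOOGLE_CORES = 16_000
GOOGLE_TRAINING_DAYS = 3
ADULTHOOD_DAYS = 30 * 365.25   # assumption: "by adulthood" ~ 30 years

data_ratio = OPTIC_NERVE_BITS / GOOGLE_TRAINING_BITS   # 1e5
model_ratio = BRAIN_SYNAPSES / GOOGLE_CONNECTIONS      # 1e5
time_ratio = ADULTHOOD_DAYS / GOOGLE_TRAINING_DAYS     # about 3650

# Assumed reading: 10^5 times the cores, but the work is spread over
# roughly 3650 times as much wall-clock time.
cores_needed = GOOGLE_CORES * model_ratio / time_ratio          # about 4.4e5
order_of_magnitude = round(math.log10(cores_needed))           # 6
estimated_cores = 10**order_of_magnitude

# Aggregate throughput at the per-core speeds quoted in the post.
total_ops_range = (estimated_cores * 1e9, estimated_cores * 1e10)

print(f"{cores_needed:.2e} cores, i.e. about 10^{order_of_magnitude}")
```

Under this reading the result rounds to 10^6 cores, matching the estimate in the post, for an aggregate of 10^15 to 10^16 OPS.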

Is it possible to solve the problem with less hardware? How?

--

-- Matt Mahoney, [email protected]

-------------------------------------------
AGI
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/6952829-59a2eca5
Modify Your Subscription: https://www.listbox.com/member/?&;
Powered by Listbox: http://www.listbox.com
