Matt: DeSTIN and Google's cat face recognizer don't do any of this. They
just process the whole image at once.
Thanks for the reply. My impression - and what I was asking about - is that ALL
current approaches process the whole image at once, not just Ben's.
In which case, they miss the most important dimension of vision, which is
that it is active/selective - as well as passive/reflective. Both
unconscious and conscious minds choose together what to look at in a scene
(or a face). And there are always new ways and new things to look at and
notice in any scene - as the visual arts endlessly demonstrate.
-----Original Message-----
From: Matt Mahoney
Sent: Friday, April 05, 2013 7:35 PM
To: AGI
Subject: Re: Complexity of vision (was Re: [agi] Utilizing kickstarter.com?)
On Fri, Apr 5, 2013 at 1:16 PM, Mike Tintner <[email protected]>
wrote:
Matt: It seems like the next logical step would be to model a fovea and
saccades to reduce the input complexity.
Care to expand? Are there any computational/robotic approaches to
vision that involve both the retina's sensing of a whole field AND the
fovea's attention to objects/parts of objects within that field?
The retina does lossy image compression. You have about 10^8 light
sensing cells reduced to 10^6 features at the optic nerve. Each
feature represents a simple description of a region of the image, such
as light surrounded by dark, or dark surrounded by light, or light
next to dark with various orientations, or brightness increasing or
decreasing over time. The regions are small and tightly packed around
the fovea or center of vision, and get larger and more spread out as
you go outward to your peripheral vision.
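
As a rough illustration of that layout (not a model of real retinal anatomy), here is a minimal Python sketch. It samples "light surrounded by dark" features on a grid that is dense at a fixation point and sparser farther out. The function names, the `growth` parameter, and the rule that region size grows linearly with distance are all assumptions made for the example, and its compression ratio is not meant to match the 10^8 to 10^6 (100 to 1) figure above.

```python
# Hypothetical sketch: foveated center-surround features on a grayscale image.
# The linear growth rule and all names here are illustrative assumptions.

def region_mean(img, cx, cy, r):
    """Mean brightness of the square of half-width r around (cx, cy), clipped to the image."""
    h, w = len(img), len(img[0])
    xs = range(max(0, cx - r), min(w, cx + r + 1))
    ys = range(max(0, cy - r), min(h, cy + r + 1))
    vals = [img[y][x] for y in ys for x in xs]
    return sum(vals) / len(vals)


def axis_positions(fix, size, growth):
    """Sample coordinates along one axis: dense at `fix`, sparser farther away."""
    pos = {fix}
    for direction in (-1, 1):
        p = fix
        while True:
            step = 1 + int(growth * abs(p - fix))  # step widens with distance
            p += direction * step
            if not 0 <= p < size:
                break
            pos.add(p)
    return sorted(pos)


def foveated_features(img, fx, fy, growth=0.5):
    """Return (x, y, radius, response) tuples for a fixation at (fx, fy).

    response = mean of the center region minus mean of a larger surrounding
    region, so it is positive for light surrounded by dark.
    """
    h, w = len(img), len(img[0])
    feats = []
    for y in axis_positions(fy, h, growth):
        for x in axis_positions(fx, w, growth):
            r = int(growth * max(abs(x - fx), abs(y - fy)))
            center = region_mean(img, x, y, r)
            surround = region_mean(img, x, y, 2 * r + 1)
            feats.append((x, y, r, center - surround))
    return feats


# Example: a single bright dot at the fixation point of a dark 64x64 image.
size = 64
img = [[0.0] * size for _ in range(size)]
img[32][32] = 1.0
feats = foveated_features(img, 32, 32)
print(len(feats), "features from", size * size, "pixels")
```

The number of features comes out far smaller than the number of pixels, and the regions far from the fixation point average over many pixels, which is the non-optical "blurring" described in the next paragraph.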
Your brain recognizes images from these features, not directly from
the rods and cones. This is why you cannot read text or see fine
detail with your peripheral vision, even though the rods and cones are
just as densely packed throughout the retina. The processing by the
retina blurs them. It is not an optical blurring, however. A moving
point of light in your peripheral vision will still get your attention
because it activates the feature detectors in your retina that detect
motion.
To see all of a picture, you have to move your eyes around it. The
input to the higher-level feature detectors is not just what you see,
but also feedback from the eye muscles that tells where you are
looking. To see a picture, you have to take a lot of these fuzzy
images focused on different locations, store them in short-term
memory, and combine them all.
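
The combining step can be sketched roughly as follows, assuming each glimpse arrives as a small square patch together with the gaze position it was taken at (standing in for the eye-muscle feedback). The `GlimpseMemory` class and `glimpse` function are hypothetical names for this example, and simple averaging is just one possible way to merge overlapping glimpses.

```python
# Hypothetical sketch: merge small glimpses, each tagged with a gaze position,
# into one composite picture held in a "short-term memory" buffer.

def glimpse(scene, gaze_x, gaze_y, r):
    """Return the (2r+1) x (2r+1) patch centered on the gaze point.
    Cells that fall outside the scene are filled with 0.0."""
    h, w = len(scene), len(scene[0])
    return [[scene[y][x] if 0 <= y < h and 0 <= x < w else 0.0
             for x in range(gaze_x - r, gaze_x + r + 1)]
            for y in range(gaze_y - r, gaze_y + r + 1)]


class GlimpseMemory:
    def __init__(self, width, height):
        self.total = [[0.0] * width for _ in range(height)]
        self.count = [[0] * width for _ in range(height)]

    def add(self, patch, gaze_x, gaze_y):
        """Place a patch at the location reported by the gaze signal."""
        r = len(patch) // 2
        for dy, row in enumerate(patch):
            for dx, value in enumerate(row):
                x, y = gaze_x - r + dx, gaze_y - r + dy
                if 0 <= y < len(self.total) and 0 <= x < len(self.total[0]):
                    self.total[y][x] += value
                    self.count[y][x] += 1

    def composite(self):
        """Average of everything seen at each location; None where unseen."""
        return [[t / c if c else None for t, c in zip(t_row, c_row)]
                for t_row, c_row in zip(self.total, self.count)]

    def coverage(self):
        """Fraction of locations seen at least once."""
        seen = sum(1 for row in self.count for c in row if c)
        return seen / (len(self.count) * len(self.count[0]))


# Example: four fixations are enough to cover an 8x8 scene with radius-2 glimpses.
scene = [[x + 10 * y for x in range(8)] for y in range(8)]
memory = GlimpseMemory(8, 8)
for gx, gy in [(2, 2), (5, 2), (2, 5), (5, 5)]:
    memory.add(glimpse(scene, gx, gy, 2), gx, gy)
print("coverage:", memory.coverage())
```

Without the gaze position, the patches could not be placed, which is why the position signal is as much an input as the pixels themselves.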
DeSTIN and Google's cat face recognizer don't do any of this. They
just process the whole image at once. That requires more computation
because they skip the retina's initial reduction of the image. Of course it
would end up being the same information if your eyes just scanned
across the image. But your eyes are smarter than that. You look at the
most important parts of the image first. Your eyes are attracted to
movement, regions of high contrast (edges and corners), and
interesting objects like human faces. When reading, your eyes jump
from one word to the next, and your higher-level feature detectors
recognize the word.
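
One simple way to sketch "look at the most important parts first" is to rank locations by local contrast and visit the strongest ones in order, suppressing the area around each spot once it has been visited. This is only an assumption-laden toy: contrast is the sole cue (no motion or face detection), and `local_contrast`, `plan_fixations`, and `inhibit_radius` are hypothetical names for this example.

```python
# Hypothetical sketch: choose fixation points by local contrast, strongest first.
# Motion and object cues are left out; contrast alone stands in for "importance".

def local_contrast(img, x, y):
    """Sum of absolute brightness differences between a pixel and its 4 neighbors."""
    h, w = len(img), len(img[0])
    total = 0.0
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < w and 0 <= ny < h:
            total += abs(img[y][x] - img[ny][nx])
    return total


def plan_fixations(img, n, inhibit_radius=2):
    """Return up to n (x, y) fixation points in order of decreasing contrast."""
    h, w = len(img), len(img[0])
    saliency = [[local_contrast(img, x, y) for x in range(w)] for y in range(h)]
    fixations = []
    for _ in range(n):
        score, x, y = max((saliency[yy][xx], xx, yy)
                          for yy in range(h) for xx in range(w))
        if score <= 0:
            break  # nothing interesting left to look at
        fixations.append((x, y))
        # Suppress the neighborhood just visited so the next pick goes elsewhere.
        for yy in range(max(0, y - inhibit_radius), min(h, y + inhibit_radius + 1)):
            for xx in range(max(0, x - inhibit_radius), min(w, x + inhibit_radius + 1)):
                saliency[yy][xx] = 0.0
    return fixations


# Example: a bright spot and a dimmer spot on a dark 10x10 background.
img = [[0.0] * 10 for _ in range(10)]
img[3][7] = 1.0   # strong contrast at x=7, y=3
img[8][1] = 0.5   # weaker contrast at x=1, y=8
print(plan_fixations(img, 3))
```

The planner stops early when no contrast remains, so a mostly blank image costs only a couple of fixations instead of a full scan.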
During saccades or eye movements, visual processing is turned off.
This is why you cannot see your own eyes move when you look from one
to the other in a mirror.
-- Matt Mahoney, [email protected]
-------------------------------------------
AGI
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/6952829-59a2eca5
Modify Your Subscription:
https://www.listbox.com/member/?&
Powered by Listbox: http://www.listbox.com