On Fri, Apr 5, 2013 at 1:16 PM, Mike Tintner <[email protected]> wrote:
> Matt: It seems like the next logical step would be to model a fovea and
> saccades to reduce the input complexity
>
> Care to expand? Are there any computational/robotic approaches to
> vision, which involve both the sensation/vision of a field by the retina
> AND the attention to objects/parts of objects within the field, of the
> fovea?
The retina does lossy image compression. About 10^8 light-sensing cells are reduced to about 10^6 features at the optic nerve. Each feature is a simple description of a region of the image: light surrounded by dark, dark surrounded by light, light next to dark at various orientations, or brightness increasing or decreasing over time. The regions are small and tightly packed around the fovea, the center of vision, and get larger and more spread out toward your peripheral vision.

Your brain recognizes images from these features, not directly from the rods and cones. This is why you cannot read text or see fine detail with your peripheral vision, even though photoreceptors remain densely packed well outside the fovea. The processing by the retina blurs the detail. It is not an optical blurring, however: a moving point of light in your peripheral vision will still get your attention, because it activates the feature detectors in your retina that detect motion.

To see all of a picture, you have to move your eyes around it. The input to the higher-level feature detectors is not just what you see, but also feedback from the eye muscles that tells you where you are looking. To see a picture, you have to combine a lot of these fuzzy images, each focused on a different location, store them in short-term memory, and combine them all.

DeSTIN and Google's cat-face recognizer don't do any of this. They just process the whole image at once. That requires more computation, because you don't get the initial reduction of the image. Of course, you would end up with the same information if your eyes simply scanned across the image, but your eyes are smarter than that: you look at the most important parts of the image first. Your eyes are attracted to movement, regions of high contrast (edges and corners), and interesting objects like human faces. When reading, your eyes jump from one word to the next, and your higher-level feature detectors recognize each word.
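A minimal sketch of the two ideas above, assuming nothing beyond the description in this email: (1) center-surround features ("light surrounded by dark" vs. the reverse) approximated by a difference of local means, and (2) foveated sampling, where samples are dense near the fixation point and receptive fields grow with eccentricity, so 10^8-ish pixels reduce to a few hundred features. All function names and parameters here are illustrative assumptions, not anyone's actual model.

```python
import numpy as np

def box_mean(img, cy, cx, r):
    """Mean intensity of a (2r+1) x (2r+1) patch, clipped to the image."""
    y0, y1 = max(cy - r, 0), min(cy + r + 1, img.shape[0])
    x0, x1 = max(cx - r, 0), min(cx + r + 1, img.shape[1])
    return img[y0:y1, x0:x1].mean()

def retina_features(img, fy, fx, n_samples=256, seed=0):
    """Reduce a grayscale image to center-surround features sampled
    around a fixation point (fy, fx).  Sample density is highest near
    the fixation (the "fovea") and receptive fields grow linearly with
    eccentricity, mimicking the retina's lossy compression."""
    rng = np.random.default_rng(seed)
    h, w = img.shape
    feats = []
    for _ in range(n_samples):
        # Squaring a uniform draw biases sample radii toward the fovea.
        ecc = rng.random() ** 2 * min(h, w) / 2
        ang = rng.random() * 2 * np.pi
        y = int(np.clip(fy + ecc * np.sin(ang), 0, h - 1))
        x = int(np.clip(fx + ecc * np.cos(ang), 0, w - 1))
        # Receptive fields get larger farther from the fixation point.
        r_center = 1 + int(ecc / 16)
        r_surround = 3 * r_center
        # Center minus surround: positive means a light center on a
        # dark surround; negative means the reverse.
        feats.append(box_mean(img, y, x, r_center)
                     - box_mean(img, y, x, r_surround))
    return np.array(feats)
```

Fixating a bright dot on a dark background yields a strongly positive feature near the center, while the same dot far in the periphery falls inside one large, diluted receptive field: the "blur" is in the encoding, not the optics. A scanning strategy would call `retina_features` repeatedly with different fixation points and pool the results.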
During saccades, or eye movements, visual processing is suppressed. This is why you cannot see your own eyes move when you look from one eye to the other in a mirror.

-- Matt Mahoney, [email protected]

-------------------------------------------
AGI Archives: https://www.listbox.com/member/archive/303/=now
