On Fri, Apr 5, 2013 at 1:16 PM, Mike Tintner <[email protected]> wrote:
> Matt: It seems like the next logical step would be to model a fovea and
> saccades to reduce the input complexity
>
> Care to expand? Are there any computational/robotic approaches to
> vision, which involve both the sensation/vision of a field by the retina
> AND the attention to objects/parts of objects within the field, of the
> fovea?
The retina does lossy image compression. About 10^8 light-sensing cells are reduced to about 10^6 features at the optic nerve. Each feature is a simple description of a region of the image: light surrounded by dark, dark surrounded by light, light next to dark at various orientations, or brightness increasing or decreasing over time. The regions are small and tightly packed around the fovea, the center of vision, and get larger and more spread out toward your peripheral vision.

Your brain recognizes images from these features, not directly from the rods and cones. This is why you cannot read text or see fine detail with your peripheral vision, even though photoreceptors remain densely packed well outside the fovea. The processing by the retina blurs the detail. It is not an optical blurring, however: a moving point of light in your peripheral vision will still get your attention, because it activates the feature detectors in your retina that detect motion.

To see all of a picture, you have to move your eyes around it. The input to the higher-level feature detectors is not just what you see, but also feedback from the eye muscles that tells you where you are looking. To see a picture, you have to combine a lot of these fuzzy images, each focused on a different location, store them in short-term memory, and combine them all.

DeSTIN and Google's cat-face recognizer don't do any of this. They just process the whole image at once. That requires more computation, because you don't get the initial reduction of the image. Of course, you would end up with the same information if your eyes simply scanned across the image, but your eyes are smarter than that: you look at the most important parts of the image first. Your eyes are attracted to movement, regions of high contrast (edges and corners), and interesting objects like human faces. When reading, your eyes jump from one word to the next, and your higher-level feature detectors recognize each word.
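A minimal sketch of the two ideas above, assuming nothing beyond the description in this email: (1) center-surround features ("light surrounded by dark" vs. the reverse) approximated by a difference of local means, and (2) foveated sampling, where samples are dense near the fixation point and receptive fields grow with eccentricity, so 10^8-ish pixels reduce to a few hundred features. All function names and parameters here are illustrative assumptions, not anyone's actual model.

```python
import numpy as np

def box_mean(img, cy, cx, r):
    """Mean intensity of a (2r+1) x (2r+1) patch, clipped to the image."""
    y0, y1 = max(cy - r, 0), min(cy + r + 1, img.shape[0])
    x0, x1 = max(cx - r, 0), min(cx + r + 1, img.shape[1])
    return img[y0:y1, x0:x1].mean()

def retina_features(img, fy, fx, n_samples=256, seed=0):
    """Reduce a grayscale image to center-surround features sampled
    around a fixation point (fy, fx).  Sample density is highest near
    the fixation (the "fovea") and receptive fields grow linearly with
    eccentricity, mimicking the retina's lossy compression."""
    rng = np.random.default_rng(seed)
    h, w = img.shape
    feats = []
    for _ in range(n_samples):
        # Squaring a uniform draw biases sample radii toward the fovea.
        ecc = rng.random() ** 2 * min(h, w) / 2
        ang = rng.random() * 2 * np.pi
        y = int(np.clip(fy + ecc * np.sin(ang), 0, h - 1))
        x = int(np.clip(fx + ecc * np.cos(ang), 0, w - 1))
        # Receptive fields get larger farther from the fixation point.
        r_center = 1 + int(ecc / 16)
        r_surround = 3 * r_center
        # Center minus surround: positive means a light center on a
        # dark surround; negative means the reverse.
        feats.append(box_mean(img, y, x, r_center)
                     - box_mean(img, y, x, r_surround))
    return np.array(feats)
```

Fixating a bright dot on a dark background yields a strongly positive feature near the center, while the same dot far in the periphery falls inside one large, diluted receptive field: the "blur" is in the encoding, not the optics. A scanning strategy would call `retina_features` repeatedly with different fixation points and pool the results.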
During saccades, or eye movements, visual processing is suppressed. This is why you cannot see your own eyes move when you look from one eye to the other in a mirror.

-- Matt Mahoney, [email protected]

-------------------------------------------
AGI Archives: https://www.listbox.com/member/archive/303/=now
