On Wed, Apr 3, 2013 at 12:11 PM, Ben Goertzel <[email protected]> wrote:
>>> By using more efficient algorithms than the human brain does ...
>>
>> How do you know that such algorithms exist? How do you calculate the
>> complexity?
>
> What matters is the average case complexity, relative to the
> probability distributions characterizing the actual environments and
> goals relevant to the AGI system...
>
> There is no good math for calculating this kind of complexity...
>
> So, we are relying in significant part on intuition here....
Turing's intuition was that computers were already fast enough to solve AI. This was before vacuum-tube computers like ENIAC, so I presume he meant mechanical relays.

Anyway, I would like opinions on the computational complexity of human vision. Specifically, how would you optimize Google's cat face recognizer and bring it up to human level? http://128.84.158.119/abs/1112.6209v3 Their current implementation is a 9-layer neural network with 10^9 connections. It was trained on 10^7 256x256 grayscale images for 3 days on 16,000 CPU cores. It is 15.8% accurate on ImageNet, 70% better than any other system. Presumably, humans would be able to recognize most of the images.

Google's system recognizes only still images in isolation. To bring it to human level, it would have to model motion, color, and stereoscopic depth perception. It would need a fovea and would have to model saccades, for example by scanning important visual features such as corners, faces, words, and moving objects. It would have to be integrated with other senses to aid recognition: when you turn your head, the model should predict how the image will change and extract features from the residual errors. Vision also makes heavy use of context. For example, you can more easily recognize a co-worker at work than at the store.

By adulthood we have seen the equivalent of 10^10 images at a frame rate of around 10 per second. Each frame has 10^8 pixels, although to be fair, this is reduced to 10^6 low-level features by the retina. A single processor running at 10^10 OPS could easily handle that input stream. It is harder to estimate the number of higher-level features processed by the (much larger) visual cortex, such as lines, edges, and movement, and then, going up the hierarchy, corners, letters, words, faces, and familiar objects. The number of top-level features must be at least as large as our vocabulary, about 10^5, and is probably much higher, or else we could adequately use words to convey pictures.
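As a sanity check, the order-of-magnitude arithmetic above can be worked out in a few lines. This is a minimal sketch; every constant is a rounded assumption taken from the text (frame rate, feature count, processor speed), not a measurement.

```python
# Back-of-envelope check of the visual-throughput estimates.
# All constants are rounded assumptions from the text, not measurements.

SECONDS_PER_YEAR = 3.15e7
YEARS = 20          # "by adulthood"
FPS = 10            # assumed frame rate of human vision
FEATURES = 1e6      # low-level features per frame after retinal compression
OPS = 1e10          # hypothetical single-processor speed

frames = YEARS * SECONDS_PER_YEAR * FPS       # lifetime frames seen
ops_per_feature = OPS / (FEATURES * FPS)      # ops available per feature per frame

print(f"lifetime frames:   {frames:.0e}")          # ~1e10, as claimed
print(f"ops/feature/frame: {ops_per_feature:.0e}") # ~1e3, so 1e10 OPS suffices

# Lifetime optic-nerve bit budget (~1 bit per feature per frame, for round numbers)
bits = frames * FEATURES
print(f"optic-nerve bits:  {bits:.0e}")            # ~1e16
print(f"ratio to a 1e11-bit training set: {bits / 1e11:.0e}")  # ~1e5
```

The last two lines are the bridge to the hardware estimate: the lifetime input is about 10^5 times the size of Google's training set, but spread over decades instead of days.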
Google's system is trained on 10^11 bits. The optic nerve transmits 10^16 bits by adulthood, or 10^5 times as much. Coincidentally, our brain has 10^5 times as many synapses (10^14) as Google's model has connections. We don't need 10^5 times as many processors, because the computation is spread out over decades rather than 3 days. I estimate 10^6 cores at 10^9 to 10^10 OPS each.

Is it possible to solve the problem with less hardware? How?

-- 
Matt Mahoney, [email protected]
