I have not read the original paper. I see this as a straightforward extrapolation from other neural network work. There is nothing unexpected in it - or is there?
The problem is that neural networks are not able to recognize cross-categorical features (like seeing eyes both in humans and in other animals). (This example may be too fussy because the paper discussed an untrained model that only sampled still images but I just wanted to find an important example.) Another example is that folds of cloth might look like limbs and bodies and so they might be cross-categorized (in another sample).

But what happens to this kind of cross-categorization that a neural network can produce? The features could be confused as well as be used to recognize a type of thing in an image. I believe that types of things that can be cross-categorized (and used to significantly detect similarities and differences during recognition) will only tend to blur those similarities and differences when done in a neural network. However, I am not that familiar with neural networks.

Jim Bromer

On Wed, Jun 27, 2012 at 9:53 AM, Matt Mahoney <[email protected]> wrote:

> On Wed, Jun 27, 2012 at 2:09 AM, bfrs <[email protected]> wrote:
> > nytimes article on this paper:
> > https://www.nytimes.com/2012/06/26/technology/in-a-big-network-of-computers-evidence-of-machine-learning.html?_r=1
>
> Original paper here:
> http://arxiv.org/pdf/1112.6209v3.pdf
>
> To summarize, a 9 layer neural network with 10^9 connections is
> trained unsupervised for 3 days on 1000 16-core CPUs on 10^7 unlabeled
> 200x200 images, each a random frame from a different Youtube video.
> When the resulting top level neurons are examined, it turns out that
> there are detectors for (among other things) human faces, human
> bodies, and cats.
>
> It was not told to look for these things. This is just a compression
> problem. If you want to encode an image efficiently, then you do so by
> describing its high level features (e.g. a person holding a cat). The
> learning problem is to find a set of useful features, knowing nothing
> about the world or what these arrays of pixels might represent.
>
> It does not achieve human level accuracy, but is still better than
> anything else. The equivalent problem for human vision would be to
> train 10^13 synapses for a decade on 10^9 images of 10^8 pixels each.
>
> --
> -- Matt Mahoney, [email protected]
>
>
> -------------------------------------------
> AGI
> Archives: https://www.listbox.com/member/archive/303/=now
> RSS Feed: https://www.listbox.com/member/archive/rss/303/10561250-164650b2
> Modify Your Subscription:
> https://www.listbox.com/member/?&
> Powered by Listbox: http://www.listbox.com
> -------------------------------------------
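
To put the scale figures from the quoted summary side by side, here is a rough back-of-envelope sketch. It uses only the numbers quoted in the message. The variable names are made up for illustration, and counting one value per pixel (ignoring color channels) is an assumption, not something taken from the paper.

```python
# Back-of-envelope comparison of the figures quoted in the summary.
# All names here are illustrative; only the numbers come from the message.

# Network as described: 10^9 connections, 10^7 images of 200x200 pixels,
# trained for 3 days on 1000 CPUs with 16 cores each.
net_connections = 10**9
net_images = 10**7
net_pixels_per_image = 200 * 200       # assumption: one value per pixel
net_core_hours = 1000 * 16 * 3 * 24    # CPUs x cores x days x hours/day

# Human-vision figures from the same message.
human_synapses = 10**13
human_images = 10**9
human_pixels_per_image = 10**8

# Total pixels seen during training in each case.
net_total_pixels = net_images * net_pixels_per_image
human_total_pixels = human_images * human_pixels_per_image

# How much larger the human-scale problem is on each axis.
connection_ratio = human_synapses // net_connections
pixel_ratio = human_total_pixels // net_total_pixels

print(f"network training pixels: {net_total_pixels:.1e}")
print(f"network core-hours:      {net_core_hours:,}")
print(f"human training pixels:   {human_total_pixels:.1e}")
print(f"connection ratio:        {connection_ratio:,}x")
print(f"pixel ratio:             {pixel_ratio:,}x")
```

On these figures the human-scale problem has 10,000 times as many connections and 250,000 times as many input pixels, so the gap in data is larger than the gap in model size.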
