I have not read the original paper.  I see this result as a straightforward
extrapolation of what other neural networks already do.  Was anything about
it actually unexpected?

The problem is that neural networks are not able to recognize
cross-categorical features (such as eyes, which appear both in humans and
in other animals).  (This example may be too fussy, because the paper
discusses a model trained without labels that only sampled still images,
but I wanted to find an important example.)  Another example is that folds
of cloth can look like limbs and bodies, so they might be cross-categorized
in another sample.  But what happens to this kind of cross-categorization
when a neural network produces it?  The shared features could be confused
with one another just as easily as they could be used to recognize a type
of thing in an image.  I believe that types of things that can be
cross-categorized (and used in a significant way to detect similarities and
differences during recognition) will only tend to have those similarities
and differences blurred when the categorization is done in a neural
network.  However, I am not that familiar with neural networks.
Jim Bromer

On Wed, Jun 27, 2012 at 9:53 AM, Matt Mahoney <[email protected]> wrote:

> On Wed, Jun 27, 2012 at 2:09 AM, bfrs <[email protected]> wrote:
> > nytimes article on this paper:
> >
> https://www.nytimes.com/2012/06/26/technology/in-a-big-network-of-computers-evidence-of-machine-learning.html?_r=1
>
> Original paper here:
> http://arxiv.org/pdf/1112.6209v3.pdf
>
> To summarize, a 9 layer neural network with 10^9 connections is
> trained unsupervised for 3 days on 1000 16-core CPUs on 10^7 unlabeled
> 200x200 images, each a random frame from a different Youtube video.
> When the resulting top level neurons are examined, it turns out that
> there are detectors for (among other things) human faces, human
> bodies, and cats.
>
> It was not told to look for these things. This is just a compression
> problem. If you want to encode an image efficiently, then you do so by
> describing its high level features (e.g. a person holding a cat). The
> learning problem is to find a set of useful features, knowing nothing
> about the world or what these arrays of pixels might represent.
>
> It does not achieve human level accuracy, but is still better than
> anything else. The equivalent problem for human vision would be to
> train 10^13 synapses for a decade on 10^9 images of 10^8 pixels each.
>
> --
> -- Matt Mahoney, [email protected]
>
>
> -------------------------------------------
> AGI
> Archives: https://www.listbox.com/member/archive/303/=now
> RSS Feed: https://www.listbox.com/member/archive/rss/303/10561250-164650b2
> Modify Your Subscription:
> https://www.listbox.com/member/?&;
> Powered by Listbox: http://www.listbox.com
>
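
As a rough check on the scale comparison in the quoted summary, the figures can be lined up in a few lines of Python. This is only a back-of-envelope sketch: every number is the order-of-magnitude value given in the email (not verified against the paper), and the names `NETWORK`, `HUMAN`, `total_pixels`, `scale_ratios`, and `core_days` are hypothetical, chosen for illustration.

```python
# Back-of-envelope check of the scale figures in the quoted summary.
# All numbers are the order-of-magnitude values from the email, not
# figures verified against the paper. Names here are hypothetical.

NETWORK = {"connections": 10**9, "images": 10**7, "pixels_per_image": 200 * 200}
HUMAN = {"connections": 10**13, "images": 10**9, "pixels_per_image": 10**8}


def total_pixels(spec):
    """Total number of pixel values seen during training."""
    return spec["images"] * spec["pixels_per_image"]


def scale_ratios(small, large):
    """How many times larger `large` is than `small`, per quantity."""
    ratios = {key: large[key] // small[key] for key in small}
    ratios["total_pixels"] = total_pixels(large) // total_pixels(small)
    return ratios


def core_days(machines, cores_per_machine, days):
    """Training compute expressed as CPU core-days."""
    return machines * cores_per_machine * days


if __name__ == "__main__":
    for name, ratio in scale_ratios(NETWORK, HUMAN).items():
        print(f"{name}: human estimate is {ratio:,}x the network")
    print("training compute:", core_days(1000, 16, 3), "core-days")
```

Taking the quoted figures at face value, the human-vision estimate has about 10,000 times as many connections and about 250,000 times as much raw pixel data as the network described, which was trained for roughly 48,000 core-days.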


