Thanks Jeff & Ian,

Geoff has an ironic sense of humour about the whole mathematical proof
thing. He says that there was about a decade where the last men standing in
neural net research couldn't get a single paper published, and some of the
journals had explicit standards ruling out neural nets. He says that if you
can prove your algorithm converges you have a better chance of being
judged, and, if you're really lucky, published. He's quite happy to get a
proof by adding a load of restrictions to the design, then throwing them
away to get useful work done.

He refers several times to motivations based on the brain, and his
philosophy seems to be based on building something useful first, then
seeing if making it like the brain makes it even better.

There seem to me to be a lot of things worth looking at in his (and his
pals') research. While there are huge differences between the CLA and a
layer of Restricted Boltzmann Machine cells, they are both looking at a
spatial binary field and creating a sparse distributed representation which
detects features in the data. The RBM method is designed to sacrifice
biological accuracy in favour of mathematical tractability, but the results
are similar.

As regards spatio-temporal RBMs, he shows (at the end of Lecture 7 in his
course) a very impressive video made by Alex Graves which identifies the
characters in cursive (joined-up) writing, in real time. The data is a
sequence of images of the text as it is being written. The video even
shows which pixels in the input are being used in the decision.

I think there are several important ideas in what Geoff Hinton is doing
which we should pay attention to:

His RBMs learn by feeding the data forward, attempting to reconstruct the
data from the hidden layer, and then feeding the reconstruction forward as
if it were data. The difference between the statistics of the two
feedforward passes gives an error measure which is used to change the
weights. This seems to be a very fast way to build feature detectors
unsupervised.

As in the CLA, the "weights" are incremented when the column and input bit
are both on, but the RBM decrements the weight (by a bit less) when the
column and the reconstructed "data" coincide. The layer thus drifts away
from wherever it "likes" and towards where the data is.
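The two feedforward passes and the difference of statistics can be sketched
as a toy CD-1 ("contrastive divergence") update. This is a minimal
illustration only, not Hinton's actual code: the layer sizes, learning rate,
and single training pattern are invented, and biases and sampling are
omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy RBM: 6 visible bits, 4 hidden units, no biases (sizes invented).
n_vis, n_hid = 6, 4
W = rng.normal(0.0, 0.1, size=(n_vis, n_hid))

v0 = np.array([1.0, 1.0, 0.0, 0.0, 1.0, 0.0])  # a single "data" pattern

for _ in range(200):
    h0 = sigmoid(v0 @ W)    # pass 1: hidden activity driven by the data
    v1 = sigmoid(h0 @ W.T)  # reconstruction of the data from the hidden layer
    h1 = sigmoid(v1 @ W)    # pass 2: hidden activity driven by the reconstruction
    # weight change = (data statistics) - (reconstruction statistics)
    W += 0.1 * (np.outer(v0, h0) - np.outer(v1, h1))

recon = sigmoid(sigmoid(v0 @ W) @ W.T)
print(np.round(recon))      # should now resemble the data pattern
```

Note how the update matches the description above: weights grow where data
and hidden activity coincide, and shrink (via the subtracted term) where
the reconstruction and hidden activity coincide.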

You can stack RBMs and build a hierarchy very easily, simply by treating
the previous top layer as the input layer of the new RBM. The
bidirectional connections give you both feedforward and generative
functionality, allowing you to produce "perceptions" at the bottom just by
setting the top cells to a learned pattern or "label."
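Stacking can be sketched the same way: train one RBM, then treat its hidden
activations as the "data" for the next one up, and run the weights in
reverse to generate a "perception" at the bottom. Again a toy sketch with
invented sizes and random binary data; a real DBN would use sampling,
biases, and more careful training.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hid, epochs=50, lr=0.1):
    """Train one bias-free RBM with the toy CD-1 rule and return its weights."""
    n_vis = data.shape[1]
    W = rng.normal(0.0, 0.1, size=(n_vis, n_hid))
    for _ in range(epochs):
        for v0 in data:
            h0 = sigmoid(v0 @ W)
            v1 = sigmoid(h0 @ W.T)
            h1 = sigmoid(v1 @ W)
            W += lr * (np.outer(v0, h0) - np.outer(v1, h1))
    return W

# Greedy layer-wise stacking: the hidden activations of one trained RBM
# become the input "data" for the next RBM in the hierarchy.
data = rng.integers(0, 2, size=(20, 8)).astype(float)
W1 = train_rbm(data, 6)
hidden1 = sigmoid(data @ W1)   # previous top layer...
W2 = train_rbm(hidden1, 4)     # ...is the input layer of the new RBM
hidden2 = sigmoid(hidden1 @ W2)

# Generative direction: set the top-layer cells, then run the symmetric
# weights backwards to produce a "perception" at the bottom.
top = hidden2[0]
percept = sigmoid(sigmoid(top @ W2.T) @ W1.T)
```

The symmetric weight matrices are what make both directions possible: the
same W is used as `v @ W` going up and `h @ W.T` coming down.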

These deep belief nets (DBNs) can learn hierarchical feature patterns and
classes on their own, and later you can add a "label module" near the top
to connect up the classes you've already learned with the labels for those
classes.

The idea of using reconstruction to fine-tune the interlayer connections is
a candidate for doing hierarchy in the HTM. Some of the researchers call it
the "wake-sleep" cycle and believe something similar is going on in the
brain.

Some major differences between this stuff and the CLA:

1. We use binary values everywhere; DBNs use scalars everywhere (although
the neurons' activity can be binary-valued).
2. DBNs don't actually do sequences; you have to use (or incorporate) a
special variant called a Recurrent Neural Network for that. The CLA could
be considered to be all about sequences.
3. Neurons in DBNs are really dense compared with columns in the CLA. A
neuron in a DBN is a complete feature detector, so each feature is
represented by the scalar "activity" on that single neuron. You need a
bunch of columns in the CLA to represent a feature, and the representation
is a mini-SDR of activity.
4. Interlayer connections in a DBN are bidirectional and symmetric.

Going back to our earlier discussions about smart encoders and granny
cells, I feel that there is a case for investigating what happens when you
connect the CLA cells back to their input bits in a way analogous to the
DBN method.

Regards,

Fergal Byrne





On Wed, Oct 23, 2013 at 1:01 AM, Ian Danforth <[email protected]> wrote:

>
>> What I hope happens is “deep learning” networks (hierarchical neural
>> networks) will move from being spatial classifiers to spatial temporal
>> classifiers and then our worlds will become one.
>>
>
> Done. e.g.
>
>
> http://research.microsoft.com/en-us/um/people/dongyu/nips2009/papers/schrauwen-paper_hier_recurr_net.pdf
>
> http://ai.stanford.edu/~amaas/papers/pP11_maas.pdf
> ... many many more.
>
> In fact you can get deep recurrent nets as a service now:
>
> http://www.ersatz1.com/
>
> _______________________________________________
> nupic mailing list
> [email protected]
> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>
>


-- 

Fergal Byrne

Brenter IT <http://www.examsupport.ie>
[email protected] +353 83 4214179
Formerly of Adnet [email protected] http://www.adnet.ie
