Thanks Jeff & Ian. Geoff has an ironic sense of humour about the whole mathematical proof thing. He says there was about a decade where the last men standing in neural net research couldn't get a single paper published, and some journals had explicit standards ruling out neural nets. He says that if you can prove your algorithm converges you have a better chance of being taken seriously and, if you're really lucky, published. He's quite happy to get a proof by adding a load of restrictions to the design, then throwing them away to get useful work done.
He refers several times to motivations based on the brain, and his philosophy seems to be: build something useful first, then see if making it more brain-like makes it even better. There seems to me to be a lot worth looking at in his (and his colleagues') research.

While there are huge differences between the CLA and a layer of Restricted Boltzmann Machine cells, both look at a spatial binary field and create a sparse distributed representation which detects features in the data. The RBM method deliberately sacrifices biological accuracy in favour of mathematical tractability, but the results are similar. As regards spatial-temporal networks, he shows (at the end of Lecture 7 in his course) a very impressive video made by Alex Graves which identifies the characters in cursive (joined-up) writing in real time. The data is a sequence of images of the text as it is being written. The video even shows which pixels in the input are being used in the decision.

I think there are several important ideas in what Geoff Hinton is doing which we should pay attention to. His RBMs learn by feeding the data forward, attempting to reconstruct the data from the hidden layer, and then feeding the reconstruction forward as if it were data. The difference between the statistics of the two feedforward passes gives an error measure which changes the weights. This seems to be a very fast way to build feature detectors unsupervised. Like the CLA, the "weights" are incremented when the column and input bit are both on, but the RBM decrements the weight (a bit less) when the column and the reconstructed "data" coincide. The layer thus drifts away from wherever it "likes" and towards where the data is. You can stack RBMs and build a hierarchy very easily, just by treating the previous top layer as the input layer of a new RBM.
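To make that learning rule concrete, here is a minimal NumPy sketch of one-step contrastive divergence (CD-1) on a binary-binary RBM. The class, toy data and parameter values are my own illustration for the list, not Hinton's code or anything from NuPIC; it just shows the "data pass minus reconstruction pass" weight update described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Minimal binary-binary RBM trained with one-step contrastive divergence."""
    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible biases
        self.b_h = np.zeros(n_hidden)    # hidden biases
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0):
        # Positive phase: feed the data forward.
        h0_p = self.hidden_probs(v0)
        h0 = (rng.random(h0_p.shape) < h0_p).astype(float)
        # Reconstruct the "data" from the hidden layer...
        v1_p = self.visible_probs(h0)
        # ...and feed the reconstruction forward as if it were data.
        h1_p = self.hidden_probs(v1_p)
        # Increment on (data, hidden) coincidences, decrement on
        # (reconstruction, hidden) coincidences: the layer drifts away
        # from what it "likes" and towards the data.
        n = len(v0)
        self.W += self.lr * (v0.T @ h0_p - v1_p.T @ h1_p) / n
        self.b_v += self.lr * (v0 - v1_p).mean(axis=0)
        self.b_h += self.lr * (h0_p - h1_p).mean(axis=0)
        return float(np.mean((v0 - v1_p) ** 2))  # reconstruction error

# Toy data: binary patterns with two underlying "features".
data = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]] * 50, dtype=float)
rbm = RBM(n_visible=6, n_hidden=2)
errs = [rbm.cd1_step(data) for _ in range(2000)]
```

Stacking is then exactly as described: train a second RBM whose "data" is the first one's hidden activations, e.g. `RBM(2, 2)` fed with `rbm.hidden_probs(data)`.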
The bidirectional connections give you both feedforward and generative functionality, allowing you to produce "perceptions" at the bottom just by setting the top cells to a learned pattern or "label". These deep belief nets (DBNs) can learn hierarchical feature patterns and classes on their own, and later you can add a "label module" near the top to connect up the classes you've already learned with the labels for those classes. The idea of using reconstruction to fine-tune the interlayer connections is a candidate for doing hierarchy in the HTM. Some of the researchers call it the "wake-sleep" cycle and believe something similar is going on in the brain.

Some major differences between this stuff and the CLA:

1. We use binary values everywhere; DBNs use scalars everywhere (although the "neurons'" activity can be binary-valued).
2. DBNs don't actually do sequences; you have to use (or incorporate) a special sort of network called a Recurrent Neural Net for that. The CLA could be considered to be all about sequences.
3. Neurons in DBNs are really dense compared with columns in the CLA. A neuron in a DBN is a complete feature detector, so each feature is represented by the scalar "activity" on that single neuron. You need a bunch of columns in the CLA to represent a feature, and the representation is a mini-SDR of activity.
4. Interlayer connections in a DBN are bidirectional and symmetric.

Going back to our earlier discussions about smart encoders and granny cells, I feel there is a case for investigating what happens when you connect the CLA cells back to their input bits in a way analogous to the DBN method.

Regards,

Fergal Byrne

On Wed, Oct 23, 2013 at 1:01 AM, Ian Danforth <[email protected]> wrote:

> >> What I hope happens is "deep learning" networks (hierarchical neural
> >> networks) will move from being spatial classifiers to spatial temporal
> >> classifiers and then our worlds will become one.
>
> Done. e.g.
> http://research.microsoft.com/en-us/um/people/dongyu/nips2009/papers/schrauwen-paper_hier_recurr_net.pdf
> http://ai.stanford.edu/~amaas/papers/pP11_maas.pdf
> ... many many more.
>
> In fact you can get deep recurrent nets as a service now:
>
> http://www.ersatz1.com/
>
> _______________________________________________
> nupic mailing list
> [email protected]
> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org

--
Fergal Byrne
Brenter IT
http://www.examsupport.ie
[email protected]
+353 83 4214179

Formerly of Adnet
[email protected]
http://www.adnet.ie
