I haven't talked to Geoff in a few years. He is a classic machine learning person. The difficulty we would have conversing is the following: he wants mathematically derived and verifiable solutions, and he cares less about how the brain actually does things. Many machine learning people look at the CLA and say, "But you haven't shown me mathematically how it works; without a mathematical proof you can't know if it is any good." The CLA wasn't derived that way. If you don't know the neuroscience, and if you haven't thought about the problems the neocortex must solve, something like the CLA looks arbitrary.
What I hope happens is that deep learning networks (hierarchical neural networks) will move from being spatial classifiers to spatial-temporal classifiers, and then our worlds will become one. The CLA is the best candidate to replace the restricted Boltzmann machines Hinton currently uses for each layer of his deep learning networks.

Jeff

From: nupic [mailto:[email protected]] On Behalf Of Fergal Byrne
Sent: Tuesday, October 22, 2013 3:06 PM
To: NuPIC general mailing list
Subject: Re: [nupic-dev] Looking for help in understanding part of the HTM white paper

Great discussion, Jeffs et al,

The last few days I've been crash-coursing yet another phonetic Geoff (Hinton this time) on Coursera - highly recommended for everyone here to take. It's clear from GH's body language as well as from his work: he doesn't care whose idea is smarter than his, only what works. And he'd love it if the best-working ideas matched how our brains work.

I see enormous synergies between what Jeff is doing and how Geoff's hierarchical, multi-layer architecture could leverage the CLA, along the lines of what Geoff and his pals have worked on. You guys should go off for a dirty weekend.

If you look at the architectures of GH's multi-layer (multi-region in HTM) systems, you're clearly looking at the kind of work done in the neocortex. He's interested in stripping out the complications, right down to the primitive mathematical interactions involved. I'd love to see how the CLA and GH's work could be integrated. It'll rock.

Regards,
Fergal Byrne

On Tue, Oct 22, 2013 at 9:22 PM, Jeff Hawkins <[email protected]> wrote:

We don't have any definitive numbers on this. In general the SP is tolerant to a large range of sparsity in the input, but the actual numbers depend on several things. On the sparser end it is important that there are enough active input bits for the SP to recognize patterns. The more input bits you have, the sparser the patterns can be while still providing a sufficient number of active input bits. On the denser side of the scale I would expect the SP to start breaking down by 50% active input bits, maybe earlier.

We have a method of determining whether a trained SP is working well. Recall that the individual bits of the spatial pooler are trying to learn common spatial patterns in the input; we often refer to them as coincidence detectors. After training the SP you can look at how many valid synapses each SP coincidence detector has. The number of valid synapses tells you how rich a spatial pattern this coincidence detector has learned. For example, 5 or fewer valid synapses is not much of a spatial pattern, and the SP output will not be very stable. This represents a system that has too few SP coincidence detectors, or a system where the input doesn't contain many repeating patterns. For example, if you feed random patterns into the SP it won't find anything to learn, and the coincidence detectors will have few synapses. The more valid synapses you find in the trained SP's coincidence detectors, the better the job the SP is doing. If there are not enough total active input bits, the SP won't find rich patterns. If the input is too dense, the input patterns will likely overlap a lot and the SP will have trouble separating them. Again, in practice we found the SP to be tolerant; I am just talking about the extremes.
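As a rough illustration of this health check, the sketch below counts connected ("valid") synapses per column given a pooler's permanence matrix. The matrix shape, the 0.2 connection threshold, and the random stand-in data are all assumptions made for illustration, not NuPIC's API or defaults.

    import numpy as np

    # Stand-in for a trained SP's permanence matrix (columns x input bits).
    # Real permanences come from training; random values are used here only
    # so the snippet runs on its own.
    rng = np.random.default_rng(0)
    permanences = rng.random((2048, 1024))

    CONNECTED_PERM = 0.2  # assumed threshold for a synapse to count as "valid"
    MIN_SYNAPSES = 5      # Jeff's rough cutoff for a meaningful pattern

    # A synapse is valid (connected) once its permanence crosses the threshold.
    valid_counts = (permanences >= CONNECTED_PERM).sum(axis=1)

    print("mean valid synapses per column:", valid_counts.mean())
    print("weak columns:", np.count_nonzero(valid_counts <= MIN_SYNAPSES))
    # Many weak columns suggest too few detectors, or an input stream with
    # few repeating spatial patterns (e.g. random noise).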
Advanced topic: imagine the million bits coming from the retina, and assume they have some reasonable sparse activity, say 5%. If we feed this into a plain vanilla spatial pooler it won't work well at all. Even if we have 1M columns in the SP, the number of patterns coming from the retina is so huge that each column in the SP will be overwhelmed. There are WAY more than 1M patterns coming from the retina; it will look like noise. However, we can fix this problem by using topology. When we implement the SP with topology, the individual coincidence detectors limit the area of the input they look at until they find rich spatial patterns. The size of the area they look at varies: if the input patterns become less varied, the input area of a coincidence detector will expand; if the input patterns become more varied, the input area will contract. This is the basis of plasticity. We tested this and it worked beautifully.

Jeff
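The thread doesn't spell out this radius adaptation in code, so the following is only a guess at the mechanism described above: each detector watches a patch of the input, widens the patch when it cannot find enough connected synapses inside it, and narrows it when it finds plenty. All names and parameters are invented for illustration.

    import numpy as np

    TARGET_SYNAPSES = 20      # assumed "rich enough" pattern for one detector
    GROW, SHRINK = 1.25, 0.9  # assumed radius adjustment factors

    class Detector:
        """One coincidence detector with a 1-D receptive field (a sketch)."""

        def __init__(self, center, radius=10.0):
            self.center = center
            self.radius = radius

        def adapt(self, connected_positions):
            # Count connected synapses that fall inside the current patch.
            inside = np.count_nonzero(
                np.abs(connected_positions - self.center) <= self.radius)
            if inside < TARGET_SYNAPSES:
                self.radius *= GROW    # no rich pattern locally: look wider
            elif inside > TARGET_SYNAPSES:
                self.radius *= SHRINK  # plenty of structure: tighten the patch

    det = Detector(center=500.0)
    det.adapt(np.array([490, 495, 505]))  # only 3 synapses in range
    print(det.radius)                     # 12.5 - the patch expanded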
From: nupic [mailto:[email protected]] On Behalf Of Pedro Tabacof
Sent: Tuesday, October 22, 2013 4:30 AM
To: NuPIC general mailing list
Subject: Re: [nupic-dev] Looking for help in understanding part of the HTM white paper

Hello,

Is there a recommended level for input sparsity? What are the minimum and maximum sparsities it can work with functionally?

Thanks,
Pedro

On Mon, Oct 21, 2013 at 6:28 PM, Jeff Hawkins <[email protected]> wrote:

Perhaps this wasn't written as well as it should have been. The spatial pooler converts one sparse representation into another sparse representation. The output of a spatial pooler has a fixed number of bits (equal to the number of columns) and a relatively fixed sparsity, say 2%. The spatial pooler works just fine with a range in the number of input bits and a range in sparsity. In some ways the goal of the SP is to handle any amount of input and convert it to an output of fixed size and sparsity. The other thing it does is learn the common spatial patterns in the input and make sure to represent those well. The output sparsity of the SP needs to be relatively fixed for the temporal pooler (sequence memory) to work. The number of output bits, equal to the number of columns, also has to be fixed for the TP to work.

Why is it important that the input can vary? In a real brain the hierarchy of the neocortex is complicated and messy. Multiple regions converge onto destination regions as you ascend the hierarchy. By allowing the number of input bits to vary over a wide range, evolution could wire up the hierarchy in lots of different ways and the cortex would continue to work OK. If we took an existing brain and added a connection between two regions that previously were not connected, the SP in the destination region wouldn't break. For example, in normal humans the size of primary visual cortex varies by a factor of 3, but the size of the output of the retina is always about 1M fibers. The SP in V1 can handle a broad range in the ratio of the number of input bits to the number of output bits.

The sparsity level of the input can vary for multiple reasons: lack of sensory input, changes in attention (which effectively turn off input bits), and temporal pooling itself. So it is important that the spatial pooler take whatever it is given and convert it into a relatively fixed output. This is why the SP does what it does and why it is important.

Do you need help understanding how the SP does this?

Jeff

From: nupic [mailto:[email protected]] On Behalf Of Jeff Fohl
Sent: Sunday, October 20, 2013 6:41 PM
To: [email protected]
Subject: [nupic-dev] Looking for help in understanding part of the HTM white paper

Hello - I hope this is not being posted to the wrong list. This is my first post here. Please let me know if there is a more appropriate place for this question.

In preparation for learning NuPIC, I have read "On Intelligence", and I am now reading the HTM white paper put out by Numenta. Making my way through the white paper, I got stuck on one passage, which I can't really make sense of, and I am wondering if anyone can help me through this part. The passage in question is on pages 11-12 of the white paper PDF - specifically the second paragraph included below.

"HTM regions also use sparse distributed representations. In fact, the memory mechanisms within an HTM region are dependent on using sparse distributed representations, and wouldn't work otherwise. The input to an HTM region is always a distributed representation, but it may not be sparse, so the first thing an HTM region does is to convert its input into a sparse distributed representation.

For example, a region might receive 20,000 input bits. The percentage of input bits that are 1 and 0 might vary significantly over time. One time there might be 5,000 '1' bits and another time there might be 9,000 '1' bits. The HTM region could convert this input into an internal representation of 10,000 bits of which 2%, or 200, are active at once, regardless of how many of the input bits are 1. As the input to the HTM region varies over time, the internal representation also will change, but there always will be about 200 bits out of 10,000 active."

So, what exactly is going on here? How does a fluctuating input flow of 20,000 bits get converted into 200 bits? Obviously there is something important going on here, but I don't understand what it is. Any help illuminating this would be greatly appreciated!

Many thanks,
Jeff
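The conversion Jeff Hawkins describes earlier in the thread can be sketched in a few lines: each column computes an overlap with the input through a fixed random sample of input bits, and the 2% of columns with the highest overlap become the active output, regardless of input density. This is a toy sketch under assumed parameters (a 500-bit potential pool per column, plain top-k selection); the real SP also learns, boosts, and may use topology, all omitted here.

    import numpy as np

    rng = np.random.default_rng(42)

    N_INPUT, N_COLUMNS = 20_000, 10_000   # the white paper's example sizes
    N_ACTIVE = int(N_COLUMNS * 0.02)      # 2% output sparsity -> 200 columns

    # Each column watches a fixed random sample of input bits (assumed wiring;
    # duplicate indices within a row are possible but harmless for a sketch).
    potential = rng.integers(0, N_INPUT, size=(N_COLUMNS, 500))

    def spatial_pool(input_bits):
        # Overlap = how many active input bits each column can see.
        overlaps = input_bits[potential].sum(axis=1)
        # k-winners-take-all: the top 2% of columns become the output.
        active = np.zeros(N_COLUMNS, dtype=np.int8)
        active[np.argsort(overlaps)[-N_ACTIVE:]] = 1
        return active

    for ones in (5_000, 9_000):  # the quoted example densities
        x = np.zeros(N_INPUT, dtype=np.int8)
        x[rng.choice(N_INPUT, size=ones, replace=False)] = 1
        print(ones, "input 1-bits ->", spatial_pool(x).sum(), "active columns")

Both inputs produce exactly 200 active columns, which is the behavior the quoted passage describes: a fluctuating 20,000-bit input becomes a fixed-size, fixed-sparsity 10,000-bit representation.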
--
Pedro Tabacof
Unicamp - Eng. de Computação 08

--
Fergal Byrne
Brenter IT
[email protected]
+353 83 4214179
Formerly of Adnet [email protected] http://www.adnet.ie
