Sorry I have been a little absent on this list. I was travelling this week and I am preparing for OSCON next week, so I can't keep up with all the conversations.
Most image classification systems rely on some form of what we call "temporal pooling". (Mike described it well below.) E.g., HMAX, a vision system out of Poggio's lab at MIT, uses a hard-coded pooling mechanism: they take their spatial features and hard-code representations that are active for spatial shifts of the feature. Hard-coded pooling works OK for the first level of a vision hierarchy, but it doesn't work in a general sense. For example, in audition we need to pool patterns in time that have no obvious spatial invariance. We might want to pool successive notes in a melody, and there is no equivalent of spatial invariance for that. Therefore, a cortical region must learn what patterns to pool over time.

We did some vision work prior to the CLA. Those algorithms did not have a good temporal model, and we actually used hard-coded pooling a la HMAX. I was never happy about this, although it produced OK (but not great) results. When we first created the CLA and were using it for vision experiments, we spent a lot of time making sure it could learn temporal pooling. The idea is that you first learn a sequence, but this on its own doesn't do any pooling. To pool, you need cells to stay active over a sequence of patterns. The way we achieved this is that a cell first learns to predict its activity one step ahead in time. Once it has learned to do that, it can learn to predict its activity two steps ahead, and so on. With repeated patterns, a cell can learn to predict its activity well in advance. How far in advance depends on how predictable and varied the sequences are.

I don't have time to go into all the details now, but as Mike suggests, if we have only one cell per column then the cell will pool no matter what direction a pattern is moving. It can't tell a left-moving line from a right-moving line. Therefore it will produce a cell that responds to a line no matter where the line is and no matter what direction it is moving.
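[Editor's note: a minimal toy sketch of the pooling idea Jeff describes above -- a cell that has learned a sequence stays active across the whole sequence instead of flickering with each input. This is my own illustration, not NuPIC code; all names are invented for the example.]

```python
def pool_over_sequence(sequence, learned_transitions):
    """Return, per time step, whether a pooling cell is active.

    The cell turns on when it sees a pattern it has learned, and stays
    on as long as each next pattern was predicted from the previous
    one. An unpredicted pattern makes it drop out.
    """
    active = []
    prev = None
    for pattern in sequence:
        if prev is not None and learned_transitions.get(prev) == pattern:
            cell_on = True   # prediction confirmed: stay active
        elif pattern in learned_transitions:
            cell_on = True   # start of a known sequence
        else:
            cell_on = False  # unpredicted input: drop out
        active.append(cell_on)
        prev = pattern
    return active

# A cell trained on the "melody" A -> B -> C stays active for the whole
# run, but drops out as soon as the sequence is violated:
transitions = {"A": "B", "B": "C"}
print(pool_over_sequence(["A", "B", "C"], transitions))  # [True, True, True]
print(pool_over_sequence(["A", "X", "C"], transitions))  # [True, False, False]
```

The point of the sketch is only that pooling falls out of prediction: once transitions are learned, one cell's activity spans many input patterns.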
However, if we have multiple cells per column, then it will produce a cell that responds when a line is moving in a particular direction. We see both types of cell in V1 in real brains. I have a theory (a highly speculative theory) that layer 4 cells are like the former and layer 3 cells are like the latter. There are several lines of evidence to suggest this. In this view, layer 4 learns pure shift invariance but layer 3 learns true sequences. BTW, layer 4 is large in the first couple of levels of cortex but disappears as you ascend the hierarchy. My explanation is that as you ascend the hierarchy, spatial invariance is solved and no longer needed, but sequences like melodies, language and actions continue to need the type of pooling done by layer 3.

We got pooling to work in the CLA, but it took a lot of synapses, and therefore memory and computation time. In the current form of the CLA we have sequence memory, but the pooling part is deactivated. We don't need pooling for the types of problems we are applying Grok to. One of the reasons I am hesitant to work on vision problems is that the temporal pooling requirement is large. Consider this: the amount of cortex dedicated to low-level vision (areas V1 and V2) dwarfs the amount of cortex dedicated to language (Broca's and Wernicke's areas). Low-level vision is much harder than language. Amazing.

Jeff

From: nupic [mailto:[email protected]] On Behalf Of Scott Purdy
Sent: Wednesday, July 17, 2013 10:58 AM
To: NuPIC general mailing list
Subject: Re: [nupic-dev] Training on Handwritten Digit Dataset using CLA

I was wrong about that. I don't quite understand it well enough to give a proper response, so I am going to see if Jeff can write it up. The explanation I got was that you can train a temporal model by moving the letter around the image. Then, when you give it a test image, you expect it to predict the letter moving in different directions. The predicted cells are apparently useful as you move up the hierarchy.
Time acts as a sort of supervisor for spatial invariance. But like I said, I am going to try to get someone to do a better explanation. There was quite a lot of vision work done that would be great to capture for you guys.

On Wed, Jul 17, 2013 at 8:04 AM, Quinn Liu <[email protected]> wrote:

Hi Michael and Scott,

Thank you very much for your explanations. Michael's explanation implies that the Temporal Pooler greatly helps in spatial invariance learning of training data, which I can see working. But for question 3 Scott said, "No need for TP. It won't help with spatial representations." Scott, I was hoping you could expand on your answer and on how you think the SP and TP contribute to spatial invariance recognition.

Best Regards,
Quinn Liu

On Mon, Jul 15, 2013 at 5:07 PM, Michael Ferrier <[email protected]> wrote:

Hi Quinn,

The older version of HTM would group together the spatial patterns that tend to occur in close temporal sequence with one another, and produce the same output when it saw any of the spatial patterns within a given group. So, if a network were trained on visual input of digits zig-zagging through the visual field, then any individual visual feature (for example, a vertical line) would come to be represented by a temporal group that responds when it is presented with a vertical line at any of many nearby locations, because in the training data a vertical line is often seen moving from one location to another nearby location. In this way it would learn invariance to position. At the lowest level of the hierarchy it would learn invariance to position for individual small visual features, and at higher levels it would learn invariance for more complex and larger arrangements of features and whole visual objects. Invariance to other transformations like scale, rotation, etc. could also be learned this way given the appropriate training data.
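[Editor's note: a toy reconstruction of the older HTM's temporal grouping as Mike describes it above -- patterns that frequently follow one another in the training stream get merged into one group, and the node then emits the same output for any member of a group. This is an illustration of the idea, not Numenta's actual first-generation code.]

```python
from collections import Counter

def temporal_groups(stream, min_count=2):
    """Group patterns by connecting frequent temporal transitions.

    Counts adjacent pairs in the training stream, then merges any two
    patterns linked by a transition seen at least min_count times
    (a simple union-find over the transition graph).
    """
    counts = Counter(zip(stream, stream[1:]))
    parent = {p: p for p in set(stream)}

    def find(p):
        while parent[p] != p:
            p = parent[p]
        return p

    for (a, b), c in counts.items():
        if c >= min_count:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for p in parent:
        groups.setdefault(find(p), set()).add(p)
    return list(groups.values())

# A vertical line sweeping back and forth across positions v1, v2, v3:
# all three positions end up in a single temporal group, so any of them
# produces the same group output.
stream = ["v1", "v2", "v3", "v2", "v1", "v2", "v3", "v2", "v1"]
print(temporal_groups(stream))
```

With training movies of smoothly moving features, nearby positions of the same feature co-occur in time far more often than unrelated patterns, which is what makes the grouping come out position-invariant.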
Like Scott said, the old version of HTM worked very differently from the CLA, but they both model the same basic principles (the CLA does so much more flexibly). Using a CLA region with one cell per column, a cell should become active when given a particular spatial pattern, but should become predictive when given any pattern that (during training) often occurs close by in temporal sequence to that spatial pattern. So, if a column's proximal segment represents the spatial pattern of a vertical line, then that column's cell should become predictive whenever a vertical line at any nearby position is presented, because during training a given vertical line is often followed by another nearby vertical line, since the training set is made up of animations of the visual objects smoothly zig-zagging around. And because a CLA region sends output from both its active and predictive cells, from the point of view of the next, higher region in the hierarchy, that cell responds invariantly to any of a set of nearby vertical lines. This corresponds to how 'complex cells' respond in visual cortex. Does that make sense?

-Mike
_____________
Michael Ferrier
Department of Cognitive, Linguistic and Psychological Sciences, Brown University
[email protected]

On Mon, Jul 15, 2013 at 4:23 PM, Scott Purdy <[email protected]> wrote:

Quinn, the older HTM implementations were completely different algorithms and are now obsolete.

On Mon, Jul 15, 2013 at 1:09 PM, Quinn Liu <[email protected]> wrote:

Hi Michael,

I had an additional question. In your reply you remarked that "while digit recognition was successfully modeled with the original version of HTM, that doesn't seem to be the case with CLA yet". I was wondering if you or anyone else could expand on this, as I am unfamiliar with the original version of the HTM. Assuming that it is an early version of the current spatial and temporal learning algorithms, how is it different? Thanks!
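[Editor's note: a minimal sketch of the complex-cell-like invariance Mike describes above -- the region's output is the union of its active and predictive cells, so the next level sees a stable code across nearby positions. Illustrative only, not the NuPIC API; the mapping from positions to trained neighbours is an invented stand-in for learned distal connections.]

```python
def region_output(input_position, trained_neighbours):
    """Output of a toy one-cell-per-column region for one input.

    trained_neighbours maps a line position to the nearby positions
    that followed it during training (the smooth zig-zag animations).
    The region emits both the active column and the columns its input
    makes predictive.
    """
    active = {input_position}
    predictive = set(trained_neighbours.get(input_position, ()))
    # Active and predictive cells are both sent up the hierarchy, so
    # the higher region sees a position-invariant representation.
    return active | predictive

# A vertical line at position 3, where training linked each position to
# its immediate neighbours:
neighbours = {p: (p - 1, p + 1) for p in range(10)}
print(sorted(region_output(3, neighbours)))  # [2, 3, 4]
```

Presenting the line at position 2, 3 or 4 yields overlapping outputs, which is the invariant response the next level pools over.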
Best Regards,
Quinn Liu
[email protected]

On Mon, Jul 15, 2013 at 3:41 PM, Michael Ferrier <[email protected]> wrote:

Hi Fergal,

I completely agree that a visual object recognition system would greatly benefit from hierarchy. Causes in the world are hierarchical, and the brain uses hierarchy to learn and represent them. The successful vision models using the original implementation of HTM were also hierarchical. I was just saying that, as far as I know, this hasn't been done with the CLA yet -- according to Jeff, in their vision experiments they were just beginning to expand beyond one layer when they stopped working on vision. I think that both temporal pooling (for invariance) and hierarchy are key to using the CLA for visual recognition problems, but I don't know of anyone who has put all the pieces together yet to do visual recognition with the CLA.

-Mike
_____________
Michael Ferrier
Department of Cognitive, Linguistic and Psychological Sciences, Brown University
[email protected]

On Mon, Jul 15, 2013 at 11:44 AM, Fergal Byrne <[email protected]> wrote:

Hi Michael,

Handwritten characters are undoubtedly multi-component designs, which have evolved to connect with and trigger our ability to learn spatial, temporal and hierarchical patterns. We perceive the same characters even when loads of things change in fonts, and especially when reading different people's handwriting. We can fill in gaps and correct misspellings, so the learning and prediction must be several levels deep in hierarchy. In terms of bottom-level mechanics, we use saccades to recognise and "delocalise" components such as characters, facial features, etc., in such a way as to allow this multi-level recognition (including a hierarchy of fixations -- for strokes, junctions, topology, characters, letters, words, and even sentences). Speed-readers can saccade to read entire phrases and sentences at a time, allowing reading speeds of thousands of words per minute with better than 70% comprehension scores.
With practice, I've been able to get scores in the 1,000-2,000 wpm range. I can also read text in a mirror, or upside-down, at speeds approaching 50-60% of an average reader's. These things could only be done using big, complex region hierarchies with vast volumes of (normal) reading practice. I would have predicted that a single-layer CLA would struggle with this kind of data set, because it lacks the multi-level upward and downward structure which I feel this kind of performance requires.

Regards,
Fergal Byrne

_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
