Hi Scott, Michael and Jeff,

Thank you very much for your explanations. I greatly appreciate them.
Best Regards,
Quinn Liu

On Wed, Jul 17, 2013 at 8:26 PM, Jeff Hawkins <[email protected]> wrote:

> Sorry I have been a little absent on this list. I was travelling this week and I am preparing for OSCON next week, so I can't keep up with all the conversations.
>
> Most image classification systems rely on some form of what we call “temporal pooling”. (Mike described it well below.) For example, HMAX, a vision system out of Poggio's lab at MIT, uses a hard-coded pooling mechanism. They take their spatial features and hard-code representations that are active for spatial shifts of the feature. Hard-coded pooling works OK for the first level of a vision hierarchy, but it doesn't work in a general sense. For example, in audition we need to pool patterns in time that have no obvious spatial invariance. We might want to pool successive notes in a melody, and there is no equivalent of spatial invariance for that. Therefore, a cortical region must *learn* what patterns to pool over time.
>
> We did some vision work prior to the CLA. Those algorithms did not have a good temporal model, and we actually used hard-coded pooling à la HMAX. I was never happy about this, although it produced OK but not great results.
>
> When we first created the CLA and were using it for vision experiments, we spent a lot of time making sure it could learn temporal pooling. The idea is that you first learn a sequence, but this on its own doesn't do any pooling. To pool, you need cells to stay active over a sequence of patterns. The way we achieved this is that a cell first learns to predict its activity one step ahead in time. Once it has learned to do that, it can learn to predict its activity two steps ahead, and so on. By repeating patterns, a cell can learn to predict its activity well in advance.
> How far in advance depends on how predictable and varied the sequences are.
>
> I don't have time to go into all the details now, but as Mike suggests, if we have only one cell per column then the cell will pool no matter what direction a pattern is moving. It can't tell a left-moving line from a right-moving line, so it will produce a cell that responds to a line no matter where the line is and no matter what direction it is moving. However, if we have multiple cells per column, the result is a cell that responds when a line is moving in a particular direction. We see both types of cell in V1 in real brains. I have a theory (a highly speculative theory) that layer 4 cells are like the former and layer 3 cells are like the latter. There are several lines of evidence to suggest this. In this case layer 4 learns pure shift invariance, but layer 3 learns true sequences. By the way, layer 4 is large in the first couple of levels of cortex but disappears as you ascend the hierarchy. My explanation is that as you ascend the hierarchy, spatial invariance is solved and no longer needed, but sequences like melodies, language and actions continue to need the type of pooling done by layer 3.
>
> We got pooling to work in the CLA, but it took a lot of synapses and therefore memory and computation time. In the current form of the CLA we have sequence memory, but the pooling part is deactivated. We don't need pooling for the types of problems we are applying Grok to.
>
> One of the reasons I am hesitant to work on vision problems is that the temporal pooling requirement is large. Consider this: the amount of cortex dedicated to low-level vision (areas V1 and V2) dwarfs the amount of cortex dedicated to language (Broca's and Wernicke's areas). Low-level vision is much harder than language.
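The pooling mechanism Jeff describes can be sketched in a toy form: once a sequence has been learned, a "pooling" cell stays active across the whole sequence because each next input arrives predicted. This is an illustrative simplification, not NuPIC's actual CLA code; the class and method names here are invented for the example.

```python
# Toy sketch of temporal pooling: a pooled cell stays active for as
# long as successive inputs were predicted by learned transitions.
# First-order transitions only -- a deliberate simplification.

class ToyTemporalPooler:
    def __init__(self):
        self.transitions = {}   # learned transitions: pattern -> set of successors
        self.pooled_active = False

    def learn(self, sequence):
        """Learn the pairwise transitions in a training sequence."""
        for prev, nxt in zip(sequence, sequence[1:]):
            self.transitions.setdefault(prev, set()).add(nxt)

    def present(self, sequence):
        """Return how many steps the pooled cell stayed active."""
        active_steps = 0
        prev = None
        for pattern in sequence:
            predicted = prev is not None and pattern in self.transitions.get(prev, set())
            # The pooled cell turns on (or stays on) when the input was
            # predicted, so a familiar sequence keeps one stable
            # representation active across many time steps.
            self.pooled_active = predicted
            if predicted:
                active_steps += 1
            prev = pattern
        return active_steps

tp = ToyTemporalPooler()
tp.learn(["A", "B", "C", "D"])          # e.g. a line sweeping left-to-right
print(tp.present(["A", "B", "C", "D"])) # pooled cell active for 3 of 4 steps
print(tp.present(["D", "C", "B", "A"])) # unfamiliar order: 0 steps
```

Note that because this toy is first-order, training it on both sweep directions would make it pool either direction indiscriminately, much like the one-cell-per-column case Jeff describes; the CLA's multiple cells per column supply the sequence context this sketch lacks.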
> Amazing.
>
> Jeff
>
> *From:* nupic [mailto:[email protected]] *On Behalf Of* Scott Purdy
> *Sent:* Wednesday, July 17, 2013 10:58 AM
> *To:* NuPIC general mailing list
> *Subject:* Re: [nupic-dev] Training on Handwritten Digit Dataset using CLA
>
> I was wrong about that. I don't quite understand it well enough to give a proper response, so I am going to see if Jeff can write it up.
>
> The explanation I got was that you can train a temporal model by moving the letter around the image. Then, when you give it a test image, you expect it to predict the letter moving in different directions. The predicted cells are apparently useful as you move up the hierarchy. Time acts as a sort of supervisor for spatial invariants.
>
> But like I said, I am going to try to get someone to do a better explanation. There was quite a lot of vision work done that would be great to capture for you guys.
>
> On Wed, Jul 17, 2013 at 8:04 AM, Quinn Liu <[email protected]> wrote:
>
> Hi Michael and Scott,
>
> Thank you very much for your explanations. Michael's explanation implies that the Temporal Pooler greatly helps with spatial invariance learning on the training data, which I can see working.
>
> But for question 3, Scott said "No need for TP. It won't help with spatial representations." Scott, I was hoping you could expand on your answer and on how you think the SP and TP contribute to spatial invariance recognition.
>
> Best Regards,
> Quinn Liu
>
> On Mon, Jul 15, 2013 at 5:07 PM, Michael Ferrier <[email protected]> wrote:
>
> Hi Quinn,
>
> The older version of HTM would group together the spatial patterns that tend to occur in close temporal sequence with one another, and produce the same output when it saw any of the spatial patterns within a given group.
> So, if a network were trained on visual input of digits zig-zagging through the visual field, then any individual visual feature (for example a vertical line) would come to be represented by a temporal group that responds when it is presented with a vertical line at any of many nearby locations, because in the training data a vertical line is often seen moving from one location to another nearby location. In this way it would learn invariance to position. At the lowest level of the hierarchy it would learn invariance to position for individual small visual features, and at higher levels it would learn invariance for larger, more complex arrangements of features and whole visual objects. Invariance to other transformations like scale, rotation, etc. could also be learned this way, given the appropriate training data.
>
> Like Scott said, the old version of HTM worked very differently from the CLA, but they both model the same basic principles (the CLA does so much more flexibly). Using a CLA region with one cell per column, a cell should become active when given a particular spatial pattern, but should become predictive when given any pattern that (during training) often occurs close by in temporal sequence to that spatial pattern. So, if a column's proximal segment represents the spatial pattern of a vertical line, then that column's cell should become predictive whenever a vertical line at any nearby position is presented, because during training a given vertical line is often followed by another nearby vertical line, since the training set is made up of animations of visual objects smoothly zig-zagging around.
>
> And because a CLA region sends output from both its active and predictive cells, from the point of view of the next, higher region in the hierarchy, that cell is responding invariantly to any of a set of nearby vertical lines.
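The older HTM's temporal grouping that Mike describes can be sketched roughly as follows: spatial patterns that follow each other in time get merged into one group, and the region outputs the group's identity, so a vertical line at several nearby positions maps to the same output. This is a toy illustration only; the union-find grouping and all names are this example's invention, not the original HTM code.

```python
# Toy sketch of temporal grouping: merge temporally adjacent spatial
# patterns into one group and emit the group ID as the region's output,
# yielding position invariance for features that move during training.

class ToyTemporalGrouper:
    def __init__(self):
        self.parent = {}  # union-find forest over spatial patterns

    def _find(self, p):
        self.parent.setdefault(p, p)
        while self.parent[p] != p:
            p = self.parent[p]
        return p

    def learn(self, sequence):
        """Merge each pair of temporally adjacent patterns into one group."""
        for prev, nxt in zip(sequence, sequence[1:]):
            a, b = self._find(prev), self._find(nxt)
            if a != b:
                self.parent[b] = a

    def output(self, pattern):
        """The region's output: the group the pattern belongs to."""
        return self._find(pattern)

tg = ToyTemporalGrouper()
# A vertical line sweeping across nearby positions during training:
tg.learn(["vline@x0", "vline@x1", "vline@x2", "vline@x1", "vline@x0"])
tg.learn(["hline@y0", "hline@y1"])
# Any nearby vertical line now yields the same invariant output:
print(tg.output("vline@x2") == tg.output("vline@x0"))  # True
print(tg.output("vline@x0") == tg.output("hline@y0"))  # False
```

With the appropriate training animations, the same mechanism would group across scale or rotation as well, since grouping depends only on temporal adjacency, not on any notion of spatial nearness.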
> This corresponds to how “complex cells” respond in visual cortex.
>
> Does that make sense?
>
> -Mike
>
> _____________
> Michael Ferrier
> Department of Cognitive, Linguistic and Psychological Sciences, Brown University
> [email protected]
>
> On Mon, Jul 15, 2013 at 4:23 PM, Scott Purdy <[email protected]> wrote:
>
> Quinn, the older HTM implementations were completely different algorithms and are now obsolete.
>
> On Mon, Jul 15, 2013 at 1:09 PM, Quinn Liu <[email protected]> wrote:
>
> Hi Michael,
>
> I had an additional question. In your reply you remarked that "while digit recognition was successfully modeled with the original version of HTM, that doesn't seem to be the case with CLA yet". I was wondering if you or anyone else could expand on this, as I am unfamiliar with the original version of HTM. Assuming that it is an earlier version of the current spatial and temporal learning algorithms, how is it different? Thanks!
>
> Best Regards,
> Quinn Liu
> [email protected]
>
> On Mon, Jul 15, 2013 at 3:41 PM, Michael Ferrier <[email protected]> wrote:
>
> Hi Fergal,
>
> I completely agree that a visual object recognition system would greatly benefit from hierarchy. Causes in the world are hierarchical, and the brain uses hierarchy to learn and represent them. The successful vision models using the original implementation of HTM were also hierarchical.
> I was just saying that, as far as I know, this hasn't been done with CLA yet -- according to Jeff, in their vision experiments they were just beginning to expand beyond one layer when they stopped working on vision.
>
> I think that both temporal pooling (for invariance) and hierarchy are key to using CLA for visual recognition problems, but I don't know of anyone who has put all the pieces together yet to do visual recognition with CLA.
>
> -Mike
>
> _____________
> Michael Ferrier
> Department of Cognitive, Linguistic and Psychological Sciences, Brown University
> [email protected]
>
> On Mon, Jul 15, 2013 at 11:44 AM, Fergal Byrne <[email protected]> wrote:
>
> Hi Michael,
>
> Handwritten characters are undoubtedly multi-component designs, which have evolved to connect with and trigger our ability to learn spatial, temporal and hierarchical patterns. We perceive the same characters even when many things change across fonts, and especially when reading different people's handwriting. We can fill in gaps and correct misspellings, so the learning and prediction must be several levels deep in the hierarchy.
>
> In terms of bottom-level mechanics, we use saccades to recognise and "delocalise" components such as characters, facial features, etc., in such a way as to allow this multi-level recognition (including a hierarchy of fixations -- for strokes, junctions, topology, characters, letters, words, and even sentences).
>
> Speed-readers can saccade to read entire phrases and sentences at a time, allowing reading speeds of thousands of words per minute with better than 70% comprehension scores. With practice, I've been able to get scores in the 1,000-2,000 wpm range. I can also read text in a mirror or upside-down at speeds approaching 50-60% of an average reader's.
> These things could only be done using big, complex region hierarchies with vast volumes of (normal) reading practice.
>
> I would have predicted that a single-layer CLA would struggle with this kind of data set, because it lacks the multi-level upward and downward structure which I feel this kind of performance requires.
>
> Regards,
>
> Fergal Byrne
>
> _______________________________________________
> nupic mailing list
> [email protected]
> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
