The code is available at github.com/baroobob/nupic.vision

Today I made data sets for 80/20 and 60/40 splits between training and
testing; I'll post the results here when I have them.
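Since the 80/20 experiment invites the raw-pixel KNN baseline that Subutai and Pedro both suggest below, here is a rough, hypothetical sketch of that baseline on synthetic stand-in data; a real run would load the font bitmaps instead, and the image sizes, variant counts, and noise rate here are made up for illustration:

```python
# Hypothetical sketch of an 80/20 split plus a 1-nearest-neighbor
# baseline on raw pixels. The data below is a synthetic stand-in for
# the font images (62 character classes x 10 variants of 32x32 bitmaps).
import numpy as np

rng = np.random.default_rng(0)

n_classes, n_variants, n_pixels = 62, 10, 32 * 32
prototypes = rng.integers(0, 2, size=(n_classes, n_pixels))
images = np.repeat(prototypes, n_variants, axis=0)
labels = np.repeat(np.arange(n_classes), n_variants)
# Flip ~2% of pixels per image to mimic font variation.
noise = rng.random(images.shape) < 0.02
images = images ^ noise

# Shuffle, then take an 80/20 train/test split.
order = rng.permutation(len(images))
split = int(0.8 * len(images))
train_idx, test_idx = order[:split], order[split:]

def knn_predict(train_x, train_y, test_x):
    """Classify each test vector by its nearest (Hamming) train vector."""
    preds = []
    for x in test_x:
        dists = np.count_nonzero(train_x != x, axis=1)
        preds.append(train_y[np.argmin(dists)])
    return np.array(preds)

preds = knn_predict(images[train_idx], labels[train_idx], images[test_idx])
accuracy = np.mean(preds == labels[test_idx])
print(f"1-NN pixel baseline accuracy: {accuracy:.2%}")
```

Running the SP experiments against the same split would then show how much lift (if any) the SP's SDRs give over the pixels themselves.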

On Thu, Aug 21, 2014 at 8:47 AM, Subutai Ahmad <[email protected]> wrote:
>
> Hi Jim,
>
> Thanks for doing this study. I echo Pedro's comments and questions - we need
> more investigations like this.
>
> In terms of future experiments, it would be great to see a result with a
> typical 80-20 split, i.e. 80% of the data for training and 20% for testing.
> Ideally it would be nice to see performance with randomly added noise or
> other distortions.  Comparing against straight KNN for all of this is a
> good standard thing to do (one of my pet peeves in machine learning is
> that people don't do this enough; KNN is provably almost optimal for
> large data sets - see the original Duda and Hart book!).
>
> My general expectation here is that the SP needs a thousand or more images
> to start doing a decent job. It probably wouldn't do hugely better or worse
> than KNN, though it should do better once you start adding noise, dropouts,
> etc.  The SP won't learn too many invariances.  In general I don't expect
> it to do much better than standard spatial techniques on purely spatial
> tasks. The main job of the SP is to create a decent SDR for temporal memory
> and temporal pooling.
>
> Parameter tuning is admittedly difficult here. The specific numbers matter
> and you have to develop some intuitions. Unfortunately you can't directly
> apply our swarm - that is currently tied to the OPF and temporal datasets.
> It would be nice to improve that code to make it more general - it could
> help a lot in these kinds of investigations.
>
> Overall I am glad you started this project! Thank you for keeping us
> informed throughout, and taking it all the way to a good writeup. Is your
> code available somewhere in case others want to try additional experiments?
>
> --Subutai
>
>
>
> On Tue, Aug 19, 2014 at 6:52 PM, Jim Bridgewater <[email protected]> wrote:
>>
>> Hi Pedro,
>>
>> Thank you for the feedback.
>>
>> 1.  Previously I have run it using a randomly initialized SP with
>> learning disabled and got results comparable to those with learning
>> turned on, which emphasizes that the spatial pooler as configured is
>> not generalizing particularly well.  I never tried sending the bit
>> vectors directly to the classifier, but since you recommended it I
>> made a pass-through version of the SP which simply copies the input to
>> the output, and this produces better results than those using the real
>> SP (as I have it configured)!
>>
>> 2. I haven't, and in terms of the images it's actually 100% or 0%, but
>> in terms of the characters the images represent (ground truth) it's
>> always 100%, which was my rationale for using the small training data
>> set since there are only 62 characters in both data sets (0-9, A-Z,
>> a-z).  I have run a small case where I train on 62 images (normal
>> font) and test on 124 (normal and bold fonts), and I get around 80%
>> accuracy, which seems a bit low for what amounts to a pretty simple
>> generalization task.
>>
>> 3. I am aware of MNIST, but I wanted to focus more on machine-printed
>> characters for document recognition.  That, coupled with the fact that
>> I did not find a place where MNIST was freely available when I was
>> looking for data sets, was enough to keep me from using it.
>>
>> 4. I started with the parameters in Ian's sp_viewer demo, ran a few
>> simple parameter searches to get a feel for how increment and
>> decrement values affected the SP, and got some advice from Subutai on
>> the mailing list.  These parameters are probably not optimal.
>>
>>
>> How well do you guess an optimized SP can do on tasks like these?
>>
>> On Mon, Aug 18, 2014 at 8:45 PM, Pedro Tabacof <[email protected]> wrote:
>> > Hello Jim,
>> >
>> > Thank you for your work and report, we need more investigations like
>> > yours.
>> > A few suggestions:
>> >
>> > 1. Since you're using a KNN classifier, it'd be nice to use it
>> > directly on the pixels as a baseline. It's an important benchmark to
>> > show that NuPIC is indeed doing the heavy work.
>> > 2. Have you tried a more balanced division between training and
>> > testing sets? Using 100% or 1% of the data to train seems a bit too
>> > extreme to me.
>> > 3. Did you look at the MNIST dataset? It's probably the most widely
>> > used benchmark for computer vision. It's going to be computationally
>> > demanding (50-60K images), but we will have results that can be
>> > compared to other machine learning approaches.
>> > 4. Did you use swarming or grid search to find the best
>> > meta-parameters?
>> >
>> > A long time ago I used the previous NuPIC implementation for static
>> > classification (just the spatial pooler) and it was competitive with
>> > SVMs.
>> >
>> > Pedro.
>> >
>> >
>> > On Tue, Aug 19, 2014 at 12:24 AM, Jim Bridgewater <[email protected]>
>> > wrote:
>> >>
>> >> Hi everyone,
>> >>
>> >> I've written up a summary of the work I did this summer as part of
>> >> Season of NuPIC that includes the most recent results.  This summary
>> >> is attached along with a separate file that contains 8,928 images from
>> >> 144 fonts.  These images were used to test the spatial pooler.  The
>> >> gist of it is that the SP does very well (>97% accuracy) when you
>> >> train it on all of the images you test it on, which is good but very
>> >> time-consuming and doesn't require any ability to generalize.  When I
>> >> trained the SP on a much smaller data set of 186 images containing
>> >> normal, bold, and italic characters not included in the larger data
>> >> set, the accuracy fell to about 32%.  There are several ways to improve
>> >> this.  One is reducing the potential radius so columns learn features
>> >> rather than entire characters.  I tried this, but there appears to be
>> >> a bug in the SP's potential mapping that currently prevents this
>> >> technique from helping.  Another way is to try different potential
>> >> mappings, like lines with different orientations, again in an effort
>> >> to get the SP's columns to learn features rather than entire
>> >> characters.  I've written a mapping for this but have not tried it.
>> >> And yet another way to improve these results would be to add
>> >> additional SP regions in an effort to get more generalization.
>> >>
>> >> I look forward to hearing your comments!
>> >>
>> >> --
>> >> James Bridgewater, PhD
>> >> Arizona State University
>> >> 480-227-9592
>> >>
>> >> _______________________________________________
>> >> nupic mailing list
>> >> [email protected]
>> >> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>> >>
>> >
>> >
>> >
>> > --
>> > Pedro Tabacof
>> >
>> >
>>
>>
>>
>>
>
>
>
>



-- 
James Bridgewater, PhD
Arizona State University
480-227-9592

