Allan, it's cool that you are doing this. You will probably get more help tuning your work if you push it into a public GitHub repo. Just a suggestion.
Kind regards,
---------
Matt Taylor
OS Community Flag-Bearer
Numenta

On Wed, Jan 22, 2014 at 10:29 AM, Allan Inocêncio de Souza Costa
<[email protected]> wrote:
> Of course!
>
> Following is my model_params. My memory can handle about 512 columns in the SP.
> Also, the pixel fields are set at the end:
>
> MODEL_PARAMS = {
>     # Type of model that the rest of these parameters apply to.
>     'model': "CLA",
>
>     # Version that specifies the format of the config.
>     'version': 1,
>
>     # Intermediate variables used to compute fields in modelParams and also
>     # referenced from the control section.
>     'aggregationInfo': {'days': 0,
>                         'fields': [],
>                         'hours': 0,
>                         'microseconds': 0,
>                         'milliseconds': 0,
>                         'minutes': 0,
>                         'months': 0,
>                         'seconds': 0,
>                         'weeks': 0,
>                         'years': 0},
>
>     'predictAheadTime': None,
>
>     # Model parameter dictionary.
>     'modelParams': {
>         # The type of inference that this model will perform
>         'inferenceType': 'NontemporalClassification',
>
>         'sensorParams': {
>             # Sensor diagnostic output verbosity control;
>             # if > 0: sensor region will print out on screen what it's sensing
>             # at each step; 0: silent; >=1: some info; >=2: more info;
>             # >=3: even more info (see compute() in py/regions/RecordSensor.py)
>             'verbosity': 0,
>
>             # Example:
>             #   dsEncoderSchema = [
>             #       DeferredDictLookup('__field_name_encoder'),
>             #   ],
>             #
>             # (value generated from DS_ENCODER_SCHEMA)
>             'encoders': {
>                 u'label': {
>                     'classifierOnly': True,
>                     'fieldname': u'label',
>                     'n': 121,
>                     'name': u'label',
>                     'type': 'ScalarEncoder',
>                     'minval': 0,
>                     'maxval': 9,
>                     'w': 21},
>             },
>
>             # A dictionary specifying the period for automatically-generated
>             # resets from a RecordSensor;
>             #
>             # None = disable automatically-generated resets (also disabled if
>             # all of the specified values evaluate to 0).
>             # Valid keys are the desired combination of the following:
>             # days, hours, minutes, seconds, milliseconds, microseconds, weeks
>             #
>             # Example for 1.5 days: sensorAutoReset = dict(days=1, hours=12),
>             #
>             # (value generated from SENSOR_AUTO_RESET)
>             'sensorAutoReset': None,
>         },
>
>         'spEnable': True,
>
>         'spParams': {
>             # SP diagnostic output verbosity control;
>             # 0: silent; >=1: some info; >=2: more info;
>             'spVerbosity': 0,
>
>             'globalInhibition': 1,
>
>             # Number of cell columns in the cortical region (same number for
>             # SP and TP)
>             # (see also tpNCellsPerCol)
>             'columnCount': 256,
>
>             'inputWidth': 256,
>
>             # SP inhibition control (absolute value);
>             # Maximum number of active columns in the SP region's output (when
>             # there are more, the weaker ones are suppressed)
>             'numActivePerInhArea': 40,
>
>             'seed': 1956,
>
>             # coincInputPoolPct
>             # What percent of the column's receptive field is available
>             # for potential synapses. At initialization time, we will
>             # choose coincInputPoolPct * (2*coincInputRadius+1)^2
>             'coincInputPoolPct': 0.5,
>
>             # The default connected threshold. Any synapse whose
>             # permanence value is above the connected threshold is
>             # a "connected synapse", meaning it can contribute to the
>             # cell's firing. Typical value is 0.10. Cells whose activity
>             # level before inhibition falls below minDutyCycleBeforeInh
>             # will have their own internal synPermConnectedCell
>             # threshold set below this default value.
>             # (This concept applies to both SP and TP and so 'cells'
>             # is correct here as opposed to 'columns')
>             'synPermConnected': 0.1,
>
>             'synPermActiveInc': 0.1,
>
>             'synPermInactiveDec': 0.01,
>
>             'randomSP': 0,
>         },
>
>         # Controls whether TP is enabled or disabled;
>         # TP is necessary for making temporal predictions, such as predicting
>         # the next inputs. Without TP, the model is only capable of
>         # reconstructing missing sensor inputs (via SP).
>         'tpEnable': False,
>
>         'tpParams': {
>             # TP diagnostic output verbosity control;
>             # 0: silent; [1..6]: increasing levels of verbosity
>             # (see verbosity in nta/trunk/py/nupic/research/TP.py and TP10X*.py)
>             'verbosity': 0,
>
>             # Number of cell columns in the cortical region (same number for
>             # SP and TP)
>             # (see also tpNCellsPerCol)
>             'columnCount': 2048,
>
>             # The number of cells (i.e., states) allocated per column.
>             'cellsPerColumn': 32,
>
>             'inputWidth': 2048,
>
>             'seed': 1960,
>
>             # Temporal Pooler implementation selector (see _getTPClass in
>             # CLARegion.py).
>             'temporalImp': 'cpp',
>
>             # New synapse formation count
>             # NOTE: If None, use spNumActivePerInhArea
>             #
>             # TODO: need better explanation
>             'newSynapseCount': 20,
>
>             # Maximum number of synapses per segment
>             #  > 0 for fixed-size CLA
>             # -1 for non-fixed-size CLA
>             #
>             # TODO: for Ron: once the appropriate value is placed in TP
>             # constructor, see if we should eliminate this parameter from
>             # description.py.
>             'maxSynapsesPerSegment': 32,
>
>             # Maximum number of segments per cell
>             #  > 0 for fixed-size CLA
>             # -1 for non-fixed-size CLA
>             #
>             # TODO: for Ron: once the appropriate value is placed in TP
>             # constructor, see if we should eliminate this parameter from
>             # description.py.
>             'maxSegmentsPerCell': 128,
>
>             # Initial permanence
>             # TODO: need better explanation
>             'initialPerm': 0.21,
>
>             # Permanence increment
>             'permanenceInc': 0.1,
>
>             # Permanence decrement
>             # If set to None, will automatically default to tpPermanenceInc
>             # value.
>             'permanenceDec': 0.1,
>
>             'globalDecay': 0.0,
>
>             'maxAge': 0,
>
>             # Minimum number of active synapses for a segment to be considered
>             # during search for the best-matching segments.
>             # None=use default
>             # Replaces: tpMinThreshold
>             'minThreshold': 12,
>
>             # Segment activation threshold.
>             # A segment is active if it has >= tpSegmentActivationThreshold
>             # connected synapses that are active due to infActiveState
>             # None=use default
>             # Replaces: tpActivationThreshold
>             'activationThreshold': 16,
>
>             'outputType': 'normal',
>
>             # "Pay Attention Mode" length. This tells the TP how many new
>             # elements to append to the end of a learned sequence at a time.
>             # Smaller values are better for datasets with short sequences;
>             # higher values are better for datasets with long sequences.
>             'pamLength': 1,
>         },
>
>         'clParams': {
>             'regionName': 'CLAClassifierRegion',
>
>             # Classifier diagnostic output verbosity control;
>             # 0: silent; [1..6]: increasing levels of verbosity
>             'clVerbosity': 0,
>
>             # This controls how fast the classifier learns/forgets. Higher
>             # values make it adapt faster and forget older patterns faster.
>             'alpha': 0.001,
>
>             # This is set after the call to updateConfigFromSubConfig and is
>             # computed from the aggregationInfo and predictAheadTime.
>             'steps': '0',
>         },
>
>         'anomalyParams': {
>             u'anomalyCacheRecords': None,
>             u'autoDetectThreshold': None,
>             u'autoDetectWaitRecords': None
>         },
>
>         'trainSPNetOnlyIfRequested': False,
>     }
> }
>
> for i in range(0, 784):
>     MODEL_PARAMS['modelParams']['sensorParams']['encoders']['pixel%d' % i] = {
>         'fieldname': u'pixel%d' % i,
>         'n': 121,
>         'name': u'pixel%d' % i,
>         'type': 'ScalarEncoder',
>         'minval': 0,
>         'maxval': 255,
>         'w': 21}
>
>
> Best regards,
> Allan
>
>
> On Wednesday, January 22, 2014 at 4:19 PM, Pedro Tabacof
> <[email protected]> wrote:
> It's odd that the SP runs out of memory. Could you post your code here?
>
>
> On Wed, Jan 22, 2014 at 4:15 PM, Allan Inocêncio de Souza Costa
> <[email protected]> wrote:
>
> Thanks for the reply, Pedro and Mark.
>
> @Pedro
> You're right, I'm not using the SP or TP. I did try simply activating the SP
> in model_params.py, but it soon ran out of memory (I'm using 8 GB), so I
> have to use very few columns and the results do not improve.
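One likely culprit for the memory blow-up is the sheer width of the encoded input. A quick back-of-the-envelope check (plain Python, using only the encoder sizes from the model_params above; the exact bit count may differ slightly depending on how NuPIC concatenates encoders):

```python
# Rough estimate of the encoded input width implied by the model_params
# above: 784 pixel ScalarEncoders, each emitting n=121 bits. The 'label'
# encoder is classifierOnly, so it does not feed the Spatial Pooler.
N_PIXELS = 28 * 28          # 784 pixel fields
BITS_PER_ENCODER = 121      # 'n' in each pixel encoder

total_input_bits = N_PIXELS * BITS_PER_ENCODER
print(total_input_bits)  # 94864
```

Roughly 95,000 input bits is far larger than the 'inputWidth': 256 set in spParams, and with coincInputPoolPct at 0.5 each SP column potentially samples a sizable fraction of that space, which helps explain the memory pressure Allan reports when enabling the SP.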
>
> @Mark
> I agree with point 3.
> About point 1: I think you're right about the classifier, and I would like
> to know more details about how it is implemented, so if someone knows,
> please let me know.
> About point 2: the images are encoded as 1D arrays with 784 (28x28)
> features, so I do lose topological information. But it is still a
> high-dimensional space in which the data shows good clustering, so that
> even the hyperplanes obtained by simple logistic regression are capable of
> classifying the digits with good accuracy (> 90%). That's why I would like
> to get more information about the classifier itself.
>
> Best regards,
> Allan
>
>
> On Wednesday, January 22, 2014 at 3:28 PM, Pedro Tabacof
> <[email protected]> wrote:
> Marek's second point is of utmost importance for anyone doing image
> classification. It would be awesome if someone could make 2D topology
> easily available. Convolutional neural networks are so much better than
> regular neural networks for image classification.
>
>
> On Wed, Jan 22, 2014 at 3:18 PM, Marek Otahal <[email protected]> wrote:
>
> Hi Allan,
>
> that was maybe me; it's great someone is working on MNIST here!
>
> 1/ I'm not 100% clear about the Classifier, but I think it's just a helper
> utility, unrelated to the HTM/CLA, so you've been testing the performance
> of whatever algorithm the Classifier implements (not the CLA, imho). So
> you'd want to create a CLA (with SP only) and place the Classifier atop it.
> The pipeline would look like:
> {MNIST-data[ith-example]} >>> CLA(without TP) >>> (you get an SDR) >>>
> Classifier (add MNIST-label[ith-example])
>
> 2/ I assume the MNIST dataset is created from 2D images of handwritten
> digits and simply put into a 1D array (??)
> Then you'll lose a lot of topological info passing it to the CLA just as is.
> I think this will require resurrection of the Image Encoders that take into
> account the distance between neighboring pixels (each pixel has 8
> neighboring px); this is used in inhibition etc.
>
> 3/ You're probably overfitting; rather, experiment with an 80%/20% data
> split.
>
> Cheers, Mark
>
>
> On Wed, Jan 22, 2014 at 5:57 PM, Allan Inocêncio de Souza Costa
> <[email protected]> wrote:
>
>
> Hi,
>
> I read a question that someone else asked here, but I couldn't find the
> question nor the answers (if any), so I will ask again, as I'm now working
> with the classifier.
>
> I tried to apply the classifier to the task of handwritten digit
> recognition using the MNIST dataset. The best result I got was an overall
> accuracy of about 42% (by that I mean that, after training on the entire
> dataset, the proportion of right predictions from the first to the last
> training example was 42%), after playing a little with the encoders. Of
> course this is better than the expected 10% accuracy of a random-guessing
> algorithm, but it falls short of what is accomplished by other (linear)
> algorithms. For those interested, I attached a plot of the accuracy.
>
> So here comes the question: what are the inner workings of the classifier?
> I'm puzzled, as it doesn't have an SP. Can someone help or point me to some
> reading?
>
> Best regards,
> Allan
>
> _______________________________________________
> nupic mailing list
> [email protected]
> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>
>
>
>
> --
> Marek Otahal :o)
>
>
>
>
> --
> Pedro Tabacof,
> Unicamp - Eng. de Computação 08.
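To make Marek's point 2 concrete: once a 28x28 image is flattened row-major into 784 fields, the encoder no longer knows which pixels were adjacent. A small sketch (plain Python; `neighbors_28x28` is a hypothetical helper for illustration, not part of NuPIC) that recovers the up-to-8 neighbors of a flat pixel index, i.e. exactly the information a topology-aware Image Encoder would need:

```python
def neighbors_28x28(idx, width=28, height=28):
    """Return the flat indices of the up-to-8 pixels adjacent to a
    row-major flat index: the adjacency a plain 1D encoding discards."""
    r, c = divmod(idx, width)
    out = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the pixel itself
            rr, cc = r + dr, c + dc
            if 0 <= rr < height and 0 <= cc < width:
                out.append(rr * width + cc)
    return out

print(len(neighbors_28x28(0)))   # 3: a corner pixel has only 3 neighbors
print(len(neighbors_28x28(29)))  # 8: an interior pixel has all 8
```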
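Finally, on Marek's point 3: measuring the proportion of correct predictions over the training pass itself (as in the 42% figure above) mixes early, untrained predictions with later ones and can also hide overfitting. A minimal sketch of the 80/20 split he suggests (plain Python, with a toy dataset standing in for the MNIST records; `split_train_test` is an illustrative helper, not a NuPIC API):

```python
import random

def split_train_test(records, train_frac=0.8, seed=42):
    """Shuffle and split records so accuracy is measured on data the
    model never trained on, instead of over the training pass itself."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

# Toy stand-in for (pixels, label) records; real code would load MNIST.
records = [((0,) * 784, i % 10) for i in range(100)]
train, test = split_train_test(records)
print(len(train), len(test))  # 80 20
```

Training would then run only on `train`, and the reported accuracy would come from predictions on `test`.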
