Generalization in the machine learning sense, i.e., leveraging prior knowledge to correctly recognize/classify a novel input. The encoder and SP have quantization functionality that should, in theory, make HTM robust to noise and within-class variance. Unfortunately, in the experiments I've run so far, whenever HTM is fed a novel input it can't reliably classify it, as indicated by the accuracy rates in my previous post.
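Francisco's suggestion in the thread below — that similar gestures should map to SDRs with many shared bits — can be illustrated with a toy encoder for the discretized directions. This is only a sketch under my own assumptions (a contiguous block of active bits per direction, wrapping circularly); it is not NuPIC's scalar/category encoder, and all names here are made up:

```python
import numpy as np

def encode_direction(d, n_dirs=8, n_bits=96, width=18):
    """Toy SDR encoder for discretized directions (NOT NuPIC's encoder).

    Each direction activates a contiguous block of `width` bits, wrapping
    circularly. Neighboring directions are offset by 12 bits, so adjacent
    directions share 6 active bits while opposite directions share none.
    """
    sdr = np.zeros(n_bits, dtype=int)
    start = d * (n_bits // n_dirs)  # 12-bit step between neighboring directions
    for i in range(width):
        sdr[(start + i) % n_bits] = 1
    return sdr

def overlap(a, b):
    """Number of active bits two SDRs have in common."""
    return int(np.dot(a, b))

e0, e1, e4 = (encode_direction(d) for d in (0, 1, 4))
print(overlap(e0, e1), overlap(e0, e4))  # -> 6 0
```

With this kind of encoding, direction 0 overlaps its circular neighbors (1 and 7) but not direction 4, which is the sort of measurable SDR similarity Francisco describes as a precondition for generalization.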
> On Dec 29, 2014, at 7:09 PM, Francisco Webber <[email protected]> wrote:
>
> Ok, I now better understand how you encode the gestures. But I still think
> that my argument is valid: if you want to do generalization, you need
> semantics to generalize on.
> Maybe I didn't understand well what kind of generalization you want to
> achieve.
>
> Francisco
>
> On 29.12.2014, at 17:38, Nicholas Mitri <[email protected]> wrote:
>
>> Thanks for your comments, Francisco.
>>
>> I should've explained better. Gestures here refer to the shapes drawn by the
>> user's hand as he/she moves a smartphone. The result is a flattened 3D trace
>> whose trajectory is estimated using motion sensors. The feature vector of
>> every gesture is subsequently a sequence of directions from one control
>> vertex of the trace to the next. Think of it as a piecewise linear trace
>> represented by discretized directions, e.g. the trace of '5' is
>> 2->3->0->3->2 if we're using 4 directions and start at the top.
>>
>> That's the data I'm working with, so there's very little semantic depth to
>> consider. Encoders here are needed more for their quantization/pooling
>> functionality than anything else.
>>
>> best,
>> Nick
>>
>>> On Dec 29, 2014, at 6:23 PM, Francisco Webber <[email protected]> wrote:
>>>
>>> Hello Nick,
>>> What you are trying to do sounds very interesting. My guess is that the
>>> poor generalization is due to insufficient semantics being captured during
>>> the encoding step. As you might know, we are working in the domain of
>>> language processing, where semantic depth of the SDRs is key.
>>> In your case, the semantics of the system is defined by the way a (human)
>>> body looks and by its degrees of freedom to move.
>>> What you should try to achieve is to capture some of this semantic context
>>> in your encoding process.
>>> The SDRs representing the body positions (or movements) should be formed
>>> in such a way that similar positions (gestures) have similar SDRs (many
>>> overlapping points). The better you are able to realize this encoding,
>>> the better the HTM will be able to generalize.
>>> In language processing, we were able to create classifiers that needed only
>>> 4 example sentences like:
>>>
>>> "Erwin Schrödinger is a physicist."
>>> "Marie Curie is a physicist."
>>> "Niels Bohr is a physicist."
>>> "James Maxwell is a physicist."
>>>
>>> to give the following response: "Albert Einstein is a" PHYSICIST
>>>
>>> In my experience, measurable similarity among SDRs encoded to represent
>>> similar data seems to be key for an HTM network to unfold its full power.
>>>
>>> Francisco
>>>
>>> On 29.12.2014, at 16:25, Nicholas Mitri <[email protected]> wrote:
>>>
>>>> Hey Matt, everyone,
>>>>
>>>> I debugged the code and managed to get some sensible results. HTM is doing
>>>> a great job of learning sequences but performing very poorly at
>>>> generalization. So while it can recognize a sequence it has learned with
>>>> high accuracy, when it's fed a test sequence it has never seen, its
>>>> classification accuracy plummets. To be clear, classification here is
>>>> performed by assigning an HTM region to each class and observing which
>>>> region outputs the lowest anomaly score, averaged along a test sequence.
>>>>
>>>> I've tried tweaking the encoder parameters to quantize the input at a
>>>> lower resolution in the hope that similar inputs would be better pooled.
>>>> That didn't pan out. Also, changing the encoder output length or the
>>>> number of columns causes the HTM to output no predictions at times, even
>>>> with a non-empty active column list. I have little idea why that keeps
>>>> happening.
>>>>
>>>> Any hints as to how to get HTM to perform better here? I've included HMM
>>>> results for comparison. SVM results are all 95+%.
>>>>
>>>> Thank you,
>>>> Nick
>>>>
>>>> HTM Results:
>>>>
>>>> Data = sequence of directions (8 discrete directions)
>>>> Note on accuracy: M1/M2 is shown here to represent 2 performance metrics.
>>>> M1 is the average anomaly; M2 is the sum of the normalized average anomaly
>>>> and the normalized prediction error.
>>>>
>>>> Base training accuracy: 100% at 2 training passes
>>>> User Dependent: 56.25% / 56.25%
>>>> User Independent: N/A
>>>> Mixed: 65.00% / 71.25%
>>>>
>>>> HMM (22-state) Results:
>>>>
>>>> Data = sequence of directions (16 discrete directions)
>>>>
>>>> Base training accuracy: 97.5%
>>>> User Dependent: 76.25%
>>>> User Independent: 88.75%
>>>> Mixed: 88.75%
>>>>
>>>>> On Dec 11, 2014, at 7:16 PM, Matthew Taylor <[email protected]> wrote:
>>>>>
>>>>> Nicholas, can you paste a sample of the input data file?
>>>>>
>>>>> ---------
>>>>> Matt Taylor
>>>>> OS Community Flag-Bearer
>>>>> Numenta
>>>>>
>>>>> On Thu, Dec 11, 2014 at 7:50 AM, Nicholas Mitri <[email protected]> wrote:
>>>>> Hey all,
>>>>>
>>>>> I'm running into some trouble using HTM for a gesture recognition
>>>>> application and would appreciate some help.
>>>>> First, the data was collected from 17 users performing 5 gestures from
>>>>> each of 16 different gesture classes using motion sensors. The feature
>>>>> vector for each sample is a sequence of discretized directions calculated
>>>>> using Bezier control points after curve-fitting the gesture trace.
>>>>>
>>>>> For a baseline, I fed the data to 16 10-state HMMs for training and again
>>>>> for testing. The classification accuracy achieved is 95.7%.
>>>>>
>>>>> For HTM, I created 16 CLA models using parameters from a medium swarm. I
>>>>> ran the data through the models for training, where each model is trained
>>>>> on only 1 gesture class.
>>>>> For testing, I fed the same data again with learning turned off and
>>>>> recorded the anomaly score (averaged across each sequence) for each
>>>>> model. Classification was done by seeking the model with the minimum
>>>>> anomaly score. Accuracy turned out to be a puzzling 0.0%!!
>>>>>
>>>>> Below is the relevant section of the code. I would appreciate any hints.
>>>>> Thanks,
>>>>> Nick
>>>>>
>>>>> def run_experiment():
>>>>>     print "Running experiment..."
>>>>>
>>>>>     model = [0] * 16
>>>>>     for i in range(0, 16):
>>>>>         model[i] = ModelFactory.create(model_params, logLevel=0)
>>>>>         model[i].enableInference({"predictedField": FIELD_NAME})
>>>>>
>>>>>     with open(FILE_PATH, "rb") as f:
>>>>>         csv_reader = csv.reader(f)
>>>>>         data = []
>>>>>         labels = []
>>>>>         for row in csv_reader:
>>>>>             r = [int(item) for item in row[:-1]]
>>>>>             data.append(r)
>>>>>             labels.append(int(row[-1]))
>>>>>
>>>>>     # data_train, data_test, labels_train, labels_test = cross_validation.train_test_split(
>>>>>     #     data, labels, test_size=0.4, random_state=0)
>>>>>     data_train = data
>>>>>     data_test = data
>>>>>     labels_train = labels
>>>>>     labels_test = labels
>>>>>
>>>>>     for passes in range(0, TRAINING_PASSES):
>>>>>         sample = 0
>>>>>         for (ind, row) in enumerate(data_train):
>>>>>             for r in row:
>>>>>                 value = int(r)
>>>>>                 result = model[labels_train[ind]].run({FIELD_NAME: value, '_learning': True})
>>>>>                 prediction = result.inferences["multiStepBestPredictions"][1]
>>>>>                 anomalyScore = result.inferences["anomalyScore"]
>>>>>             model[labels[ind]].resetSequenceStates()
>>>>>             sample += 1
>>>>>             print "Processing training sample %i" % sample
>>>>>             if sample == 100:
>>>>>                 break
>>>>>
>>>>>     sample = 0
>>>>>     labels_predicted = []
>>>>>     for row in data_test:
>>>>>         anomaly = [0] * 16
>>>>>         for i in range(0, 16):
>>>>>             model[i].resetSequenceStates()
>>>>>             for r in row:
>>>>>                 value = int(r)
>>>>>                 result = model[i].run({FIELD_NAME: value, '_learning': False})
>>>>>                 prediction = result.inferences["multiStepBestPredictions"][1]
>>>>>                 anomalyScore = result.inferences["anomalyScore"]
>>>>>                 # print value, prediction, anomalyScore
>>>>>                 if value == int(prediction) and anomalyScore == 0:
>>>>>                     # print "No prediction made"
>>>>>                     anomalyScore = 1
>>>>>                 anomaly[i] += anomalyScore
>>>>>             anomaly[i] /= len(row)
>>>>>         sample += 1
>>>>>         print "Processing testing sample %i" % sample
>>>>>         labels_predicted.append(np.min(np.array(anomaly)))
>>>>>         print anomaly, row[-1]
>>>>>         if sample == 100:
>>>>>             break
>>>>>
>>>>>     accuracy = np.sum(np.array(labels_predicted) == np.array(labels_test)) * 100.0 / len(labels_test)
>>>>>     print "Testing accuracy is %0.2f" % accuracy
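P.S. One detail in the quoted test loop above: classification is described as seeking the model with the minimum averaged anomaly score, but `np.min` returns the minimum *value*, not the winning model's *index*, so `labels_predicted` ends up holding scores that are compared against class labels. A minimal sketch of the intended argmin selection (toy scores, hypothetical names):

```python
import numpy as np

def classify_by_min_anomaly(anomaly):
    """Pick the class whose model produced the lowest averaged anomaly score.

    np.argmin returns the index of the minimum (the winning model/class),
    whereas np.min would return the minimum value itself.
    """
    return int(np.argmin(np.asarray(anomaly)))

# Toy example: 4 per-model averaged anomaly scores (made-up values)
scores = [0.82, 0.15, 0.47, 0.90]
print(classify_by_min_anomaly(scores))  # -> 1 (model 1 had the lowest score)
```

In the quoted loop this would mean appending `classify_by_min_anomaly(anomaly)` to `labels_predicted`, so the accuracy line compares class indices with class indices.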
