Nick,

No such file or directory: '/Users/mtaylor/dev/mitri/data/User_2_dir_train.txt'

I'm missing the user "2" data files.
---------
Matt Taylor
OS Community Flag-Bearer
Numenta


On Mon, Dec 29, 2014 at 12:01 PM, Nicholas Mitri <[email protected]> wrote:
> Hey Matt,
>
> Please find attached a folder containing the main py file and a data folder 
> containing a sample of data for one of the users.
> Also attached is the swarm file, the data log (concatenation from all users) 
> that was used to create it, and the description file containing the model 
> parameters (I’ve modified this to search for better parameter choices than
> those produced by swarming; in the current version I’ve disabled the SP to
> feed a large encoder output directly to the TP).
>
> It should run with no issues. Please feel free to email me if it doesn’t run 
> out of the box.
>
> Note: the swarm chooses the adaptive encoder. In my testing, the better
> choice is a scalar encoder with periodic set to True, since the direction
> values wrap around (min = 0, max = 16).
>
> Thanks,
> Nick
>
>
>
>> On Dec 29, 2014, at 9:24 PM, Matthew Taylor <[email protected]> wrote:
>>
>> Nicholas,
>>
>> I ran your code with the same results you had. If you have updated
>> code, please post it.
>>
>> After talking to Subutai about this, we think the input data you're
>> swarming over doesn't represent the data in your
>> labeled matrix file. How are you generating the input CSV data file?
>> It could be that the swarming data is not close enough to the
>> real-world data within the matrix file. Swarming tries to give you the
>> model that is best for predicting the next field in the input data, so
>> it's important that the data the swarm uses closely represents the
>> data the model actually sees. I don't think this is the case here.
>>
>> One thing I'm trying right now is to pass the labeled data into each
>> model during the training phase 10 times instead of just once to
>> reinforce the patterns in each model. The program is running now. I'm not
>> sure whether it will help, but I might as well try.
>>
>> ---------
>> Matt Taylor
>> OS Community Flag-Bearer
>> Numenta
>>
>>
>> On Mon, Dec 29, 2014 at 7:25 AM, Nicholas Mitri <[email protected]> wrote:
>>> Hey Matt, everyone,
>>>
>>> I debugged the code and managed to get some sensible results. HTM is doing a
>>> great job of learning sequences but performing very poorly at
>>> generalization. So while it can recognize a sequence it has learned with
>>> high accuracy, when it’s fed a test sequence that it has never seen, its
>>> classification accuracy plummets. To be clear, classification here is
>>> performed by assigning an HTM region to each class and observing which
>>> region outputs the lowest anomaly score averaged over a test sequence.
>>>
>>> I’ve tried tweaking the encoder parameters to quantize the input with a
>>> lower resolution in the hope that similar inputs will be better pooled. That
>>> didn’t pan out. Also, changing encoder output length or number of columns is
>>> causing the HTM to output no predictions at times even with a non-empty
>>> active column list. I have little idea why that keeps happening.
>>>
>>> Any hints as to how to get HTM to perform better here? I’ve included HMM
>>> results for comparison. SVM results are all 95+%.
>>>
>>> Thank you,
>>> Nick
>>>
>>>
>>> HTM Results:
>>>
>>> Data = sequence of directions (8 discrete directions)
>>> Note on accuracy: results are shown as M1/M2 for two performance metrics. M1
>>> is the average anomaly; M2 is the sum of the normalized average anomaly and
>>> the normalized prediction error.
>>>
>>> Base training accuracy: 100% at 2 training passes
>>>
>>> User Dependent: 56.25%/56.25%
>>>
>>> User Independent: N/A
>>>
>>> Mixed: 65.00%/71.25%
>>>
>>>
>>> HMM (22-states) Results:
>>>
>>> Data = sequence of directions (16 discrete directions)
>>>
>>> Base training accuracy: 97.5%
>>>
>>> User Dependent: 76.25%
>>>
>>> User Independent: 88.75%
>>>
>>> Mixed: 88.75%
>>>
>>>
>>> On Dec 11, 2014, at 7:16 PM, Matthew Taylor <[email protected]> wrote:
>>>
>>> Nicholas, can you paste a sample of the input data file?
>>>
>>> ---------
>>> Matt Taylor
>>> OS Community Flag-Bearer
>>> Numenta
>>>
>>> On Thu, Dec 11, 2014 at 7:50 AM, Nicholas Mitri <[email protected]> wrote:
>>>>
>>>> Hey all,
>>>>
>>>> I’m running into some trouble with using HTM for a gesture recognition
>>>> application and would appreciate some help.
>>>> First, the data is collected from 17 users performing 5 gestures of each
>>>> of 16 different gesture classes using motion sensors. The feature vector
>>>> for each sample is a sequence of discretized directions calculated using
>>>> Bézier control points after curve fitting the gesture trace.
>>>>
>>>> For a baseline, I fed the data to 16 10-state HMMs for training and again
>>>> for testing. The classification accuracy achieved is 95.7%.
>>>>
>>>> For HTM, I created 16 CLA models using parameters from a medium swarm. I
>>>> ran the data through the models for training where each model is trained on
>>>> only 1 gesture class. For testing, I fed the same data again with learning
>>>> turned off and recorded the anomaly score (averaged across each sequence)
>>>> for each model. Classification was done by seeking the model with the
>>>> minimum anomaly score. Accuracy turned out to be a puzzling 0.0%!!
>>>>
>>>> Below is the relevant section of the code. I would appreciate any hints.
>>>> Thanks,
>>>> Nick
>>>>
>>>> def run_experiment():
>>>>    print "Running experiment..."
>>>>
>>>>    model = [0]*16
>>>>    for i in range(0, 16):
>>>>        model[i] = ModelFactory.create(model_params, logLevel=0)
>>>>        model[i].enableInference({"predictedField": FIELD_NAME})
>>>>
>>>>    with open(FILE_PATH, "rb") as f:
>>>>        csv_reader = csv.reader(f)
>>>>        data = []
>>>>        labels = []
>>>>        for row in csv_reader:
>>>>            r = [int(item) for item in row[:-1]]
>>>>            data.append(r)
>>>>            labels.append(int(row[-1]))
>>>>
>>>>        # data_train, data_test, labels_train, labels_test = \
>>>>        #     cross_validation.train_test_split(data, labels, test_size=0.4,
>>>>        #                                       random_state=0)
>>>>        data_train = data
>>>>        data_test = data
>>>>        labels_train = labels
>>>>        labels_test = labels
>>>>
>>>>    for passes in range(0, TRAINING_PASSES):
>>>>        sample = 0
>>>>        for (ind, row) in enumerate(data_train):
>>>>            for r in row:
>>>>                value = int(r)
>>>>                result = model[labels_train[ind]].run(
>>>>                    {FIELD_NAME: value, '_learning': True})
>>>>                prediction = result.inferences["multiStepBestPredictions"][1]
>>>>                anomalyScore = result.inferences["anomalyScore"]
>>>>            model[labels[ind]].resetSequenceStates()
>>>>            sample += 1
>>>>            print "Processing training sample %i" % sample
>>>>            if sample == 100:
>>>>                break
>>>>
>>>>    sample = 0
>>>>    labels_predicted = []
>>>>    for row in data_test:
>>>>        anomaly = [0]*16
>>>>        for i in range(0, 16):
>>>>            model[i].resetSequenceStates()
>>>>            for r in row:
>>>>                value = int(r)
>>>>                result = model[i].run(
>>>>                    {FIELD_NAME: value, '_learning': False})
>>>>                prediction = result.inferences["multiStepBestPredictions"][1]
>>>>                anomalyScore = result.inferences["anomalyScore"]
>>>>                # print value, prediction, anomalyScore
>>>>                if value == int(prediction) and anomalyScore == 0:
>>>>                    # print "No prediction made"
>>>>                    anomalyScore = 1
>>>>                anomaly[i] += anomalyScore
>>>>            anomaly[i] /= len(row)
>>>>        sample += 1
>>>>        print "Processing testing sample %i" % sample
>>>>        labels_predicted.append(np.min(np.array(anomaly)))
>>>>        print anomaly, row[-1]
>>>>        if sample == 100:
>>>>            break
>>>>
>>>>    accuracy = (np.sum(np.array(labels_predicted) == np.array(labels_test))
>>>>                * 100.0 / len(labels_test))
>>>>    print "Testing accuracy is %0.2f" % accuracy
>>>>
>>>>
>>>
>>>
>>
>
>
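On the testing loop in the original message: `np.min` returns the smallest averaged anomaly score itself, not the index of the model that produced it, so `labels_predicted` ends up holding floats that are then compared against integer class labels. The loop also stops after 100 samples while the accuracy is computed against the full label list. Below is a minimal sketch of minimum-anomaly classification, assuming one averaged anomaly score per model. The helper names and sample scores are made up for illustration, and it uses plain Python rather than NuPIC (`np.argmin` would be the NumPy equivalent of the index lookup).

```python
def classify_by_min_anomaly(mean_anomaly_per_model):
    """Return the index (class label) of the model with the lowest score."""
    return min(range(len(mean_anomaly_per_model)),
               key=lambda i: mean_anomaly_per_model[i])


def accuracy_percent(predicted, actual):
    """Percentage of predictions that match the true labels.

    zip() stops at the shorter list, so only the labels that were actually
    predicted are compared (relevant when the test loop breaks early).
    """
    pairs = list(zip(predicted, actual))
    hits = sum(1 for p, a in pairs if p == a)
    return 100.0 * hits / len(pairs)


# Hypothetical averaged anomaly scores: 3 test sequences x 4 models.
scores = [
    [0.9, 0.2, 0.7, 0.8],
    [0.1, 0.6, 0.5, 0.9],
    [0.8, 0.7, 0.6, 0.3],
]
labels_test = [1, 0, 2, 3]  # hypothetical; one more label than predictions

predicted = [classify_by_min_anomaly(s) for s in scores]
print(predicted)
print("Testing accuracy is %0.2f" % accuracy_percent(predicted, labels_test))
```

The point is only that the predicted label is the position of the minimum, and that predictions and labels need to cover the same samples before they are compared.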

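On Nick's note about the periodic scalar encoder: the toy encoder below is only meant to illustrate why wrap-around matters for direction values. It is not NuPIC's ScalarEncoder; the function names, the bit count `n`, and the active-bit count `w` are assumptions chosen for the example, and the range 0 to 16 is taken from the note.

```python
def encode(value, minval=0, maxval=16, n=64, w=9, periodic=True):
    """Toy scalar encoding: return the set of active bit indices for value."""
    span = float(maxval - minval)
    if periodic:
        # The range wraps, so maxval lands on the same bits as minval.
        start = int((value - minval) / span * n) % n
        return {(start + k) % n for k in range(w)}
    # Non-periodic: the window of w bits must stay inside [0, n).
    start = int((value - minval) / span * (n - w))
    start = max(0, min(n - w, start))
    return {start + k for k in range(w)}


def overlap(a, b):
    """Number of active bits two encodings share."""
    return len(a & b)


periodic_overlap = overlap(encode(0), encode(15))
flat_overlap = overlap(encode(0, periodic=False), encode(15, periodic=False))
print(periodic_overlap, flat_overlap)
```

With these settings, directions 0 and 15 share 5 of their 9 active bits in the periodic version and none in the non-periodic one, which is the kind of similarity a wrapping direction value should have.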