Hi Matt,

Thanks for your reply.
> Sounds like you have some labeled data you want to train NuPIC with,
> where S1 is the raw data and S2 are the feature labels. The intent is
> to train NuPIC on labeled data, and then pass in unlabeled data,
> having NuPIC infer the categorization. Is that right?

Yes, exactly. The only thing to note is that the data in S1 is itself a
set of labels (non-numeric, discrete) that form a non-temporal sequence.

> So, I'm not sure that this will work

It didn't. As you expected, it seems to have trained on S2 alone:

    'sensorParams': {
        'encoders': {
            '_classifierInput': {
                'classifierOnly': True,
                'fieldname': 'S2',
                'n': 121,
                'name': '_classifierInput',
                'type': 'SDRCategoryEncoder',
                'w': 21},
            u'S1': {...},
            u'S2': {...}

Have I interpreted that correctly? I'm trying your suggestion of
training different models for each feature now. Is there anything
published on the EEG classifier you mention?

Cheers,
Matt

On 16 May 2015 at 19:10, Matthew Taylor <m...@numenta.org> wrote:
> Hi Matt,
>
> Sounds like you have some labeled data you want to train NuPIC with,
> where S1 is the raw data and S2 are the feature labels. The intent is
> to train NuPIC on labeled data, and then pass in unlabeled data,
> having NuPIC infer the categorization. Is that right?
>
> So, I'm not sure that this will work. NuPIC may not be learning the
> patterns within S1 that represent the feature labels in S2. It could
> be learning the pattern of S2 itself over time. When you get the swarm
> results, pay close attention to which field is given an encoder in the
> model params it creates. If it encodes S1 and not S2, that is a good
> sign. If it only encodes S2, that means it didn't pick up a direct
> correlation between S1 and S2, and it will only be paying attention to
> S2 patterns. You don't want this. If it encodes both fields, it is
> making inferences based on both input fields.
>
> Another tactic that has been used in the past is to use anomaly values
> for categorization.
> This is a more complex process for several reasons. For this method,
> you would need to pre-process your data into several data sets, each
> representing one feature. This could be tricky, especially if your
> data is timestamped and the timestamp is important to feature
> categorization. Then you train one NuPIC model for each feature, so
> that each model learns the patterns of one feature only. This will
> likely require swarming only once, since the input data (no matter the
> feature) has pretty much the same characteristics. You can use the
> same model params for each model you're training; just send different
> feature sets into each one. You'll end up with a set of models that
> have each learned the patterns of one feature, and are ignorant of the
> other features. Theoretically, each model will return lower anomaly
> scores if it sees data over time that fits inside the feature patterns
> it learned during training.
>
> For this method, you would not need S2 at all. The features would
> instead be "labelled" by training different NuPIC models on different
> features. Once your models are trained, you would pass real data into
> each model, each input data row being processed simultaneously by each
> feature-detecting model. Each model would have an inference type of
> "TemporalAnomaly", and the model results will contain an
> "anomaly_score". The model with the lowest anomaly score would be the
> winner, thus classifying the input data at that point in time.
>
> This tactic has worked in the past for classifying EEG data, but not a
> lot of further work has been done with it. Keep in mind that your
> features must be discrete and known ahead of time for this to work, so
> it may not work for you. However, if new features are identified over
> time, new models could be trained and added to the ensemble at a later
> time, thus expanding the possible categories at the expense of memory
> and CPU.
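(If I've understood the ensemble idea correctly, the winner selection reduces to something like the sketch below. The feature names and scores are made up; in practice each score would be the "anomaly_score" returned by the corresponding TemporalAnomaly model for the current input row.)

```python
# Sketch of the winner-takes-lowest-anomaly classification described
# above. Scores are hard-coded stubs here; with NuPIC they would come
# from running the same input row through each feature's model.

def classify_row(anomaly_scores):
    """Return the feature whose model found the input least anomalous."""
    return min(anomaly_scores, key=anomaly_scores.get)

# One trained model per feature; each reports how surprising the current
# row looked to it (0.0 = fully expected, 1.0 = completely novel).
scores = {"featureA": 0.82, "featureB": 0.15, "featureC": 0.64}
print(classify_row(scores))  # featureB: lowest anomaly score wins
```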
>
> I hope this helps,
>
> ---------
> Matt Taylor
> OS Community Flag-Bearer
> Numenta
>
>
> On Sat, May 16, 2015 at 7:32 AM, Matt Harvey <m.j.har...@acellera.com>
> wrote:
> > Hi,
> >
> > I'm just starting out with NuPIC, and would really appreciate a
> > pointer or two on which inference model is best for my task.
> >
> > I'm trying to train a model to perform a specific feature detection
> > task on discrete data. My input data is:
> >
> > * S1: a set of N discrete category values representing features
> > derived from raw spatial-domain data
> > * S2: a set of N feature categorisations of S1 sequences, where
> > element S2_i can -- broadly speaking -- be determined from
> > { S1_(i-n) ... S1_(i+n) } (although the exact value of n is unknown)
> >
> > I would like to train a model to learn to perform the S1 -> S2
> > feature categorisation. For a given sequence S1, the model should
> > give a best guess at the likely feature for each element.
> >
> > My initial attempt, based on the one_gym and opf examples, is to
> > treat the data as a temporal sequence and train a TemporalMultiStep
> > model on each {S1_i, S2_i} tuple (using the "string" type for the
> > data).
> >
> > I don't know how successful this is going to be (still swarming...),
> > but I can already see some problems. My questions are:
> >
> > * Is TemporalMultiStep the right inference type for this? (I don't
> > need a prediction at a future i, just an inference of S2_i.) Would
> > TemporalClassifier be more appropriate?
> >
> > * In the case where all of the S1 set is known in advance, treating
> > the data as temporal input means that the model is unable to learn
> > from 'future' input, which is certainly going to make the feature
> > learning harder. (Though I could combine inference results from
> > running S1 both forwards and backwards through the model?)
> >
> > Would it instead be better to try a classification of intervals over
> > S1, e.g. giving as inputs { {S1_i-n ..
> > S1_i+n}, S2_i }? In that case, would it be better to treat this
> > with a non-temporal model? (NontemporalClassifier?)
> >
> > I'd be most grateful for any opinions or advice.
> >
> > Cheers,
> >
> > Matt

--
M J Harvey
CTO, Acellera Ltd. London, Barcelona.
www.acellera.com | m.j.har...@acellera.com
Follow us on: twitter <http://www.twitter.com/acellera>, linkedin
<http://www.linkedin.com/company/acellera>, youtube
<http://youtube.com/acelleracom>. Skypename: acellera
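P.S. For reference, the interval/window construction I had in mind in the last question above would look something like this (n and the example data are arbitrary; edge positions are padded with None so every S2_i gets a window):

```python
# Build { {S1_(i-n) .. S1_(i+n)}, S2_i } training tuples from the
# parallel S1/S2 sequences. Positions that fall outside the sequence
# are padded with None.

def windowed_pairs(s1, s2, n):
    pairs = []
    for i in range(len(s1)):
        window = [s1[j] if 0 <= j < len(s1) else None
                  for j in range(i - n, i + n + 1)]
        pairs.append((tuple(window), s2[i]))
    return pairs

s1 = ["a", "b", "c", "d"]
s2 = ["X", "X", "Y", "Y"]
print(windowed_pairs(s1, s2, 1)[1])  # (('a', 'b', 'c'), 'X')
```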