Thanks very much, Fergal,

I think before using the anomaly detection facilities of NuPIC I would like
to start with plain inference and see how well NuPIC can predict the
bulk of the data, where no anomalies are expected. I therefore intend to
use 'inferenceType': 'TemporalMultiStep' at first, and I would like to get
predictions for several quantities. For instance, if I input both
temperature and tilt (consider just one time sequence each), I would like
both temperature(t+1) and tilt(t+1) to be output. To make things more
concrete, here is the result I get when running NuPIC for one timestep
using model.run(...). [Here I am adapting the run.py file from the one_gym
tutorial.]

result = ModelResult(
        predictionNumber=106
        rawInput={'timestamp': datetime.datetime(2010, 7, 6, 10, 0),
                'kw_energy_consumption': 44.2}
        sensorInput=SensorInput(
                dataRow=(44.2,)
                dataDict={'timestamp': datetime.datetime(2010, 7, 6, 10, 0),
                        'kw_energy_consumption': 44.2}
                dataEncodings=[array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
                        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
                        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
                        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
                        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
                        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
                        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
                        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
                        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
                        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
                        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
                        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
                        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
                        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
                        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
                        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
                        0.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
                        1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  0.,  0.,  0.,  0.,
                        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
                        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
                        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.], dtype=float32)]
                sequenceReset=0.0
                category=0
        )
        inferences={'multiStepPredictions': {5: {36.6: 0.32600618332119363,
                40.7: 0.0194778517125032, 42.2: 0.058716832538386846,
                39.6: 0.21727291982498376, 36.2: 0.017515548091618965,
                44.099999999999994: 0.024999999999999991,
                10.7: 0.081172572787322539,
                5.420999999999999: 0.23697379692615872}},
                'multiStepBestPredictions': {5: 36.6}}
        metrics=None
        predictedFieldIdx=0
        predictedFieldName=kw_energy_consumption
)

Here I can see how the single stream of energy consumption data for hotgym
has been converted to a binary sequence containing mostly 0s. I understand
this works like a "slider control": the higher the input number, the
further to the right the run of 1s sits. So to enter temperature and tilt,
the bit sequence would be a concatenation of two such bitmaps, right? My
question then is how do I extract temperature(t+1) and tilt(t+1) as
separate numbers? How can the classifier make sense of this?
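To make sure I understand the concatenation, here is a toy stand-in I wrote
for the "slider" behaviour (this is only a simplified illustration, not
NuPIC's actual ScalarEncoder, and the field ranges are made up):

```python
def encode_scalar(value, min_val, max_val, n_bits=50, w=5):
    """Toy 'slider' encoder: a run of w active bits whose position
    within n_bits reflects where value sits in [min_val, max_val].
    A simplification of NuPIC's ScalarEncoder, for illustration only."""
    span = max_val - min_val
    # index of the first active bit, clamped to a valid range
    start = int(round((value - min_val) / span * (n_bits - w)))
    start = max(0, min(n_bits - w, start))
    return [1 if start <= i < start + w else 0 for i in range(n_bits)]

# Encoding two fields would then mean concatenating their bitmaps
# (hypothetical ranges for my temperature and tilt sensors):
temperature_bits = encode_scalar(21.5, min_val=-10.0, max_val=40.0)
tilt_bits = encode_scalar(0.002, min_val=-0.01, max_val=0.01)
combined = temperature_bits + tilt_bits  # 100 bits total
```

Is that roughly what happens internally when two fields are given to OPF?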

In the inferences section above, the +5 predictions are listed for only one
time series, namely the one specified as predictedFieldName. Can this
parameter be a list?

If I can only output one field at a time, that is fine too, as long as the
classifier can disambiguate the two streams...?
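For what it's worth, I can see how to unpack each field's prediction if I
end up running one single-field model per metric; here is how I read the
inferences dict from the ModelResult above (values abbreviated from the
printout, and "temperature model" is hypothetical):

```python
# Unpacking the inferences dict of a ModelResult, using a few of the
# values from the printout above. With one single-field model per
# metric, each model's dict would be read the same way, so the two
# streams never mix.
temp_result = {  # from a hypothetical temperature-only model
    'multiStepPredictions': {
        5: {36.6: 0.326, 39.6: 0.217, 42.2: 0.0587, 40.7: 0.0195},
    },
    'multiStepBestPredictions': {5: 36.6},
}

# distribution over candidate values for t+5
dist = temp_result['multiStepPredictions'][5]
# the best prediction is the highest-probability candidate
best = max(dist, key=dist.get)
assert best == temp_result['multiStepBestPredictions'][5]
```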

Another question: as an intermediate phase, does the classifier create a
bit sequence, similar to the input bit sequence shown above, that I could
inspect? I.e. would it produce a slider control of 0s and 1s which can be
interpreted as a number?
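If it did produce such a bitmap, I imagine decoding it back to a number
would just invert the slider: find the run of 1s and map its position back
onto the value range. A toy sketch of what I mean (my own guess, not
NuPIC's actual decoder):

```python
def decode_slider(bits, min_val, max_val, w=5):
    """Toy inverse of a 'slider' scalar encoding: locate the run of
    w active bits and map its start position back to a value in
    [min_val, max_val]. Illustrative only -- not NuPIC's decoder."""
    n_bits = len(bits)
    start = bits.index(1)  # first active bit
    fraction = start / float(n_bits - w)
    return min_val + fraction * (max_val - min_val)

# A 50-bit slider whose active run starts at bit 27 decodes to a
# value 60% of the way through the range:
bits = [0] * 27 + [1] * 5 + [0] * 18
value = decode_slider(bits, min_val=-10.0, max_val=40.0)
```

Is something along these lines what the CLAClassifier does, or does it work
on a different representation entirely?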

Many thanks for your help,

John.


On Fri, Aug 15, 2014 at 11:15 AM, Fergal Byrne <[email protected]>
wrote:

> Hi John,
>
> No, NuPIC is great at looking at multiple fields of data and extracting
> both the per-field structure and the inter-field structure, but in practice
> it makes sense to proceed step-by-step from 18 single-metric models, up to
> pairs, and so on, and thus discover as you go how best to feed data in.
> With 18 metrics, the power set of combinations is very large, and most of
> these will be useless (or at best marginal), so you add fields one at a
> time to models which already lead the list of models, neglecting the ones
> which are failing to match your events.
>
> It's almost impossible statistically that the structure is evenly
> distributed across all your metrics, and much more likely that the most
> interesting inputs will be single fields, pairs or triplets of fields. If
> you have a strong intuition (or some evidence) that one pair of fields -
> such as temperature and tilt - is correlated, then these should be at the
> top of your list when you get up to pairs of metrics.
>
> Combining fields is simply a matter of concatenating the encodings for
> each field into a larger bit array (this happens internally in OPF). See
> the hotgym example for how to code it. Each column will sample from a
> subset of all bits, so NuPIC will identify correlative patterns
> automatically. The rate at which it does this will depend on the "density"
> of structure in the entire input. Giving the system a combination of all 18
> metrics will work, but will do so very slowly, because much of the input
> data (coming from irrelevant metrics) will not contribute to the
> recognition or the anomaly detection. On the other hand, treating each
> single metric as if it were the only input will help identify (to first
> order) which metrics contribute the most to solving the problem.
>
> My recommendation is to follow the procedure for identifying "unusual
> events" using the likelihood module to filter anomaly detection as outlined
> by Subutai and Matt. You're looking for good matches between known
> disturbances and the output of signals from the likelihood module (in
> Matt's talk he identifies the correlation between changes in the music and
> "red zones" in the likelihood plot). Go through this process for each
> single metric, and choose the top several metrics to "breed" your
> generation of paired metrics. If you get an improved correlation, add the
> best to your gene pool and iterate. Terminate when you stop improving the
> model, or when you get tired of seeking the last 1%!
>
> Regards,
>
> Fergal Byrne
>
>
>
> On Fri, Aug 15, 2014 at 10:42 AM, John Blackburn <
> [email protected]> wrote:
>
>> Dear Fergal and Ian,
>>
>> Thanks very much for your replies on this. Are you saying it is not
>> possible for NuPIC to take in multiple time series and predict multiple
>> time series? As I understand it, you are advising me to input only one of
>> the time series e.g. the first tilt sensor. However, in my system there is
>> a strong correlation between the temperature and the tilt so it would be
>> wrong for NuPIC to be unaware of the temperature data while predicting
>> tilt. Is it possible for NuPIC to account for spatial correlations between
>> data sets also?
>>
>> I could presumably give it all the data as one bitmap, but then how would I
>> extract one of the streams (e.g. tilt 1) without getting it mixed up with the
>> others? It would be useful to have some more documentation on what the
>> decoder does and how to use it. Is any available?
>>
>> John.
>>
>>
>> On Thu, Aug 14, 2014 at 12:30 PM, Fergal Byrne <
>> [email protected]> wrote:
>>
>>> Hi John,
>>>
>>> I agree with Ian: the first thing to do is to create a separate model
>>> which learns the spatiotemporal characteristics of each input metric. This
>>> will give you a picture of how well each metric behaves as a measure of the
>>> anomalies in your bridge's lifecycle. Experience with Grok (which does only
>>> this model-per-metric regime) on numerous systems shows that this is often
>>> enough, in that a single high anomaly likelihood score among all the
>>> metrics is enough to identify an event worthy of attention, and a second or
>>> third blip on other metrics will confirm it.
>>>
>>> It's important to use the likelihood score first, as it will filter out
>>> many perfectly normal events which your system produces, and which might
>>> frequently cause high anomaly scores from the raw predictions. If you can
>>> confirm that you are getting good correlations between your known events
>>> and likelihood alarms on one or more metrics, this will allow you to
>>> identify which single metrics and combinations are best at identifying your
>>> disturbances.
>>>
>>> Once you've identified the clearly best metrics (A, B and C say), you
>>> could start adding the others (d, e, f, etc) one at a time, creating a set
>>> of metrics which might give you even better correlation (eg Ac, Ba might be
>>> better than A or B alone).
>>>
>>> As Ian says, this is how the swarming algorithm works, but in this case
>>> the space of combinations is too large for swarming to make any sense. Use
>>> a depth-first approach instead by using single-metric models to group your
>>> metrics in quality bands. (The other issue with swarming is that it uses
>>> anomaly scores rather than likelihood scores to rank candidate choices of
>>> input fields).
>>>
>>> Please keep us informed about how you get on.
>>>
>>> Regards,
>>>
>>> Fergal Byrne
>>>
>>>
>>> On Wed, Aug 13, 2014 at 6:05 PM, Ian Danforth <[email protected]>
>>> wrote:
>>>
>>>> Use separate models for each giving each model time and sensor values.
>>>>
>>>> Start with two sensors and run both through the swarming process and
>>>> let us know what difficulties you run into.
>>>>
>>>> Ian
>>>> On 13 Aug 2014 03:37, "John Blackburn" <[email protected]>
>>>> wrote:
>>>>
>>>>> Dear All,
>>>>>
>>>>> I am a researcher at the National Physical Laboratory, London and am
>>>>> attempting to use NuPIC to model the strain and temperature variations of 
>>>>> a
>>>>> concrete bridge for anomaly detection. The bridge has 10 temperature
>>>>> sensors and 8 "tilt sensors" (basically strain) arranged across it. I have
>>>>> hourly readings for all of these sensors for a 3 year period. I would like
>>>>> NuPIC to predict all of these quantities (and keep them separate). 
>>>>> Compared
>>>>> to the "hotgym" example, the difference here is that there are 18 separate
>>>>> streams of data which would need to be suitably encoded and decoded to 
>>>>> make
>>>>> predictions of each one. I suspect the decoding stage would be most
>>>>> difficult: from the set of cell activations we need to discover 18 numbers
>>>>> and keep them separate. The HTM should account for cross correlations
>>>>> between time series as well as auto-correlations. I would like to consider
>>>>> +1 and +5 predictions, for example.
>>>>>
>>>>> During the course of the experiment, various interventions were
>>>>> carried out at known times. These include cutting support cables, removing
>>>>> chunks of concrete and adding heavy weights. The NN should show anomalous
>>>>> behaviour at the time these interventions were done. The system has been
>>>>> modelled using an Echo State Network so I want to compare performance of
>>>>> ESN to HTM.
>>>>>
>>>>> So, is this task possible with NuPIC and how might I adjust the
>>>>> encoder, decoder to deal with multiple streams?
>>>>>
>>>>> Many thanks for your help,
>>>>>
>>>>> John Blackburn.
>>>>>
>>>>> _______________________________________________
>>>>> nupic mailing list
>>>>> [email protected]
>>>>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>>
>>> Fergal Byrne, Brenter IT
>>>
>>> Author, Real Machine Intelligence with Clortex and NuPIC
>>> https://leanpub.com/realsmartmachines
>>>
>>> Speaking on Clortex and HTM/CLA at euroClojure Krakow, June 2014:
>>> http://euroclojure.com/2014/
>>> and at LambdaJam Chicago, July 2014: http://www.lambdajam.com
>>>
>>> http://inbits.com - Better Living through Thoughtful Technology
>>> http://ie.linkedin.com/in/fergbyrne/ - https://github.com/fergalbyrne
>>>
>>> e:[email protected] t:+353 83 4214179
>>> Join the quest for Machine Intelligence at http://numenta.org
>>> Formerly of Adnet [email protected] http://www.adnet.ie
>>>
>>>
>>>
>>
>>
>>
>
>
>
>
>
