So I've started coding (here:
https://github.com/JonnoFTW/htm-models-adelaide) and I have a few questions
about OPF:


   1. Can a model be fed an instance with missing input fields? Sometimes
   my data has error readings from the sensor (indicated by a count of 2046),
   and these should not be treated as anomalies (2046 cars passing a single
   sensor in 5 minutes would be highly anomalous, nigh impossible, and
   should probably be ignored). How should I handle this? Should I just drop
   the instance entirely? Keep in mind that at a particular time, an
   intersection can have valid readings on some sensors and error readings on
   others. Error readings are not very common, about 8 in a month.
   2. In the hotgym example, is the swarm_description the same for anomaly
   detection as for prediction? This file isn't provided for the anomaly
   example, so I based my attempt on the prediction one. Am I headed in the
   right direction with this script?
   https://github.com/JonnoFTW/htm-models-adelaide/blob/master/create_swarm_config.py
   I feel such a script is necessary since I'll need to create multiple
   models and they will probably all have slightly different model parameters.
   3. Can I submit this for the HTM challenge even though my data is not
   open?
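
To make question 1 concrete, here is a rough sketch of the pre-processing
step I have in mind. It assumes each instance is a dict mapping a sensor
name to its count; the function names and sensor names are made up for
illustration and are not taken from the repository. It only marks error
readings as missing (None). Whether an OPF model accepts None for a field
is exactly what I am asking about.

```python
# Sketch only: mark sensor error readings as missing before they reach a model.
ERROR_READING = 2046  # count the sensor reports when the reading is an error


def clean_readings(readings, error_value=ERROR_READING):
    """Return a copy of `readings` with error counts replaced by None."""
    return {
        sensor: (None if count == error_value else count)
        for sensor, count in readings.items()
    }


def has_valid_reading(readings):
    """True if at least one sensor still has a usable count."""
    return any(count is not None for count in readings.values())


# Hypothetical instance: one intersection at one timestamp.
row = {"sensor_1": 42, "sensor_2": 2046, "sensor_3": 17}
cleaned = clean_readings(row)

# Drop the instance only when every sensor reported an error.
keep_instance = has_valid_reading(cleaned)
```

The alternative is to drop the whole instance whenever any sensor reports
2046, which is simpler but throws away the valid readings from the other
sensors at that timestamp.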


On 15 September 2015 at 11:20, Matthew Taylor <[email protected]> wrote:

> Jonathan, my replies are below:
>
> On Sun, Sep 13, 2015 at 8:21 PM, Jonathan Mackenzie <[email protected]>
> wrote:
> > Following up on our discussions in gitter, basically, I want to perform
> > automated incident detection (AID as it's called in the literature) on
> > arterial roads (freeway roads are a different matter and transferability
> > of algorithms between freeways and arterial roads is _difficult_).
> >
> > I have 3.5 TB of data from 2006-2013 on ~540 intersections ... can nupic
> > handle this much data?
>
> Yes. NuPIC can handle as much data as you throw at it, because the
> data is not stored. It will take you quite a while to process that much
> data, however. I would suggest you attempt to multiprocess.
>
> Your data looks good to me, but at what interval do you get it? I
> would suggest that you take high-speed data and aggregate it to 10-15
> minute intervals. If you pass the data in at faster intervals, NuPIC
> may not recognize larger temporal patterns, like weekly or seasonal
> patterns. This might not work if you are trying to identify traffic
> incidents within 10 minutes.
>
> > The system would be used to determine if an incident has occurred between
> > two intersections based on an anomaly value threshold. My initial thought
> > for using nupic was to create a model for each intersection where the
> > inputs were each individual loop detector. But apparently this is not
> > possible since htmengine performs anomaly detection on a single field
> > only. I still want to perform anomaly detection, so from here, to use
> > htmengine it looks like I have 2 options:
> >
> >  * Encode the readings into a single value; would this work?
>
> Interesting idea, but the problem is how to encode data from multiple
> sensors into one data point. I'm not sure how this would work.
>
> >  * Make a model for every single sensor. Would this be useful?
>
> Yes, I'm sure this would be useful, but there is a scaling problem.
> How many individual sensors do you have? It will take one model per
> sensor. If you have thousands of sensors, it is going to be hard to
> scale that many NuPIC models.
>
> > It seems intuitive to think that incidents have an effect on the overall
> > flow of an intersection. Would the models be related to each other?
>
> The models would not be related because they are only paying attention
> to their own streams, but if you got high anomaly indications from
> several models in the same intersection at once, it would be a huge
> indicator that something just happened.
>
> > Could the sensor
> > model anomaly outputs be fed into a model for their intersection?
>
> This has been brought up before, but we've never tried it so we don't
> know what would happen.
>
> >
> > What's the best way of solving my problem?
>
> Another idea is to focus on just a few intersections so you don't have
> to deal with the scaling problem. You could create multi-variate
> models (models that look at more than one field of data) for each
> intersection. But you would need to build these models manually using
> the OPF, so it would take more work than the HTM Engine. But you'd
> have much more flexibility and control over your program. You can see
> a decent OPF example of this (except the multi-variate part) with the
> Hot Gym tutorials:
>
> - https://github.com/numenta/nupic/tree/master/examples/opf/clients/hotgym/prediction/one_gym
> - https://github.com/numenta/nupic/tree/master/examples/opf/clients/hotgym/anomaly/one_gym
>
> > I've followed the htmengine tutorial, but got stuck at the part where I
> > plug the readings into the models.
>
> I would like to help you if you are stuck. Not sure what you mean, but
> if you can share your codebase, I (or someone else) can try to help.
>
> Regards,
> ---------
> Matt Taylor
> OS Community Flag-Bearer
> Numenta
>
>


-- 
*Jonathan Mackenzie*
BEng (Software) Hons
PhD Candidate, Flinders University
