Jonathan,

> On Sep 23, 2015, at 1:54 AM, Jonathan Mackenzie <[email protected]> wrote:
> 
> Can a model be fed an instance with missing input fields? Sometimes my data 
> has error readings (indicated by a count of 2046) from the sensor and this is 
> not something that is anomalous (2046 cars passing a single sensor in 5 
> minutes is highly anomalous, nigh impossible to occur and should probably be 
> ignored). How should I handle this? Should I just drop the instance entirely? 
> Keeping in mind that for a particular time, an intersection can have valid 
> readings on some sensors and error readings on others. Error readings are not 
> very common, about 8 in a month.

I would just repeat the last value whenever you detect an error reading.
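
A rough sketch of what I mean, in plain Python rather than anything NuPIC-specific. The function name and the list-of-counts input are assumptions for illustration; the only detail taken from your description is the 2046 error value.

```python
ERROR_COUNT = 2046  # error value reported by the sensor, per the thread


def fill_error_readings(counts, error_value=ERROR_COUNT):
    """Replace each error reading with the most recent valid count.

    Errors that occur before any valid reading has been seen come back
    as None, so the caller can decide whether to skip those rows.
    """
    filled = []
    last_valid = None
    for count in counts:
        if count == error_value:
            # Repeat the last good value instead of feeding 2046 to the model.
            filled.append(last_valid)
        else:
            last_valid = count
            filled.append(count)
    return filled


readings = [12, 15, 2046, 2046, 9]
print(fill_error_readings(readings))  # [12, 15, 15, 15, 9]
```

Since an intersection can have valid readings on some sensors and errors on others, this would presumably be applied to each sensor's series separately before the row is passed to the model.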

- Chetan

> 
> On 15 September 2015 at 11:20, Matthew Taylor <[email protected]> wrote:
> Jonathan, my replies are below:
> 
> On Sun, Sep 13, 2015 at 8:21 PM, Jonathan Mackenzie <[email protected]> wrote:
> > Following up on our discussions in gitter, basically, I want to perform
> > automated incident detection (AID as it's called in the literature) on
> > arterial roads (freeway roads are a different matter and transferability of
> > algorithms between freeways and arterial roads is _difficult_).
> >
> > I have 3.5 TB of data from 2006-2013 on ~540 intersections ... can nupic 
> > handle this much data?
> 
> Yes. NuPIC can handle as much data as you throw at it, because the
> data is not stored. It will take you quite a while to process that much
> data, however, so I would suggest spreading the work across multiple
> processes.
> 
> Your data looks good to me, but at what interval do you get it? I
> would suggest that you take high-speed data and aggregate it to 10-15
> minute intervals. If you pass the data in at faster intervals, NuPIC
> may not recognize larger temporal patterns, like weekly or seasonal
> patterns. This might not work if you are trying to identify traffic
> incidents within 10 minutes.
> 
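The aggregation step suggested above could look roughly like the sketch below. It is plain Python, not part of NuPIC, and it assumes the raw data is available as (timestamp, count) pairs and that the bucket width divides 60 evenly; the function name is made up for illustration.

```python
from datetime import datetime, timedelta


def aggregate_counts(readings, bucket_minutes=15):
    """Sum (timestamp, count) readings into fixed-width time buckets.

    Assumes bucket_minutes divides 60, so buckets align to the hour.
    """
    buckets = {}
    for timestamp, count in readings:
        # Floor the timestamp to the start of the bucket it falls in.
        floored_minute = timestamp.minute - timestamp.minute % bucket_minutes
        start = timestamp.replace(minute=floored_minute, second=0, microsecond=0)
        buckets[start] = buckets.get(start, 0) + count
    # Return the buckets in chronological order.
    return sorted(buckets.items())


# Six 5-minute readings of 10 vehicles each, starting at 08:00.
first = datetime(2015, 9, 1, 8, 0)
raw = [(first + timedelta(minutes=5 * i), 10) for i in range(6)]
for start, total in aggregate_counts(raw):
    print(start, total)
```

Summing suits vehicle counts; for a rate-like measurement an average per bucket would probably be the better choice.
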
> > The system would be used to determine if an incident has occurred between
> > two intersections based on an anomaly value threshold. My initial thought
> > for using nupic was to create a model for each intersection where the inputs
> > were each individual loop detector. But apparently this is not possible
> > since htmengine performs anomaly detection on a single field only. I still
> > want to perform anomaly detection, so from here, to use htmengine it looks
> > like I have 2 options:
> >
> >  * Encode the readings into a single value; would this work?
> 
> Interesting idea, but the problem is how to encode data from multiple
> sensors into one data point. I'm not sure how this would work.
> 
> >  * Make a model for every single sensor. Would this be useful?
> 
> Yes, I'm sure this would be useful, but there is a scaling problem.
> How many individual sensors do you have? It will take one model per
> sensor. If you have thousands of sensors, it is going to be hard to
> scale that many NuPIC models.
> 
> > It seems
> > intuitive to think that incidents have an effect on the overall flow of an
> > intersection. Would the models be related to each other?
> 
> The models would not be related because they are only paying attention
> to their own streams, but if you got high anomaly indications from
> several models in the same intersection at once, it would be a huge
> indicator that something just happened.
> 
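One way to act on simultaneous high anomaly scores from several per-sensor models is sketched below. The threshold, the minimum sensor count, and the sensor names are hypothetical values chosen for illustration, and the scores are assumed to be anomaly scores in the range 0 to 1 collected for the same time step.

```python
def intersection_alert(anomaly_scores, threshold=0.9, min_sensors=2):
    """Return True when enough sensors report a high anomaly score at once.

    anomaly_scores maps a sensor id to that sensor model's anomaly score
    for the current time step.
    """
    high = [sensor for sensor, score in anomaly_scores.items()
            if score >= threshold]
    return len(high) >= min_sensors


# Two of three sensors at the same intersection are highly anomalous.
scores = {"loop_1": 0.95, "loop_2": 0.97, "loop_3": 0.10}
print(intersection_alert(scores))  # True
```

Both parameters would need tuning against known incidents; a single noisy sensor should not trigger an alert on its own, which is why the default requires at least two.
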
> > Could the sensor
> > model anomaly outputs be fed into a model for their intersection?
> 
> This has been brought up before, but we've never tried it so we don't
> know what would happen.
> 
> >
> > What's the best way of solving my problem?
> 
> Another idea is to focus on just a few intersections so you don't have
> to deal with the scaling problem. You could create multi-variate
> models (models that look at more than one field of data) for each
> intersection. But you would need to build these models manually using
> the OPF, so it would take more work than the HTM Engine. But you'd
> have much more flexibility and control over your program. You can see
> a decent OPF example of this (except the multi-variate part) with the
> Hot Gym tutorials:
> 
> - https://github.com/numenta/nupic/tree/master/examples/opf/clients/hotgym/prediction/one_gym
> - https://github.com/numenta/nupic/tree/master/examples/opf/clients/hotgym/anomaly/one_gym
> 
> > I've followed the htmengine tutorial, but got stuck at the part where I plug
> > the readings into the models.
> 
> I would like to help you if you are stuck. Not sure what you mean, but
> if you can share your codebase, I (or someone else) can try to help.
> 
> Regards,
> ---------
> Matt Taylor
> OS Community Flag-Bearer
> Numenta
> 
> 
> 
> 
> -- 
> Jonathan Mackenzie 
> BEng (Software) Hons 
> PhD Candidate, Flinders University
