Jonathan,

> On Sep 23, 2015, at 1:54 AM, Jonathan Mackenzie <[email protected]> wrote:
>
> Can a model be fed an instance with missing input fields? Sometimes my data
> has error readings (indicated by a count of 2046) from the sensor and this is
> not something that is anomalous (2046 cars passing a single sensor in 5
> minutes is highly anomalous, nigh impossible to occur and should probably be
> ignored). How should I handle this? Should I just drop the instance entirely?
> Keeping in mind that for a particular time, an intersection can have valid
> readings on some sensors and error readings on others. Error readings are not
> very common, about 8 in a month.

I would just repeat the last value whenever you detect an error reading.

- Chetan

>
> On 15 September 2015 at 11:20, Matthew Taylor <[email protected]> wrote:
> Jonathan, my replies are below:
>
> On Sun, Sep 13, 2015 at 8:21 PM, Jonathan Mackenzie <[email protected]> wrote:
> > Following up on our discussions in gitter, basically, I want to perform
> > automated incident detection (AID as it's called in the literature) on
> > arterial roads (freeway roads are a different matter and transferability of
> > algorithms between freeways and arterial roads is _difficult_).
> >
> > I have 3.5 TB of data from 2006-2013 on ~540 intersections ... can nupic
> > handle this much data?
>
> Yes. NuPIC can handle as much data as you throw at it, because the
> data is not stored. It will take you quite awhile to process that much
> data, however. I would suggest you attempt to multiprocess.
>
> Your data looks good to me, but at what interval do you get it? I
> would suggest that you take high-speed data and aggregate it to 10-15
> minute intervals. If you pass the data in at faster intervals, NuPIC
> may not recognize larger temporal patterns, like weekly or seasonal
> patterns. This might not work if you are trying to identify traffic
> incidents within 10 minutes.
>
> > The system would be used to determine if an incident has occurred between
> > two intersections based on an anomaly value threshold. My initial thought
> > for using nupic was to create a model for each intersection where the inputs
> > were each individual loop detector. But apparently this is not possible
> > since htmengine performs anomaly detection on a single field only. I still
> > want to perform anomaly detection, so from here, to use htmengine it looks
> > like I have 2 options:
> >
> > * Encode the readings into a single value; would this work?
>
> Interesting idea, but the problem is how to encode data from multiple
> sensors into one data point. I'm not sure how this would work.
>
> > * Make a model for every single sensor. Would this be useful?
>
> Yes, I'm sure this would be useful, but there is a scaling problem.
> How many individual sensors do you have? It will take one model per
> sensor. If you have thousands of sensors, it is going to be hard to
> scale that many NuPIC models.
>
> > It seems
> > intuitive to think that incidents have an effect on the overall flow of an
> > intersection. Would the models be related to each other?
>
> The models would not be related because they are only paying attention
> to their own streams, but if you got high anomaly indications from
> several models in the same intersection at once, it would be a huge
> indicator that something just happened.
>
> > Could the sensor
> > model anomaly outputs be fed into a model for their intersection?
>
> This has been brought up before, but we've never tried it so we don't
> know what would happen.
>
> >
> > What's the best way of solving my problem?
>
> Another idea is to focus on just a few intersections so you don't have
> to deal with the scaling problem. You could create multi-variate
> models (models that look at more than one field of data) for each
> intersection. But you would need to build these models manually using
> the OPF, so it would take more work than the HTM Engine. But you'd
> have much more flexibility and control over your program.
> You can see a decent OPF example of this (except the multi-variate part)
> with the Hot Gym tutorials:
>
> - https://github.com/numenta/nupic/tree/master/examples/opf/clients/hotgym/prediction/one_gym
> - https://github.com/numenta/nupic/tree/master/examples/opf/clients/hotgym/anomaly/one_gym
>
> > I've followed the htmengine tutorial, but got stuck at the part where I plug
> > the readings into the models.
>
> I would like to help you if you are stuck. Not sure what you mean, but
> if you can share your codebase, I (or someone else) can try to help.
>
> Regards,
> ---------
> Matt Taylor
> OS Community Flag-Bearer
> Numenta
>
> --
> Jonathan Mackenzie
> BEng (Software) Hons
> PhD Candidate, Flinders University
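
To make the "repeat the last value" suggestion at the top of this reply concrete, here is a rough sketch in plain Python, run before the rows are handed to any model. It is only an illustration under some assumptions: each time step arrives as a dict of sensor name to count, 2046 is the only error marker (as described in the question), and the names fill_error_readings, loop_1 and loop_2 are made up for the example. It does not use or depend on any NuPIC API.

```python
ERROR_COUNT = 2046  # error marker reported by the sensors, per the question above


def fill_error_readings(rows, error_value=ERROR_COUNT):
    """Yield cleaned rows, replacing error readings with each sensor's last valid count.

    `rows` is an iterable of dicts mapping a (hypothetical) sensor name to its
    count for one time step. Sensors are tracked independently, so a row can
    mix valid and repeated values. A sensor that errors before producing any
    valid reading is emitted as None, because there is nothing to repeat yet.
    """
    last_valid = {}
    for row in rows:
        cleaned = {}
        for sensor, count in row.items():
            if count == error_value:
                # Error reading: fall back to the last valid value, if any.
                cleaned[sensor] = last_valid.get(sensor)
            else:
                last_valid[sensor] = count
                cleaned[sensor] = count
        yield cleaned


# Made-up readings for two loop detectors at one intersection.
readings = [
    {"loop_1": 12, "loop_2": 7},
    {"loop_1": 2046, "loop_2": 9},   # loop_1 errors, loop_2 is valid
    {"loop_1": 15, "loop_2": 2046},  # loop_2 errors, loop_1 is valid
]
cleaned = list(fill_error_readings(readings))
```

Because each sensor keeps its own last valid value, a time step with good readings on some sensors and errors on others is still usable rather than being dropped. With only about 8 error readings a month, the repeated values should be a small fraction of the stream.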
