Jonathan, my replies are below:

On Sun, Sep 13, 2015 at 8:21 PM, Jonathan Mackenzie <[email protected]> wrote:
> Following up on our discussions in gitter, basically, I want to perform
> automated incident detection (AID as it's called in the literature) on
> arterial roads (freeway roads are a different matter and transferability of
> algorithms between freeways and arterial roads is _difficult_).
>
> I have 3.5 TB of data from 2006-2013 on ~540 intersections ... can nupic
> handle this much data?

Yes. NuPIC can handle as much data as you throw at it, because the data is
not stored. It will take quite a while to process that much data, however,
so I would suggest splitting the work across multiple processes.

Your data looks good to me, but at what interval do you get it? I would
suggest that you take the high-speed data and aggregate it to 10-15 minute
intervals. If you pass the data in at faster intervals, NuPIC may not
recognize larger temporal patterns, like weekly or seasonal patterns. That
aggregation might not work, though, if you are trying to identify traffic
incidents within 10 minutes.

> The system would be used to determine if an incident has occurred between
> two intersections based on an anomaly value threshold. My initial thought
> for using nupic was to create a model for each intersection where the inputs
> were each individual loop detector. But apparently this is not possible
> since htmengine performs anomaly detection on a single field only. I still
> want to perform anomaly detection, so from here, to use htmengine it looks
> like I have 2 options:
>
> * Encode the readings into a single value; would this work?

Interesting idea, but the problem is how to encode data from multiple
sensors into one data point. I'm not sure how this would work.

> * Make a model for every single sensor. Would this be useful?

Yes, I'm sure this would be useful, but there is a scaling problem. How many
individual sensors do you have? It will take one model per sensor. If you
have thousands of sensors, it is going to be hard to scale that many NuPIC
models.

> It seems
> intuitive to think that incidents have an effect on the overall flow of an
> intersection. Would the models be related to each other?

The models would not be related, because each one only pays attention to its
own stream. But if you got high anomaly indications from several models in
the same intersection at once, it would be a strong indicator that something
just happened.
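
As a rough sketch of that last point: the check for "several models flagging
at once" could live outside NuPIC entirely. This is plain Python, not part of
NuPIC or htmengine. The function name, the 0.9 threshold, and the two-sensor
minimum are all assumptions made up for illustration, and you would need to
tune them against your own data.

```python
def intersection_alert(sensor_scores, threshold=0.9, min_sensors=2):
    """Flag an intersection when enough of its sensors look anomalous.

    sensor_scores: dict mapping a sensor id to the anomaly score its own
        model produced for the current time step (assumed to be in [0, 1]).
    threshold: hypothetical cut-off above which a single sensor counts as
        anomalous.
    min_sensors: hypothetical number of simultaneously anomalous sensors
        needed before the intersection as a whole is flagged.

    Returns (alert, flagged), where flagged is the sorted list of sensor ids
    at or above the threshold.
    """
    flagged = sorted(
        sensor_id
        for sensor_id, score in sensor_scores.items()
        if score >= threshold
    )
    return len(flagged) >= min_sensors, flagged


# Hypothetical scores for one time step at one intersection.
scores = {"loop_1": 0.95, "loop_2": 0.97, "loop_3": 0.10}
alert, flagged = intersection_alert(scores)
print(alert, flagged)
```

A single noisy sensor would not trip the alert here, which is the point of
requiring agreement between models.
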

> Could the sensor
> model anomaly outputs be fed into a model for their intersection?

This has been brought up before, but we've never tried it, so we don't know
what would happen.

>
> What's the best way of solving my problem?

Another idea is to focus on just a few intersections so you don't have to
deal with the scaling problem. You could create multi-variate models (models
that look at more than one field of data) for each intersection. You would
need to build these models manually using the OPF, so it would take more
work than the HTM Engine, but you'd have much more flexibility and control
over your program. You can see a decent OPF example of this (except the
multi-variate part) in the Hot Gym tutorials:

- https://github.com/numenta/nupic/tree/master/examples/opf/clients/hotgym/prediction/one_gym
- https://github.com/numenta/nupic/tree/master/examples/opf/clients/hotgym/anomaly/one_gym

> I've followed the htmengine tutorial, but got stuck at the part where I plug
> the readings into the models.

I would like to help you if you are stuck. I'm not sure what you mean, but
if you can share your codebase, I (or someone else) can try to help.

Regards,
---------
Matt Taylor
OS Community Flag-Bearer
Numenta
