Chetan, wouldn't it be better to send a null value instead?

Jonathan, for #2: There is no swarm being run in the anomaly example. I just used the same model params after changing the inference type to TemporalAnomaly. Everything else is the same.

Regarding creating many swarm descriptions programmatically... that makes sense sometimes, but I don't think it does in this context. The data coming out of each traffic sensor is probably very similar to the data from all the other sensors, so I would expect the same model params to be equally applicable to each sensor when you create its model. What makes the models different is that, as they run, each one learns its own sensor's patterns over time. Take the NYC traffic example: I used the same model params for every route, because the data format is the same for every route. You could probably do the same.
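To make the shared-params idea concrete, here is a minimal Python sketch, assuming the NuPIC model params layout of that era (a dict with an `inferenceType` key under `modelParams`). `BASE_MODEL_PARAMS`, `params_for_sensor`, `build_sensor_params`, and the sensor ids are hypothetical; the block only prepares one independent copy of the params per sensor and does not import NuPIC itself.

```python
import copy

# Hypothetical shared params. In a real setup these would come from a
# model_params.py file (e.g. one produced by an earlier swarm); only the
# keys touched below are shown here.
BASE_MODEL_PARAMS = {
    "model": "CLA",
    "version": 1,
    "modelParams": {
        "inferenceType": "TemporalMultiStep",
        # sensorParams, spParams, tpParams, etc. would go here.
    },
}


def params_for_sensor(base_params):
    """Return an independent copy of the shared params set up for anomaly detection."""
    params = copy.deepcopy(base_params)  # each model owns its own nested dicts
    params["modelParams"]["inferenceType"] = "TemporalAnomaly"
    return params


def build_sensor_params(sensor_ids, base_params=BASE_MODEL_PARAMS):
    """Map each sensor id to its own copy of the same model params."""
    return {sensor_id: params_for_sensor(base_params) for sensor_id in sensor_ids}


if __name__ == "__main__":
    per_sensor = build_sensor_params(["int042_det1", "int042_det2"])
    for sensor_id, params in sorted(per_sensor.items()):
        print(sensor_id, params["modelParams"]["inferenceType"])
        # With NuPIC installed, each dict could be handed to ModelFactory.create()
        # (nupic.frameworks.opf.modelfactory) to get one model per sensor.
```

The deep copy matters: a shallow copy would share the nested `modelParams` dict, so tweaking one sensor's settings would silently change every sensor's.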

#3: Yes, but I would like to see a detailed description of your data or a sample data set.

---------
Matt Taylor
OS Community Flag-Bearer
Numenta

On Wed, Sep 23, 2015 at 3:20 PM, Chetan Surpur wrote:

> Jonathan,
>
> On Sep 23, 2015, at 1:54 AM, Jonathan Mackenzie wrote:
>
> Can a model be fed an instance with missing input fields? Sometimes my data
> has error readings (indicated by a count of 2046) from the sensor, and this
> is not something that is anomalous (2046 cars passing a single sensor in 5
> minutes is highly anomalous, nigh impossible to occur, and should probably be
> ignored). How should I handle this? Should I just drop the instance
> entirely? Keep in mind that for a particular time, an intersection can
> have valid readings on some sensors and error readings on others. Error
> readings are not very common, about 8 in a month.
>
>
> I would just repeat the last value whenever you detect an error reading.
>
> - Chetan
>
>
> On 15 September 2015 at 11:20, Matthew Taylor wrote:
>>
>> Jonathan, my replies are below:
>>
>> On Sun, Sep 13, 2015 at 8:21 PM, Jonathan Mackenzie wrote:
>> > Following up on our discussions in gitter, basically, I want to perform
>> > automated incident detection (AID as it's called in the literature) on
>> > arterial roads (freeway roads are a different matter and transferability of
>> > algorithms between freeways and arterial roads is _difficult_).
>> >
>> > I have 3.5 TB of data from 2006-2013 on ~540 intersections ... can nupic
>> > handle this much data?
>>
>> Yes. NuPIC can handle as much data as you throw at it, because the
>> data is not stored. It will take you quite a while to process that much
>> data, however. I would suggest you attempt to multiprocess.
>>
>> Your data looks good to me, but at what interval do you get it? I
>> would suggest that you take high-speed data and aggregate it to 10-15
>> minute intervals.
>> If you pass the data in at faster intervals, NuPIC
>> may not recognize larger temporal patterns, like weekly or seasonal
>> patterns. This might not work if you are trying to identify traffic
>> incidents within 10 minutes.
>>
>> > The system would be used to determine if an incident has occurred
>> > between two intersections based on an anomaly value threshold. My
>> > initial thought for using nupic was to create a model for each
>> > intersection where the inputs were each individual loop detector. But
>> > apparently this is not possible since htmengine performs anomaly
>> > detection on a single field only. I still want to perform anomaly
>> > detection, so from here, to use htmengine it looks like I have 2
>> > options:
>> >
>> > * Encode the readings into a single value; would this work?
>>
>> Interesting idea, but the problem is how to encode data from multiple
>> sensors into one data point. I'm not sure how this would work.
>>
>> > * Make a model for every single sensor. Would this be useful?
>>
>> Yes, I'm sure this would be useful, but there is a scaling problem.
>> How many individual sensors do you have? It will take one model per
>> sensor. If you have thousands of sensors, it is going to be hard to
>> scale that many NuPIC models.
>>
>> > It seems intuitive to think that incidents have an effect on the
>> > overall flow of an intersection. Would the models be related to each
>> > other?
>>
>> The models would not be related because they are only paying attention
>> to their own streams, but if you got high anomaly indications from
>> several models in the same intersection at once, it would be a huge
>> indicator that something just happened.
>>
>> > Could the sensor model anomaly outputs be fed into a model for their
>> > intersection?
>>
>> This has been brought up before, but we've never tried it so we don't
>> know what would happen.
>>
>> >
>> > What's the best way of solving my problem?
>>
>> Another idea is to focus on just a few intersections so you don't have
>> to deal with the scaling problem. You could create multi-variate
>> models (models that look at more than one field of data) for each
>> intersection. But you would need to build these models manually using
>> the OPF, so it would take more work than the HTM Engine. But you'd
>> have much more flexibility and control over your program. You can see
>> a decent OPF example of this (except the multi-variate part) with the
>> Hot Gym tutorials:
>>
>> - https://github.com/numenta/nupic/tree/master/examples/opf/clients/hotgym/prediction/one_gym
>> - https://github.com/numenta/nupic/tree/master/examples/opf/clients/hotgym/anomaly/one_gym
>>
>> > I've followed the htmengine tutorial, but got stuck at the part where I
>> > plug the readings into the models.
>>
>> I would like to help you if you are stuck. Not sure what you mean, but
>> if you can share your codebase, I (or someone else) can try to help.
>>
>> Regards,
>> ---------
>> Matt Taylor
>> OS Community Flag-Bearer
>> Numenta
>>
>
>
>
> --
> Jonathan Mackenzie
> BEng (Software) Hons
> PhD Candidate, Flinders University
>
>
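Pulling together the data-preparation advice in this thread, here is a minimal Python sketch. Error readings (the 2046 count) are replaced per sensor, either by repeating the last valid value as Chetan suggested or by emitting None (the null-value alternative asked about at the top), and the 5-minute counts are then summed into 15-minute buckets as Matt recommended. Cleaning each detector's stream separately covers intersections where some sensors report errors while others don't. The function names, the (timestamp, count) tuple format, and the choice to sum counts are assumptions for illustration; whether NuPIC is better fed None or a repeated value is the open question above, so both options are shown.

```python
from datetime import datetime, timedelta

ERROR_COUNT = 2046  # count the loop detectors report on an error reading


def clean_counts(readings, mode="repeat"):
    """Replace error readings in ONE sensor's time-ordered (timestamp, count) list.

    mode="repeat": repeat the last valid count (Chetan's suggestion).
    mode="none":   emit None instead (the null-value alternative).
    An error before any valid reading becomes None in both modes.
    """
    cleaned = []
    last_valid = None
    for timestamp, count in readings:
        if count == ERROR_COUNT:
            count = last_valid if mode == "repeat" else None
        else:
            last_valid = count
        cleaned.append((timestamp, count))
    return cleaned


def aggregate(readings, minutes=15):
    """Sum counts into fixed buckets of `minutes` (must divide 60); None adds nothing."""
    buckets = {}
    for timestamp, count in readings:
        # Floor the timestamp to the start of its bucket, e.g. 08:10 -> 08:00.
        start = timestamp - timedelta(minutes=timestamp.minute % minutes,
                                      seconds=timestamp.second,
                                      microseconds=timestamp.microsecond)
        buckets.setdefault(start, 0)
        if count is not None:
            buckets[start] += count
    return sorted(buckets.items())


if __name__ == "__main__":
    t0 = datetime(2013, 3, 4, 8, 0)
    # Hypothetical 5-minute counts for one detector; the third is an error reading.
    raw = [(t0 + timedelta(minutes=5 * i), c)
           for i, c in enumerate([12, 15, 2046, 9, 11, 14])]
    cleaned = clean_counts(raw)
    print([c for _, c in cleaned])  # [12, 15, 15, 9, 11, 14]
    for start, total in aggregate(cleaned):
        print(start.strftime("%H:%M"), total)  # 08:00 42, then 08:15 34
```

Summing suits vehicle counts; for averaged quantities such as occupancy, a mean would be the better aggregate. In "none" mode, a bucket that contains an error reading will undercount, which is one argument for the repeat-last-value approach.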
