Matt,

> On Sep 23, 2015, at 8:52 PM, Matthew Taylor <[email protected]> wrote:
>
> Chetan, wouldn't it be better to send a null value instead?
>
How would you represent the null value? Whatever SDR represents the null
value would then be recognized as a pattern each time it was seen, which
you don't want.

- Chetan

> On Wed, Sep 23, 2015 at 3:20 PM, Chetan Surpur <[email protected]> wrote:
>> Jonathan,
>>
>> On Sep 23, 2015, at 1:54 AM, Jonathan Mackenzie <[email protected]> wrote:
>>
>> Can a model be fed an instance with missing input fields? Sometimes my data
>> has error readings (indicated by a count of 2046) from the sensor, and this
>> is not something that is anomalous (2046 cars passing a single sensor in 5
>> minutes is highly anomalous, nigh impossible to occur, and should probably
>> be ignored). How should I handle this? Should I just drop the instance
>> entirely? Keep in mind that, at a particular time, an intersection can have
>> valid readings on some sensors and error readings on others. Error readings
>> are not very common: about 8 in a month.
>>
>> I would just repeat the last value whenever you detect an error reading.
>>
>> - Chetan
>>
>> On 15 September 2015 at 11:20, Matthew Taylor <[email protected]> wrote:
>>>
>>> Jonathan, my replies are below:
>>>
>>> On Sun, Sep 13, 2015 at 8:21 PM, Jonathan Mackenzie <[email protected]> wrote:
>>>> Following up on our discussions in gitter: basically, I want to perform
>>>> automated incident detection (AID, as it's called in the literature) on
>>>> arterial roads (freeway roads are a different matter, and transferability
>>>> of algorithms between freeways and arterial roads is _difficult_).
>>>>
>>>> I have 3.5 TB of data from 2006-2013 on ~540 intersections ... can nupic
>>>> handle this much data?
>>>
>>> Yes. NuPIC can handle as much data as you throw at it, because the data
>>> is not stored. It will take you quite a while to process that much data,
>>> however. I would suggest you attempt to multiprocess.
>>>
>>> Your data looks good to me, but at what interval do you get it?
>>> I would suggest that you take high-speed data and aggregate it to 10-15
>>> minute intervals. If you pass the data in at faster intervals, NuPIC
>>> may not recognize larger temporal patterns, like weekly or seasonal
>>> patterns. This might not work if you are trying to identify traffic
>>> incidents within 10 minutes.
>>>
>>>> The system would be used to determine if an incident has occurred between
>>>> two intersections based on an anomaly value threshold. My initial thought
>>>> for using nupic was to create a model for each intersection where the
>>>> inputs were each individual loop detector. But apparently this is not
>>>> possible, since htmengine performs anomaly detection on a single field
>>>> only. I still want to perform anomaly detection, so from here, to use
>>>> htmengine it looks like I have 2 options:
>>>>
>>>> * Encode the readings into a single value; would this work?
>>>
>>> Interesting idea, but the problem is how to encode data from multiple
>>> sensors into one data point. I'm not sure how this would work.
>>>
>>>> * Make a model for every single sensor. Would this be useful?
>>>
>>> Yes, I'm sure this would be useful, but there is a scaling problem.
>>> How many individual sensors do you have? It will take one model per
>>> sensor. If you have thousands of sensors, it is going to be hard to
>>> scale that many NuPIC models.
>>>
>>>> It seems intuitive to think that incidents have an effect on the overall
>>>> flow of an intersection. Would the models be related to each other?
>>>
>>> The models would not be related, because they are only paying attention
>>> to their own streams, but if you got high anomaly indications from
>>> several models in the same intersection at once, it would be a huge
>>> indicator that something just happened.
>>>
>>>> Could the sensor model anomaly outputs be fed into a model for their
>>>> intersection?
>>>
>>> This has been brought up before, but we've never tried it, so we don't
>>> know what would happen.
>>>
>>>> What's the best way of solving my problem?
>>>
>>> Another idea is to focus on just a few intersections, so you don't have
>>> to deal with the scaling problem. You could create multivariate models
>>> (models that look at more than one field of data) for each intersection.
>>> You would need to build these models manually using the OPF, so it would
>>> take more work than using the HTM Engine, but you'd have much more
>>> flexibility and control over your program. You can see a decent OPF
>>> example of this (except the multivariate part) in the Hot Gym tutorials:
>>>
>>> - https://github.com/numenta/nupic/tree/master/examples/opf/clients/hotgym/prediction/one_gym
>>> - https://github.com/numenta/nupic/tree/master/examples/opf/clients/hotgym/anomaly/one_gym
>>>
>>>> I've followed the htmengine tutorial, but got stuck at the part where I
>>>> plug the readings into the models.
>>>
>>> I would like to help if you are stuck. I'm not sure what you mean, but if
>>> you can share your codebase, I (or someone else) can try to help.
>>>
>>> Regards,
>>> ---------
>>> Matt Taylor
>>> OS Community Flag-Bearer
>>> Numenta
>>
>> --
>> Jonathan Mackenzie
>> BEng (Software) Hons
>> PhD Candidate, Flinders University
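Chetan's advice in the thread ("repeat the last value whenever you detect an error reading") can be sketched in a few lines of Python. The 2046 error code comes from Jonathan's description; the function name and list-based interface are illustrative, not part of NuPIC.

```python
# Sketch of pre-processing sensor counts before feeding them to a model:
# replace each error reading (count == 2046) with the last valid reading,
# so no artificial "null" pattern is ever presented to the model.
ERROR_COUNT = 2046

def repeat_last_valid(readings):
    """Return a copy of `readings` with error values replaced by the most
    recent valid value. Leading errors (before any valid value has been
    seen) are passed through unchanged."""
    cleaned = []
    last_valid = None
    for count in readings:
        if count == ERROR_COUNT and last_valid is not None:
            cleaned.append(last_valid)
        else:
            cleaned.append(count)
            if count != ERROR_COUNT:
                last_valid = count
    return cleaned
```

Since errors are rare (about 8 a month, per the thread), repeating the previous 5-minute count barely distorts the temporal pattern the model sees, whereas dropping the row would shift every subsequent timestamp in that stream.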

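Matt's suggestion to aggregate high-speed data into 10-15 minute intervals could be done as below. This stdlib-only bucketing is just one way to do it and is not NuPIC-specific; pandas resampling or NuPIC's own aggregation settings would also work. It assumes the interval width divides 60 minutes evenly.

```python
from datetime import datetime, timedelta

def aggregate_counts(rows, minutes=15):
    """Sum timestamped counts into fixed-width buckets.

    rows: iterable of (datetime, count) pairs.
    Each timestamp is floored to the start of its `minutes`-wide interval
    (assumes `minutes` divides 60); counts in the same interval are summed.
    """
    buckets = {}
    for ts, count in rows:
        floored = ts - timedelta(minutes=ts.minute % minutes,
                                 seconds=ts.second,
                                 microseconds=ts.microsecond)
        buckets[floored] = buckets.get(floored, 0) + count
    return sorted(buckets.items())
```

For example, 5-minute counts stamped 10:02, 10:09, and 10:17 would land in the 10:00 and 10:15 buckets with a 15-minute interval.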