Matt,

> On Sep 23, 2015, at 8:52 PM, Matthew Taylor <[email protected]> wrote:
> 
> Chetan, wouldn't it be better to send a null value instead?
> 

How would you represent the null value? Whatever SDR represents the null value 
would then be recognized as a pattern each time it was seen, which you don't 
want.
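
A minimal sketch of the alternative suggested further down this thread,
repeating the last valid value. The function name and the choice to return
None for leading errors are my own assumptions for illustration; the 2046
sentinel comes from Jonathan's description of his sensor data.

```python
ERROR_COUNT = 2046  # error sentinel reported by the sensors in this thread


def fill_error_readings(counts, error_value=ERROR_COUNT):
    """Replace each error reading with the most recent valid count.

    Hypothetical helper: readings that arrive before any valid count has
    been seen come back as None, so the caller can skip those rows.
    """
    filled = []
    last_valid = None
    for count in counts:
        if count != error_value:
            last_valid = count  # remember the newest good reading
        filled.append(last_valid)
    return filled


cleaned = fill_error_readings([12, 2046, 15, 2046, 2046, 9])
```

You would run this per sensor before feeding rows to the model, since one
intersection can have valid readings on some sensors and errors on others.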

- Chetan

> 
> On Wed, Sep 23, 2015 at 3:20 PM, Chetan Surpur <[email protected]> wrote:
>> Jonathan,
>> 
>> On Sep 23, 2015, at 1:54 AM, Jonathan Mackenzie <[email protected]> wrote:
>> 
>> Can a model be fed an instance with missing input fields? Sometimes my data
>> has error readings (indicated by a count of 2046) from the sensor and this
>> is not something that is anomalous (2046 cars passing a single sensor in 5
>> minutes is highly anomalous, nigh impossible to occur and should probably be
>> ignored). How should I handle this? Should I just drop the instance
>> entirely? Keeping in mind that for a particular time, an intersection can
>> have valid readings on some sensors and error readings on others. Error
>> readings are not very common, about 8 in a month.
>> 
>> 
>> I would just repeat the last value whenever you detect an error reading.
>> 
>> - Chetan
>> 
>> 
>> On 15 September 2015 at 11:20, Matthew Taylor <[email protected]> wrote:
>>> 
>>> Jonathan, my replies are below:
>>> 
>>> On Sun, Sep 13, 2015 at 8:21 PM, Jonathan Mackenzie <[email protected]>
>>> wrote:
>>>> Following up on our discussions in gitter, basically, I want to perform
>>>> automated incident detection (AID as it's called in the literature) on
>>>> arterial roads (freeway roads are a different matter and transferability
>>>> of algorithms between freeways and arterial roads is _difficult_).
>>>> 
>>>> I have 3.5 TB of data from 2006-2013 on ~540 intersections ... can nupic
>>>> handle this much data?
>>> 
>>> Yes. NuPIC can handle as much data as you throw at it, because the
>>> data is not stored. It will take you quite a while to process that much
>>> data, however. I would suggest you attempt to multiprocess.
>>> 
>>> Your data looks good to me, but at what interval do you get it? I
>>> would suggest that you take high-speed data and aggregate it to 10-15
>>> minute intervals. If you pass the data in at faster intervals, NuPIC
>>> may not recognize larger temporal patterns, like weekly or seasonal
>>> patterns. This might not work if you are trying to identify traffic
>>> incidents within 10 minutes.
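
The aggregation step above could be sketched roughly as follows, using only
the standard library. The function name and the (timestamp, count) input
shape are assumptions for illustration, and the window length is assumed to
divide 60 evenly so that buckets align to the hour.

```python
from datetime import datetime


def aggregate_counts(readings, window_minutes=15):
    """Sum (timestamp, count) pairs into fixed windows aligned to the hour.

    Hypothetical helper: window_minutes is assumed to divide 60 evenly.
    Returns a list of (window_start, total_count) sorted by time.
    """
    buckets = {}
    for timestamp, count in readings:
        # Round the timestamp down to the start of its window.
        minute = timestamp.minute - timestamp.minute % window_minutes
        start = timestamp.replace(minute=minute, second=0, microsecond=0)
        buckets[start] = buckets.get(start, 0) + count
    return sorted(buckets.items())


five_minute_readings = [
    (datetime(2013, 1, 7, 8, 0), 10),
    (datetime(2013, 1, 7, 8, 5), 12),
    (datetime(2013, 1, 7, 8, 10), 8),
    (datetime(2013, 1, 7, 8, 15), 20),
]
aggregated = aggregate_counts(five_minute_readings)
```

Summing suits vehicle counts; for a rate or an average speed you would
average within each window instead.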
>>> 
>>>> The system would be used to determine if an incident has occurred between
>>>> two intersections based on an anomaly value threshold. My initial thought
>>>> for using nupic was to create a model for each intersection where the
>>>> inputs were each individual loop detector. But apparently this is not
>>>> possible since htmengine performs anomaly detection on a single field
>>>> only. I still want to perform anomaly detection, so from here, to use
>>>> htmengine it looks like I have 2 options:
>>>> 
>>>> * Encode the readings into a single value; would this work?
>>> 
>>> Interesting idea, but the problem is how to encode data from multiple
>>> sensors into one data point. I'm not sure how this would work.
>>> 
>>>> * Make a model for every single sensor. Would this be useful?
>>> 
>>> Yes, I'm sure this would be useful, but there is a scaling problem.
>>> How many individual sensors do you have? It will take one model per
>>> sensor. If you have thousands of sensors, it is going to be hard to
>>> scale that many NuPIC models.
>>> 
>>>> It seems intuitive to think that incidents have an effect on the overall
>>>> flow of an intersection. Would the models be related to each other?
>>> 
>>> The models would not be related because they are only paying attention
>>> to their own streams, but if you got high anomaly indications from
>>> several models in the same intersection at once, it would be a huge
>>> indicator that something just happened.
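
One way to act on simultaneous anomalies from independent per-sensor models
is a simple count over the sensors at one intersection. This is only a
sketch: the function name, the 0.9 threshold and the minimum of two sensors
are made-up values for illustration and would need tuning on real data. It
assumes each model reports an anomaly score between 0 and 1 for the same
time step.

```python
def intersection_alert(anomaly_scores, threshold=0.9, min_sensors=2):
    """Return True when enough sensors at one intersection look anomalous.

    anomaly_scores maps a sensor id to that sensor model's anomaly score
    for a single time step. threshold and min_sensors are hypothetical
    defaults, not recommended values.
    """
    flagged = [
        sensor for sensor, score in anomaly_scores.items()
        if score >= threshold
    ]
    return len(flagged) >= min_sensors


scores_at_one_step = {"loop_1": 0.95, "loop_2": 0.97, "loop_3": 0.10}
alert = intersection_alert(scores_at_one_step)
```

The models stay unrelated; only their outputs are combined after the fact.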
>>> 
>>>> Could the sensor
>>>> model anomaly outputs be fed into a model for their intersection?
>>> 
>>> This has been brought up before, but we've never tried it so we don't
>>> know what would happen.
>>> 
>>>> 
>>>> What's the best way of solving my problem?
>>> 
>>> Another idea is to focus on just a few intersections so you don't have
>>> to deal with the scaling problem. You could create multi-variate
>>> models (models that look at more than one field of data) for each
>>> intersection. But you would need to build these models manually using
>>> the OPF, so it would take more work than the HTM Engine. But you'd
>>> have much more flexibility and control over your program. You can see
>>> a decent OPF example of this (except the multi-variate part) with the
>>> Hot Gym tutorials:
>>> 
>>> - https://github.com/numenta/nupic/tree/master/examples/opf/clients/hotgym/prediction/one_gym
>>> - https://github.com/numenta/nupic/tree/master/examples/opf/clients/hotgym/anomaly/one_gym
>>> 
>>>> I've followed the htmengine tutorial, but got stuck at the part where I
>>>> plug the readings into the models.
>>> 
>>> I would like to help you if you are stuck. Not sure what you mean, but
>>> if you can share your codebase, I (or someone else) can try to help.
>>> 
>>> Regards,
>>> ---------
>>> Matt Taylor
>>> OS Community Flag-Bearer
>>> Numenta
>>> 
>> 
>> 
>> 
>> --
>> Jonathan Mackenzie
>> BEng (Software) Hons
>> PhD Candidate, Flinders University
>> 
>> 
> 

