Hey Mark 

Just to clarify, when I reference anomaly detection outside the scope of NuPIC, 
I mean distance-based anomaly/novelty detectors. This includes detectors built 
on various distance measures (Euclidean, normalized Euclidean, scaled 
Euclidean, Manhattan, Mahalanobis, …). 

The purpose of these simple detectors is to gain insight into how much you can 
expect the feature vector of a single class to vary and then use that knowledge 
to decide if a new feature pattern demonstrates “normal” variance from the 
average or is too far and therefore anomalous. It’s all based on distance and 
is spatial anomaly detection at its simplest. 

The example you included is a multi-cluster scenario where you’d need to 
cluster, learn the proper centroid for each cluster and then calculate anomaly 
scores for each cluster separately. It’s an extension of the single cluster 
definition I gave. 
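
To make the single-cluster and multi-cluster cases concrete, here is a minimal sketch in plain Python. It assumes Euclidean distance, assumes the clustering has already been done (each cluster is fitted from its own patterns), and uses a hypothetical cutoff of mean plus k standard deviations of the training distances. The function names are made up for illustration and the values are the ones from your example.

```python
import math
import statistics


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_cluster(patterns, k=3.0):
    """Learn a centroid and a distance threshold from one cluster's patterns.

    k is a hypothetical parameter: how many standard deviations of training
    distance still count as "normal" variance from the average.
    """
    dims = len(patterns[0])
    centroid = [statistics.fmean([p[i] for p in patterns]) for i in range(dims)]
    dists = [euclidean(p, centroid) for p in patterns]
    threshold = statistics.fmean(dists) + k * statistics.pstdev(dists)
    return centroid, threshold


def is_anomalous(pattern, clusters):
    """Multi-cluster case: anomalous only if too far from every centroid."""
    return all(euclidean(pattern, c) > t for c, t in clusters)


# Two clusters taken from the {1,2,3,101,102,103} example.
low = fit_cluster([[1.0], [2.0], [3.0]])
high = fit_cluster([[101.0], [102.0], [103.0]])
print(is_anomalous([2.5], [low, high]))   # False: close to the first centroid
print(is_anomalous([51.0], [low, high]))  # True: far from both centroids
```

With this definition 51 and 152 both come out anomalous, since each cluster is scored separately rather than by the overall range of observed values.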

> anomaly detector = diff between active columns and columns with high weights 
> (commonly used)? 

It would be the difference between the active columns and a set of columns 
produced by the training passes to act as a template (analogous to the centroid 
of a cluster, as mentioned above). Here, we’re assuming that the distance 
between two raw patterns correlates with the distance between their 
corresponding columnar activations. I’m not sure if this assumption holds for 
NuPIC’s SP implementation though. 
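
A rough sketch of that idea, in plain Python and independent of NuPIC: the template is taken to be the columns that were active in at least some fraction of the training passes, and the score is the fraction of currently active columns that fall outside it. The function names and the `min_fraction` cutoff are assumptions for illustration, not part of NuPIC's API, and the active columns are assumed to be available as sets of column indices.

```python
from collections import Counter


def build_template(training_activations, min_fraction=0.5):
    """Columns active in at least min_fraction of the training passes.

    training_activations: list of sets of active column indices, one per pass.
    min_fraction is a hypothetical cutoff, not a NuPIC parameter.
    """
    counts = Counter(col for active in training_activations for col in active)
    needed = min_fraction * len(training_activations)
    return {col for col, n in counts.items() if n >= needed}


def column_anomaly_score(active_columns, template):
    """Fraction of the active columns that are not part of the template."""
    if not active_columns:
        return 0.0
    return len(active_columns - template) / len(active_columns)


# Toy activations: columns 1, 2, 3 are stable across the training passes.
training = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 9}]
template = build_template(training)
print(column_anomaly_score({1, 2, 3, 7}, template))   # 0.25
print(column_anomaly_score({7, 8, 9, 10}, template))  # 1.0
```

Whether this score tracks distance in the raw input space depends on the assumption above about the SP preserving distances.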

best,
Nick

> On Oct 21, 2014, at 2:58 PM, Marek Otahal <[email protected]> wrote:
> 
> Hi Nick, 
> 
> thanks for explanations.. some comments below.
> 
> On Tue, Oct 21, 2014 at 1:13 PM, Nicholas Mitri <[email protected]> wrote:
>  This is not traditional spatial anomaly detection where the purpose is to 
> decide if a new input pattern falls within the RANGE of previously observed 
> patterns.
> 
> Hmm, I was unaware of such spatial anomaly definition, so if I understand it 
> right: 
> experienced: {1,2,3,101,102,103}, value 51 is normal, while 152 is anomalous? 
> (1000 being anomaly is ok). 
> 
> I somehow don't like this definition (not sure why exactly now :)), maybe a 
> "distance from significant clusters in observed data" would be better 
> (?)(152, 51 have same anomaly score, and eg 10 has a low score). 
> ...but if it has its uses, why not. 
> 
>  
> Here’s a few excerpts from the wiki:
> 
> "A non-temporal anomaly is defined as a combination of fields that doesn’t 
> usually occur, independent of the history of the data."
> Maybe we should update the wiki here; IMHO everything in the CLA depends on 
> the history of the data (but not on the sequential order of the data, in this 
> case). 
>  
> 
> This formulation will produce high anomaly scores for patterns that haven’t 
> been seen before even if they fall inside the cluster of older patterns. 
> Essentially, it’s detecting rarity and not spatial distance.
> True, that is how CLA anomaly works now, maybe you could generate your 
> training samples from uniform distribution (instead of just the edge cases)?
> 
>  
> Scott’s suggestion of using overlap instead is spatial anomaly detection in 
> the traditional sense.
> I haven’t started testing out any code but I’d be interested in seeing if the 
> SP can be used like a distance based anomaly detector. Specifically, I want 
> to find out whether the spatial pattern stability can be used as an analog 
> for a cluster centroid and thus compared to novel input to calculate anomaly. 
> 
> I see. I think this is what you both said, so  distance based anomaly 
> detector = diff between active columns and columns with high weights 
> (commonly used)? 
> We could turn this around and output distance anomaly as ratio of active 
> columns with low weights. 
> 
> My main concern with that approach is that the anomaly detector will produce 
> a centroid and a threshold that is used to calculate an anomaly score (think 
> of sigmoid function with the threshold as the knee). In the SP, the only way 
> to achieve that is to force stability for all training patterns and bake in 
> the thresholds accordingly to use for testing patterns.
> 
> Cheers, Mark
> 
> -- 
> Marek Otahal :o)
