Hey Subutai,

I tried both RDSE and SE with w=21 and a resolution dependent on the granularity level of the grid. Both performed similarly. The resolution of the encoder was chosen so that the overlap of encoder outputs slopes off gradually as you move away from the reference point (leading to the straight slope you see in the earlier plots).
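[Editorial sketch] The linear fall-off described above can be reproduced with a toy scalar encoder in which consecutive values share w-1 active bits, so overlap drops by one bit per resolution step. This is a simplified stand-in for the SE/RDSE used in the experiments, not NuPIC's actual encoder API; all names are illustrative.

```python
# Toy scalar encoder: value i activates bits [i, i + w). A simplified
# stand-in for the SE/RDSE from the experiments, not NuPIC's encoder API.
W = 21  # number of active bits per encoding, as in the experiments

def encode(value, w=W):
    """Return the set of active bit indices for an integer value."""
    return set(range(value, value + w))

def overlap(a, b):
    """Number of active bits shared by two encodings."""
    return len(a & b)

# Overlap with the reference (input = 1) drops by exactly one bit per
# resolution step -- 21, 20, 19, ... -- the straight slope in the plots.
curve = [overlap(encode(1), encode(x)) for x in range(1, 22)]
```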
Nick

> On Oct 24, 2014, at 6:55 PM, Subutai Ahmad <[email protected]> wrote:
>
> Hi Nicholas,
>
> Ah, I misunderstood the experiment. (I thought you were comparing the SDR for x against the SDR for y.) Your experiment is a bit more sophisticated. What are your encoder parameters here?
>
> Thanks,
>
> --Subutai
>
> On Wed, Oct 22, 2014 at 12:17 PM, Nicholas Mitri <[email protected]> wrote:
> Hi Subutai,
>
> Each feature is encoded separately, then the feature vector is concatenated from both and fed to the SP, so (3,2) and (2,3) end up being different patterns with their own response to the SP processing (thus the non-symmetric contours).
>
> best,
> Nick
>
>> On Oct 22, 2014, at 9:57 PM, Subutai Ahmad <[email protected]> wrote:
>>
>> Hi Nicholas,
>>
>> These are really great graphs, and a very nice way to do the analysis. Hopefully this can help improve our overall understanding of the SP, and help improve the algorithm implementations as well.
>>
>> In addition to experimenting with the settings, I realized that the way you train the SP also has a big impact (since it is a learning system). As a baseline, I suggest creating the graph with an essentially random SP, i.e. set the increments/decrements to 0. We should see "reasonable" curves for such an SP.
>>
>> One question: I don't understand why the 2D plots aren't symmetric. What is the exact meaning of the value of (2,3)? How do you compute the value of (2,3) vs. (3,2)?
>>
>> --Subutai
>>
>> On Wed, Oct 22, 2014 at 10:27 AM, Nicholas Mitri <[email protected]> wrote:
>> Thanks Subutai, I’ll try the suggested settings.
>>
>> I prepared some supplementary figures to follow up on the discussion below. For a 2D feature set, I plotted the contours that represent the overlap with (0,0) as the reference point and grid points of the 2D space as test samples.
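[Editorial sketch] Subutai's suggested baseline above, an SP with the increments/decrements set to 0, amounts to a fixed random projection followed by top-k column selection. A minimal sketch of that baseline follows; it is not NuPIC's SpatialPooler, and the sizes, percentages, and names are illustrative assumptions.

```python
import random

random.seed(0)  # fixed seed so the "random SP" is reproducible

N_INPUT, N_COLS, K = 128, 1024, 20  # input bits, columns, winning columns
POTENTIAL_PCT = 0.5                 # fraction of input bits each column can see
CONNECTED_PCT = 0.5                 # fraction of potential synapses initially connected

# Each column samples a random potential pool; a random half start connected.
# With increments/decrements set to 0, these sets never change (no learning).
connected = []
for _ in range(N_COLS):
    pool = random.sample(range(N_INPUT), int(POTENTIAL_PCT * N_INPUT))
    connected.append({b for b in pool if random.random() < CONNECTED_PCT})

def sp_compute(active_bits):
    """Return the K columns with the most connected synapses on active input bits."""
    scores = [len(syns & active_bits) for syns in connected]
    winners = sorted(range(N_COLS), key=lambda c: -scores[c])[:K]
    return set(winners)
```

Plotting the overlap curves for such a frozen SP would show how much of the observed behavior comes from the random initialization alone, before learning is involved.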
>> Here are 3 runs under the same settings, followed by the contours for the Euclidean and Manhattan norms for reference. You’ll notice that the overlap creates contours that are random and inconsistent in shape. The values at the contours still decrease as you span out from the center, which is good (as shown by the color gradient). Unfortunately, it’s very hard to make a case for spatial anomaly detection with non-uniform contours.
>>
>> Sorry about the rough contours; matplotlib is acting up for finer mesh grids.
>>
>> <contour1.png>
>> <contour2.png>
>> <contour3.png>
>> <eucContour.png>
>> <manContour.png>
>>
>>> On Oct 22, 2014, at 7:51 PM, Subutai Ahmad <[email protected]> wrote:
>>>
>>> Thanks.
>>>
>>> Yes, your reasoning about potentialPct = 1 seems right. I too think that setting doesn't work in general. As you say, a small number of columns can start to dominate and become fully connected.
>>>
>>> I have had the best luck with a number well above 0.5, though. With 0.5, only about 25% will be initially connected. With w=21, that is only about 5 or 6 bits. 5 or 6 makes me uncomfortable - it is insufficient for reliable performance and can cause random changes to have large effects. I would suggest trying something like 0.8 or so.
>>>
>>> Also, in my experience I had better luck with a smaller learning rate. I would suggest trying a synPermActiveInc of 0.001 and maybe a synPermInactiveDec of about half of that.
>>>
>>> --Subutai
>>>
>>> On Wed, Oct 22, 2014 at 9:01 AM, Nicholas Mitri <[email protected]> wrote:
>>> I was trying different configurations just now.
>>> The best results are achieved with
>>>
>>> SP(self.inputShape,
>>>    self.columnDimensions,
>>>    potentialRadius=self.inputSize,
>>>    potentialPct=0.5,
>>>    numActiveColumnsPerInhArea=int(self.sparsity * self.columnNumber),
>>>    globalInhibition=True,
>>>    synPermActiveInc=0.05)
>>>
>>> potentialPct and synPermActiveInc have the most impact on the results. Specifically, potentialPct set to 1 has a very negative effect on the SP’s behavior, as seen below. I suspect that setting this parameter to 1, and thus allowing all columns to “see” all inputs, levels the field of competition and causes the top 2% set to change drastically from one input to the next. A lower setting on that parameter allows a dominant and more stable set of columns to be established, which would explain why the overlap drops gradually.
>>>
>>> <fig6.png>
>>>
>>>> On Oct 22, 2014, at 6:47 PM, Subutai Ahmad <[email protected]> wrote:
>>>>
>>>> Hi Nicholas,
>>>>
>>>> I think these are great tests to do. Can you share the SP parameters you used for this? What were the potential pct, learning rates, and inhibition?
>>>>
>>>> Thanks,
>>>>
>>>> --Subutai
>>>>
>>>> On Wed, Oct 22, 2014 at 6:45 AM, Nicholas Mitri <[email protected]> wrote:
>>>> Hey Mark,
>>>>
>>>> To follow up on our discussion yesterday, I did a few tests on a 1024-column SP with 128-bit-long (w = 21) RDSE input. I fed the network inputs in the range [1-20] and calculated the overlap of the output of the encoder and the output of the SP with the corresponding outputs for input = 1. The plots below show 3 different runs under the same settings.
>>>>
>>>> The overlap at the encoder level is a straight line, as expected, since the RDSE resolution is set to 1. The green plot compares the overlap at the SP level.
>>>> Looking at these plots, it appears my statement about the assumption of raw distance not translating to overlap is true. The good news is that it seems to be a rarity for the condition to break! Specifically, notice that in the 3rd plot, input 13 has more overlap with 1 than 12 does, thus breaking the condition. Also, notice the effect of random initialization on the shape of the green plot, which shows no consistent relation with the encoder overlap.
>>>>
>>>> Taking all this into consideration, since the assumption seems to hold in most cases and the SP overlap is non-increasing, I think we can leverage the overlap for spatial anomaly detection as discussed earlier, but I see little promise of it performing well given the inconsistency of the overlap metric.
>>>>
>>>> <fig3.png>
>>>> <fig4.png>
>>>> <fig5.png>
>>>>
>>>>> On Oct 21, 2014, at 6:13 PM, Marek Otahal <[email protected]> wrote:
>>>>>
>>>>> Hello Nick,
>>>>>
>>>>> On Tue, Oct 21, 2014 at 3:51 PM, Nicholas Mitri <[email protected]> wrote:
>>>>> The relation extends past that; it’s ||input1,input2|| -> ||encSDR1,encSDR2|| -> ||colSDR1,colSDR2||.
>>>>>
>>>>> The assumption holds perfectly for the first transition. For the second, things get a bit muddy because ...
>>>>>
>>>>> Actually, I meant the 2nd transition ||inputs|| -> ||colSDR||, as encoders do not produce SDRs but just a vector representation of the input.
>>>>>
>>>>> ... of the random partial connectivity of columns. If, for 2 different patterns, the set of active columns only changes by 1 column (e.g. {1,2,4,7} for pattern1 and {1,2,3,7} for pattern2), there’s no way for you to know if that difference was caused by a change of 2 resolution steps or 5 resolution steps at the level of the raw input.
>>>>> So, distances in raw feature space don’t translate to the high-dimensional binary space of columnar activations.
>>>>>
>>>>> If the change of 2 vs. 5 resolution steps in the raw input is significant (according to the encoder settings, the SP's available resources (#cols) vs. the diversity of patterns, ...), you may not be able to tell from ||sdr1,sdr2|| exactly what the ||.|| in raw inputs was, but the correlation (linear?)
>>>>>
>>>>> ||input,input|| < ||input,input2 (diff 2 res. steps)|| < ||input,input5||
>>>>> --> ||sdr,sdr|| < ||sdr,sdr2|| < ||sdr,sdr5||
>>>>>
>>>>> should still hold; thus you could find a threshold for anomaly/normal.
>>>>>
>>>>> If you’ve seen the thesis chapter I posted here some time ago about using the SP for clustering, you can see what I describe here happening for different runs of the clustering code. I’d get different clusters each time with no predictable cluster shapes. I‘ll attach the clustering results here for your reference.
>>>>>
>>>>> Not sure, I'll search for it and take a look!
>>>>>
>>>>> If you extend the conclusions of the clustering experiments to spatial anomaly detection, I think it’s fair to assume that there is no way to use columnar activation to compute a continuous anomaly score in the same way you can for distance-based anomaly detectors.
>>>>>
>>>>> Would this imply "anomaly in nupic does not work"? Because if we assumed it's impossible to get an anomaly score from the lower layer (the SP), can we do that in the TP, which takes the former as input?
>>>>>
>>>>> > I‘ll attach the clustering results here for your reference.
>>>>>
>>>>> I like the visualization! :) What is the problem with it? The clusters are more or less the same shape and the same (spatial) distribution. Is the problem that there's essentially no overlap? So you can't say the Red and Green clusters are closer than Red and Blue, in the way that 1~3 is closer than 1~5?
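[Editorial sketch] Nick's ambiguity example and Marek's threshold idea from the exchange above can both be shown in a few lines. The third column set below is hypothetical (the mail only gives two), and W and the 0.5 threshold are illustrative values, not settings from the experiments.

```python
# Nick's point: two column sets that each differ from the reference by one
# column give the same overlap whether the raw inputs differed by 2 or 5
# resolution steps, so overlap alone cannot recover the raw distance.
pattern_a = {1, 2, 4, 7}  # active columns for some reference input
pattern_b = {1, 2, 3, 7}  # reference shifted 2 resolution steps (from the mail)
pattern_c = {1, 2, 5, 7}  # hypothetical: reference shifted 5 resolution steps

overlap_near = len(pattern_a & pattern_b)
overlap_far = len(pattern_a & pattern_c)
assert overlap_near == overlap_far  # distinct raw distances, identical overlap

# Marek's point: if overlap still decreases monotonically with raw distance,
# a single cutoff can separate normal from anomalous even though exact
# distances are lost.
W = 21  # active bits per SDR, as in the experiments

def anomaly_score(overlap, w=W):
    """0.0 = identical to the reference SDR, 1.0 = no shared bits."""
    return 1.0 - overlap / w

def is_anomalous(overlap, threshold=0.5, w=W):
    return anomaly_score(overlap, w) > threshold
```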
>>>>> I think it's because the SP is not saturated enough (for its size vs. the small input range).
>>>>>
>>>>> This might be hard to visualize, as you need enough columns for the SP to work well (the default is ~2048); it has "too much" capacity, so the clusters are totally distinct. It would be interesting if you visualized the same SP trained on 2000 numbers (maybe more/less?).
>>>>>
>>>>> regards, Mark
>>>>>
>>>>> --
>>>>> Marek Otahal :o)
