Hi Nicholas,

Ah, I misunderstood the experiment. (I thought you were comparing the SDR for x against the SDR for y.) Your experiment is a bit more sophisticated. What are your encoder parameters here?
Thanks,
--Subutai

On Wed, Oct 22, 2014 at 12:17 PM, Nicholas Mitri <[email protected]> wrote:

> Hi Subutai,
>
> Each feature is encoded separately, then the feature vector is
> concatenated from both and fed to the SP, so (3,2) and (2,3) end up
> being different patterns with their own response to the SP processing
> (thus the non-symmetric contours).
>
> Best,
> Nick
>
>
> On Oct 22, 2014, at 9:57 PM, Subutai Ahmad <[email protected]> wrote:
>
> Hi Nicholas,
>
> These are really great graphs, and a very nice way to do the analysis.
> Hopefully this can help improve our overall understanding of the SP,
> and help improve the algorithm implementations as well.
>
> In addition to experimenting with the settings, I realized that the
> way you train the SP also has a big impact (since it is a learning
> system). As a baseline, I suggest creating the graph with an
> essentially random SP, i.e. set the increments/decrements to 0. We
> should see "reasonable" curves for such an SP.
>
> One question: I don't understand why the 2D plots aren't symmetric.
> What is the exact meaning of the value of (2,3)? How do you compute
> the value of (2,3) vs. (3,2)?
>
> --Subutai
>
> On Wed, Oct 22, 2014 at 10:27 AM, Nicholas Mitri <[email protected]> wrote:
>
>> Thanks Subutai, I'll try the suggested settings.
>>
>> I prepared some supplementary figures to follow up on the discussion
>> below. For a 2D feature set, I plotted the contours that represent
>> the overlap with (0,0) as the reference point and grid points of the
>> 2D space as test samples. Here are 3 runs under the same settings,
>> followed by the contours for the Euclidean and Manhattan norms for
>> reference. You'll notice that the overlap creates contours that are
>> random and inconsistent in shape. The values at the contours still
>> decrease as you span out from the center, which is good (as shown by
>> the color gradient).
>> Unfortunately, it's very hard to make a case for spatial anomaly
>> detection with non-uniform contours.
>>
>> Sorry about the rough contours; matplotlib is acting up for finer
>> mesh grids.
>>
>> <contour1.png>
>> <contour2.png>
>> <contour3.png>
>> <eucContour.png>
>> <manContour.png>
>>
>>
>> On Oct 22, 2014, at 7:51 PM, Subutai Ahmad <[email protected]> wrote:
>>
>> Thanks.
>>
>> Yes, your reasoning about potentialPct = 1 seems right. I too think
>> that setting doesn't work in general. As you say, a small number of
>> columns can start to dominate and become fully connected.
>>
>> I have had the best luck with a number well above 0.5, though. With
>> 0.5, only about 25% will be initially connected. With w=21, that is
>> only about 5 or 6 bits. 5 or 6 makes me uncomfortable - it is
>> insufficient for reliable performance and can cause random changes
>> to have large effects. I would suggest trying something like 0.8 or
>> so.
>>
>> Also, in my experience I had better luck with a smaller learning
>> rate. I would suggest trying a synPermActiveInc of 0.001, and maybe
>> a synPermInactiveDec of about half of that.
>>
>> --Subutai
>>
>>
>> On Wed, Oct 22, 2014 at 9:01 AM, Nicholas Mitri <[email protected]> wrote:
>>
>>> I was trying different configurations just now. The best results
>>> are achieved with
>>>
>>> SP(self.inputShape,
>>>    self.columnDimensions,
>>>    potentialRadius=self.inputSize,
>>>    potentialPct=0.5,
>>>    numActiveColumnsPerInhArea=int(self.sparsity * self.columnNumber),
>>>    globalInhibition=True,
>>>    synPermActiveInc=0.05)
>>>
>>> potentialPct and synPermActiveInc have the most impact on the
>>> results. Specifically, potentialPct set to 1 has a very negative
>>> effect on the SP's behavior, as seen below. I suspect that setting
>>> this parameter to 1, and thus allowing all columns to "see" all
>>> inputs, levels the field of competition and causes the top 2% set
>>> to change drastically from one input to the next.
>>> A lower setting on that parameter allows a dominant and more stable
>>> set of columns to be established, which would explain why the
>>> overlap drops gradually.
>>>
>>> <fig6.png>
>>>
>>> On Oct 22, 2014, at 6:47 PM, Subutai Ahmad <[email protected]> wrote:
>>>
>>> Hi Nicholas,
>>>
>>> I think these are great tests to do. Can you share the SP
>>> parameters you used for this? What were the potentialPct, learning
>>> rates, and inhibition settings?
>>>
>>> Thanks,
>>>
>>> --Subutai
>>>
>>> On Wed, Oct 22, 2014 at 6:45 AM, Nicholas Mitri <[email protected]> wrote:
>>>
>>>> Hey Mark,
>>>>
>>>> To follow up on our discussion yesterday, I did a few tests on a
>>>> 1024-column SP with 128-bit (w = 21) RDSE input.
>>>> I fed the network inputs in the range [1-20] and calculated the
>>>> overlap of the output of the encoder, and of the output of the SP,
>>>> with the corresponding outputs for input = 1. The plots below show
>>>> 3 different runs under the same settings.
>>>>
>>>> The overlap at the encoder level is a straight line, as expected,
>>>> since the RDSE resolution is set to 1. The green plot shows the
>>>> overlap at the SP level.
>>>> Looking at these plots, my earlier statement that raw distance
>>>> does not necessarily translate to overlap appears to be true. The
>>>> good news is that it seems to be a rarity for the condition to
>>>> break! Specifically, notice that in the 3rd plot, input 13 has
>>>> more overlap with 1 than 12 does, thus breaking the condition.
>>>> Also, notice the effect of random initialization on the shape of
>>>> the green plot, which shows no consistent relation with the
>>>> encoder overlap.
>>>>
>>>> Taking all this into consideration: since the assumption seems to
>>>> hold in most cases and the SP overlap is non-increasing, I think
>>>> we can leverage the overlap for spatial anomaly detection as
>>>> discussed earlier, but I see little promise of it performing well
>>>> given the inconsistency of the overlap metric.
>>>>
>>>> <fig3.png>
>>>> <fig4.png>
>>>> <fig5.png>
>>>>
>>>> On Oct 21, 2014, at 6:13 PM, Marek Otahal <[email protected]> wrote:
>>>>
>>>> Hello Nick,
>>>>
>>>> On Tue, Oct 21, 2014 at 3:51 PM, Nicholas Mitri <[email protected]> wrote:
>>>>>
>>>>> The relation extends past that; it's ||input1,input2|| ->
>>>>> ||encSDR1,encSDR2|| -> ||colSDR1,colSDR2||.
>>>>>
>>>>> The assumption holds perfectly for the first transition. For the
>>>>> second, things get a bit muddy because ...
>>>>
>>>> Actually, I meant the 2nd transition, ||inputs|| -> ||colSDR||, as
>>>> encoders do not produce SDRs but just a vector representation of
>>>> the input.
>>>>
>>>>> ... of the *random partial* connectivity of columns. If for 2
>>>>> different patterns the set of active columns only changes by 1
>>>>> column (e.g. {1,2,4,7} for pattern1 and {1,2,3,7} for pattern2),
>>>>> there's no way for you to know if that difference was caused by a
>>>>> change of 2 resolution steps or 5 resolution steps at the level
>>>>> of the raw input. So, distances in raw feature space don't
>>>>> translate to the high-dimensional binary space of columnar
>>>>> activations.
>>>>
>>>> If the change of 2 vs. 5 resolution steps in the raw input is
>>>> significant (given the encoder settings, the SP's available
>>>> resources (#cols) vs. the diversity of patterns, ...), you may not
>>>> be able to tell from ||sdr1,sdr2|| exactly what the ||.|| in raw
>>>> inputs was, but the correlation (linear?)
>>>> ||input,input|| < ||input,input2 (diff 2 res. steps)|| < ||input,input5||
>>>> -->
>>>> ||sdr,sdr|| < ||sdr,sdr2|| < ||sdr,sdr5||
>>>> should hold; thus you could find a threshold for anomaly/normal.
>>>>
>>>>> If you've seen the thesis chapter I posted here some time ago
>>>>> about using the SP for clustering, you can see what I describe
>>>>> here happening for different runs of the clustering code. I'd get
>>>>> different clusters each time, with no predictable cluster shapes.
>>>>> I'll attach the clustering results here for your reference.
>>>>
>>>> Not sure; I'll search for it and take a look!
>>>>
>>>>> If you extend the conclusions of the clustering experiments to
>>>>> spatial anomaly detection, I think it's fair to assume that there
>>>>> is no way to use columnar activation to compute a continuous
>>>>> anomaly score in the same way you can for distance-based anomaly
>>>>> detectors.
>>>>
>>>> Would this imply "anomaly detection in nupic does not work"?
>>>> Because if we assume it's impossible to get an anomaly score from
>>>> a lower layer - the SP - can we do so in the TP, which takes the
>>>> former as input?
>>>>
>>>> > I'll attach the clustering results here for your reference.
>>>>
>>>> I like the visualization! :) What is the problem with it? The
>>>> clusters are more or less the same shape and the same (spatial)
>>>> distribution. Is the problem that there's essentially no overlap?
>>>> So you can't say the Red and Green clusters are closer than the
>>>> Red and Blue ones, in the way 1~3 is closer than 1~5? I think it's
>>>> because the SP is not saturated enough (for its size vs. the small
>>>> input range).
>>>>
>>>> This might be hard to visualize: since you need enough columns for
>>>> the SP to work well (the default ~2048), it has "too much"
>>>> capacity, so the clusters are totally distinct. It would be
>>>> interesting if you visualized the same SP trained on 2000 numbers
>>>> (maybe more/less?).
>>>>
>>>> Regards, Mark
>>>>
>>>> --
>>>> Marek Otahal :o)
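The encoder-level overlap curve Nicholas describes (a straight line when the RDSE resolution is 1) can be sketched with a toy contiguous scalar encoder. Note the assumption: this is not nupic's RDSE, which scatters its active bits pseudo-randomly; but for nearby scalars both encoders lose shared active bits at the same linear rate, which is what produces the straight-line overlap. The names `encode` and `overlap` and the parameter values are illustrative.

```python
# Toy sketch of the encoder-level overlap curve from the thread.
# ASSUMPTION: a plain contiguous scalar encoder stands in for nupic's
# RDSE; the linear fall-off of overlap with distance is the same.

W = 21          # active bits per encoding (w = 21 in the thread)
RESOLUTION = 1  # inputs one resolution step apart share W - 1 bits

def encode(value):
    """Return the set of active bit indices for a scalar value."""
    start = int(value / RESOLUTION)
    return set(range(start, start + W))

def overlap(a, b):
    """Number of active bits two encodings have in common."""
    return len(a & b)

reference = encode(1)
curve = [overlap(reference, encode(x)) for x in range(1, 21)]
print(curve)  # drops by exactly 1 per resolution step: 21, 20, ..., 2
```

This reproduces the straight line at the encoder level; the thread's point is that the SP output does not preserve this linearity run-to-run.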

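The overlap-threshold idea Marek raises (||sdr,sdr|| < ||sdr,sdr2|| < ||sdr,sdr5||, so a threshold could separate normal from anomalous) could be scored roughly as below. This is an illustrative sketch only, not nupic's anomaly implementation; `spatial_anomaly_score` and the stored-SDR scheme are assumptions introduced here.

```python
# Illustrative overlap-based spatial anomaly score (an assumption, not
# nupic's implementation): 0.0 when the new SP output exactly matches a
# stored "normal" SDR, rising toward 1.0 as the best overlap falls off.

def overlap(a, b):
    """Number of active columns two SDRs share."""
    return len(a & b)

def spatial_anomaly_score(sdr, normal_sdrs, n_active):
    """1 minus the best overlap with any known-normal SDR, normalized
    by the number of active columns (e.g. the top-2% set size)."""
    if not normal_sdrs:
        return 1.0
    best = max(overlap(sdr, ref) for ref in normal_sdrs)
    return 1.0 - best / n_active

# 40 active columns as a stand-in for ~2% sparsity on 2048 columns.
normal = [set(range(0, 40)), set(range(100, 140))]
print(spatial_anomaly_score(set(range(0, 40)), normal, 40))      # 0.0
print(spatial_anomaly_score(set(range(20, 60)), normal, 40))     # 0.5
print(spatial_anomaly_score(set(range(1000, 1040)), normal, 40)) # 1.0
```

The contour experiments in the thread show why such a score may be poorly calibrated: the same overlap value can correspond to very different raw-input distances, which is exactly Nicholas's concern about non-uniform contours.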