Hi Nick,
If the DUT is processing frame-by-frame, we cannot expect a reliable comparison 
to other detectors via NAB scoring, although your modification may help. One 
issue with the mod, however, is that if you shift anomaly scores of 1 outside 
of the window, they become FPs. For example, if the full frame (125 data 
points) is initially within a window, it's possible your mod changes the result 
from 1 TP to 1 TP + 62 FPs.
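
To make the counting concrete, here is a minimal sketch. It is not NAB's actual 
scorer (which also weights each detection by its position relative to the 
window); it only counts one TP per window that contains a detection and one FP 
per detection outside every window. The function name, the window bounds, and 
the 62-point shift are my assumptions for illustration.

```python
def count_tp_fp(detections, windows):
    """Simplified counting: one TP per window hit, one FP per detection outside all windows."""
    tp = sum(
        1 for start, end in windows
        if any(start <= d <= end for d in detections)
    )
    fp = sum(
        1 for d in detections
        if not any(start <= d <= end for start, end in windows)
    )
    return tp, fp


FRAME = 125                      # frame length, from the example above
windows = [(100, 224)]           # hypothetical window, inclusive bounds, 125 points wide

# Every point in the frame is flagged (anomaly score of 1) and lies inside the window.
original = list(range(100, 100 + FRAME))

# Assumed mod: the same detections shifted by 62 points, so part of the frame leaves the window.
shifted = [d + 62 for d in original]

print(count_tp_fp(original, windows))  # (1, 0)
print(count_tp_fp(shifted, windows))   # (1, 62)
```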

Addressing your listed concerns:
1. This brings up an important point I should have mentioned earlier; my 
apologies. The score normalization method [1] assumes there are 44 TPs in the 
dataset, and also that the Baseline detector has run.
2. It is okay for the metrics' counts to vary across application profiles. For 
a given DUT, the optimization step calculates the best threshold -- i.e., the 
likelihood value above which a data point is flagged as anomalous -- for each 
application profile, where the best threshold is the one that maximizes the 
score for that profile. For example, consider the application profile "Rewards 
Low FP Rate". The optimal threshold for this profile will likely be higher than 
that of the other profiles, because a higher threshold means the DUT outputs 
fewer detections, which likely results in fewer FPs.
3. The issue with your mod I mentioned above, and to a lesser extent the 
normalization method from item 1, may explain the results here. Which confusion 
matrix are you calculating? Is this post-processing you're doing on the 
results? If so, I'm sure others and I would be interested in seeing it.
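
As a rough illustration of point 2, the sketch below sweeps a few candidate 
thresholds and keeps the one with the highest score for a given profile. The 
scoring here is a simplified weighted count (TPs rewarded, FPs and FNs 
penalized), not NAB's actual scoring function, and the weights, data, and 
function names are hypothetical values I picked to make the effect visible.

```python
def in_any_window(index, windows):
    """True if the index falls inside any (start, end) window, bounds inclusive."""
    return any(start <= index <= end for start, end in windows)


def profile_score(likelihoods, windows, threshold, weights):
    """Simplified score: weighted TPs minus weighted FPs and FNs."""
    detections = [i for i, p in enumerate(likelihoods) if p > threshold]
    tp = sum(
        1 for start, end in windows
        if any(start <= d <= end for d in detections)
    )
    fn = len(windows) - tp
    fp = sum(1 for d in detections if not in_any_window(d, windows))
    return weights["tp"] * tp - weights["fp"] * fp - weights["fn"] * fn


def best_threshold(likelihoods, windows, candidates, weights):
    """Return the candidate threshold that maximizes the profile's score."""
    return max(
        candidates,
        key=lambda t: profile_score(likelihoods, windows, t, weights),
    )


# Hypothetical likelihoods and anomaly windows.
likelihoods = [0.1, 0.6, 0.2, 0.7, 0.95, 0.3, 0.1, 0.6, 0.2, 0.6]
windows = [(3, 5), (7, 8)]
candidates = [0.5, 0.9]

# Hypothetical weights; the second profile penalizes FPs much more heavily.
STANDARD = {"tp": 1.0, "fp": 0.11, "fn": 1.0}
LOW_FP = {"tp": 1.0, "fp": 2.0, "fn": 1.0}

print(best_threshold(likelihoods, windows, candidates, STANDARD))  # 0.5
print(best_threshold(likelihoods, windows, candidates, LOW_FP))    # 0.9
```

With the heavier FP penalty the higher threshold wins, even though it misses 
one window, which is why the TP/FP/FN counts can differ between profiles for 
the same DUT.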

[1] https://github.com/numenta/NAB/blob/master/nab/runner.py#L218
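
For reference, a sketch of the kind of normalization point 1 refers to. I am 
assuming the raw score is rescaled so that the Baseline detector maps to 0 and 
a perfect detector (every TP found, nothing else) maps to 100; this is not a 
verbatim copy of the code at [1], and the function name, the TP weight, and the 
baseline value in the example are hypothetical.

```python
def normalize_score(raw, baseline, tp_count=44, tp_weight=1.0):
    """Rescale a raw score so that baseline -> 0 and a perfect detector -> 100.

    tp_count is the assumed number of TPs in the dataset (44, per point 1);
    tp_weight is the profile's reward per TP (hypothetical default).
    """
    perfect = tp_count * tp_weight
    if perfect == baseline:
        raise ValueError("perfect and baseline scores must differ")
    return 100.0 * (raw - baseline) / (perfect - baseline)


# Hypothetical baseline score of -22 for illustration.
print(normalize_score(raw=-22.0, baseline=-22.0))  # 0.0
print(normalize_score(raw=44.0, baseline=-22.0))   # 100.0
print(normalize_score(raw=11.0, baseline=-22.0))   # 50.0
```

If the dataset you score against does not actually contain 44 TPs, or the 
Baseline results are missing, the "perfect" and "baseline" anchors are off and 
the normalized numbers will be skewed accordingly.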

Alexander Lavin
Software Engineer
Numenta
