Hi Nick,

If the DUT is processing frame-by-frame, we cannot expect reliable comparison to other detectors via NAB scoring, although your modification may help. One issue with the mod, however, is that if you shift anomaly scores of 1 outside of the window, those points become FPs. For example, if the full frame (125 data points) initially lies within a window, it's possible your mod changes the score from 1 TP to 1 TP + 62 FPs.
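To make that FP arithmetic concrete, here's a toy sketch. This is not NAB's actual scorer (which weights detections with a sigmoid inside each window); `count_tp_fp` is a hypothetical helper that just counts a window as detected once and treats every detection outside the window as an FP:

```python
def count_tp_fp(detections, window):
    """Toy TP/FP counter, not NAB's scorer.

    detections: point indices where the detector fired.
    window: (start, end) anomaly window, inclusive.
    A window is credited at most one TP; every detection
    outside the window counts as one FP.
    """
    start, end = window
    inside = [d for d in detections if start <= d <= end]
    outside = [d for d in detections if d < start or d > end]
    tp = 1 if inside else 0
    fp = len(outside)
    return tp, fp

# Full 125-point frame lies within the window: 1 TP, 0 FPs.
frame = list(range(125))
print(count_tp_fp(frame, (0, 124)))    # -> (1, 0)

# Same frame shifted so 62 of its points fall past the window's
# end: still 1 TP, but now also 62 FPs.
shifted = list(range(62, 187))
print(count_tp_fp(shifted, (0, 124)))  # -> (1, 62)
```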
Addressing your listed concerns:

1. This brings up an important point I should have mentioned earlier; my apologies. The score normalization method [1] assumes there are 44 TPs in the dataset, and also that the Baseline detector has run.

2. It is okay for the metrics' counts to vary across application profiles. For a given DUT, the optimization step calculates the best threshold -- i.e. the likelihood value above which a data point is flagged anomalous -- for each application profile, where the best threshold is the one that maximizes the profile's score. Consider the application profile "Rewards Low FP Rate": its optimal threshold will likely be higher than those of the other profiles, because a higher threshold means the DUT outputs fewer detections, which generally results in fewer FPs.

3. The issue with your mod mentioned above, and to a lesser extent the normalization method from (1), may explain the results here. What confusion matrix are you calculating? Is this post-processing you're doing on the results? If so, I'm sure others and I would be interested in seeing it.

[1] https://github.com/numenta/NAB/blob/master/nab/runner.py#L218

Alexander Lavin
Software Engineer
Numenta
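P.S. A rough sketch of why the optimization step in (2) lands on different thresholds per profile. This is not NAB's optimizer or its window-based scoring -- it's a simplified per-point grid search, and the profile weights below are illustrative, not NAB's cost matrices -- but it shows the mechanism: the heavier a profile penalizes FPs, the higher the threshold it selects.

```python
def optimize_threshold(likelihoods, labels, score_fn, candidates):
    """Hypothetical sketch of per-profile threshold optimization.

    likelihoods: per-point anomaly likelihoods from the DUT.
    labels: per-point ground truth (True = inside an anomaly window).
    score_fn: profile-specific score as a function of (tp, fp, fn).
    candidates: threshold values to try.
    Returns the (threshold, score) pair maximizing the profile's score.
    """
    best_t, best_score = None, float("-inf")
    for t in candidates:
        detections = [lk >= t for lk in likelihoods]
        tp = sum(d and y for d, y in zip(detections, labels))
        fp = sum(d and not y for d, y in zip(detections, labels))
        fn = sum((not d) and y for d, y in zip(detections, labels))
        s = score_fn(tp, fp, fn)
        if s > best_score:
            best_t, best_score = t, s
    return best_t, best_score

# Illustrative profile weights (NOT NAB's actual cost matrices):
# the low-FP profile penalizes each FP twice as hard.
standard = lambda tp, fp, fn: tp - 0.11 * fp - fn
low_fp   = lambda tp, fp, fn: tp - 0.22 * fp - fn

# One easy TP at 0.9, a burst of 10 FPs at 0.8, a harder TP at 0.7.
lk = [0.9] + [0.8] * 10 + [0.7]
lb = [True] + [False] * 10 + [True]
cands = sorted(set(lk))

# The standard profile accepts the 10 FPs to catch the second TP;
# the low-FP profile raises its threshold and gives that TP up.
print(optimize_threshold(lk, lb, standard, cands))  # threshold 0.7
print(optimize_threshold(lk, lb, low_fp, cands))    # threshold 0.9
```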
