Hi Nick,

Whether you compare the raw scores or the normalized scores is really a matter of preference; the normalized scores, with a fixed range of 0-100, may be easier to interpret.
Precision, recall, and their harmonic mean, the F1 score, are helpful evaluation metrics for most machine learning algorithms. They don't, however, incorporate time into the calculation, and are thus unsuitable for evaluating how well an algorithm performs on real-time, streaming data. A main motivation for NAB was to design a scoring system that incorporates time along with the TP, TN, FN, and FP counts. I wouldn't call it problematic that NAB scoring doesn't mimic other evaluation functions; rather, it is desirable.

Cheers,
Alex

Alexander Lavin
Software Engineer
Numenta
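
P.S. To illustrate the point about time, here is a minimal sketch. It is not NAB's scoring code: the window-based counting rule, the timestamps, and all function names are assumptions made up for this example. It shows two hypothetical detectors, one flagging each anomaly at the start of its window and one at the very end, receiving the same F1 score because the counts carry no timing information.

```python
def count_outcomes(detections, windows):
    """Count TP, FP, FN for detection timestamps against (start, end) anomaly windows.

    Assumed rule for this sketch: a window is a true positive if any detection
    falls inside it; each detection outside every window is a false positive.
    """
    tp = sum(any(start <= t <= end for t in detections) for start, end in windows)
    fn = len(windows) - tp
    fp = sum(not any(start <= t <= end for start, end in windows) for t in detections)
    return tp, fp, fn


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical data: two anomaly windows and one spurious detection at t=80.
windows = [(10, 20), (50, 60)]
early_detections = [10, 50, 80]  # flags each anomaly as soon as its window opens
late_detections = [20, 60, 80]   # flags each anomaly at the last possible moment

f1_early = f1_score(*count_outcomes(early_detections, windows))
f1_late = f1_score(*count_outcomes(late_detections, windows))
print(f1_early, f1_late)  # the two values are identical
```

A scoring function that takes time into account would be able to rank the early detector above the late one, which F1 on its own cannot do.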
