Hi, a newbie question: the swarm runs over a pre-selected dataset. I suppose the resulting model params will be optimal for that selected data, which raises the possibility of overfitting. The resulting model params could be a poor fit for data the model has never seen before.
The “classical” ML approach is to compare different models on a separate “cross-validation” data set and choose the model that gives the smallest error. Does NuPIC have “error” outputs? To extend the question further: when the underlying data behavior changes, when should the swarm be re-run, and what signals that need? Swarming seems pretty computationally expensive; is it practical to run a swarm over high-speed online streaming data?

Regards, Tom
