Tom,

One feature of NuPIC and HTM is online learning: as the data patterns change over time, NuPIC will recognize the new patterns while forgetting older ones. So we usually don't have to re-run swarms. Swarming is not perfect, and sometimes we do some manual tuning of the model parameters it returns. But generally, when patterns within the data change over time, there is no need to re-swarm. You might need to update the min/max values on the encoder if the data starts jumping outside its normal range.
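To make that concrete, the min/max adjustment is just an edit to the scalar encoder settings inside the model params the swarm produced, before you create the model from them. This is only a sketch: the field name "consumption", the specific numbers, and the helper function are made up for illustration, though the nested modelParams/sensorParams/encoders layout matches what a swarm emits.

```python
# Illustrative model params, trimmed to the encoder section of what a
# swarm returns. The field name "consumption" is a made-up example.
model_params = {
    "modelParams": {
        "sensorParams": {
            "encoders": {
                "consumption": {
                    "fieldname": "consumption",
                    "name": "consumption",
                    "type": "ScalarEncoder",
                    "n": 134,
                    "w": 21,
                    "minval": 0.0,
                    "maxval": 100.0,
                },
            },
        },
    },
}

def widen_encoder_range(params, field, new_min, new_max):
    """Widen an encoder's min/max in place so values outside the old
    range still get distinct encodings after the data drifts."""
    encoder = params["modelParams"]["sensorParams"]["encoders"][field]
    encoder["minval"] = min(encoder["minval"], new_min)
    encoder["maxval"] = max(encoder["maxval"], new_max)
    return encoder

# Suppose the data started spiking above 100: widen the range, then
# build the model from the tweaked params as usual.
enc = widen_encoder_range(model_params, "consumption", 0.0, 250.0)
print(enc["minval"], enc["maxval"])  # -> 0.0 250.0
```

No re-swarm is needed for a change like this; the rest of the swarm's output stays as-is.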
You would have to re-swarm if you wanted to add a new field to the data input, or if you recategorized a field as a different data type, or something like that.

---------
Matt Taylor
OS Community Flag-Bearer
Numenta

On Mon, Apr 27, 2015 at 6:51 PM, Tom Tan <[email protected]> wrote:
> Hi,
>
> A newbie question: the swarm runs over a pre-selected dataset. I suppose
> the resulting model params will be optimal for that selected data, hence
> raising the possibility of overfitting. The resulting model params could be
> a poor fit for data never seen before.
>
> The "classical" ML approach is to compare different models using a new
> "cross-validation" data set. The model that gives the smallest error is
> chosen. Does NuPIC have "error" outputs?
>
> To further extend the question: when the underlying data behavior changes,
> what signals the need to re-run a swarm? Swarming seems pretty
> computationally expensive; is it practical to run a swarm over high-speed
> online streaming data?
>
> Regards,
> Tom
