Hello Marek,
On 05/21/2015 03:50 AM, Marek Otahal wrote:
Hi,
I'll try to follow up on the swarming vs. running question:
-"run model" = model + parameters + input data ->produce outputs
(predictions, ...)
-"swarm" = model + some data (+lot of time) -> find "optimal" parameters
(for model+data)
So, below..
This is my understanding:
Pushing data to a NuPIC algorithm requires a model. The easiest way of
creating this model is to use the swarming process. Swarming basically
tries various model combinations over your data, and if a particular
model is not good it is dropped. So after swarming you should have the
best model for your data. This whole process is achieved with the
following command:
$NUPIC/scripts/run_swarm.py $PWD/search_def.json --maxWorkers=6
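For context, my search_def.json looks roughly like this (a sketch from
memory of the swarm description format, if I understand it correctly;
the field name and file path are just placeholders from my setup):

```json
{
  "includedFields": [
    {"fieldName": "sine", "fieldType": "float"}
  ],
  "streamDef": {
    "info": "sine",
    "version": 1,
    "streams": [
      {"info": "sine.csv", "source": "file://sine.csv", "columns": ["*"]}
    ]
  },
  "inferenceType": "TemporalAnomaly",
  "inferenceArgs": {"predictionSteps": [1], "predictedField": "sine"},
  "iterationCount": -1,
  "swarmSize": "medium"
}
```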
Q:
I do not know what I should imagine under "model"; maybe it is some set
of parameters, so this is another question.
If swarming is the easiest process, what other methods can be used to
create a model?
What criteria should a model meet to be classified as good or bad during
swarming?
Now, when you have the best model for your data, you need to pass this
model plus your input data to NuPIC to get something useful (I know
about prediction, which is basically configured with "predictionSteps";
what other useful information can be gained from your data is another
question). This process is achieved with the following command:
$NUPIC/scripts/run_opf_experiment.py $PWD/model_0/
After this you will get an inference file where the prediction will be
stored.
The only reason for two separate processes (creating the best model and
running it) that comes to my mind right now is that running a model on
data can also be CPU intensive (I do not know if this is true, just
guessing; can someone confirm?). So, under the condition that you did
not touch your original data, you only have to run one CPU-intensive
process (running the model) instead of two.
I would like to know what would happen if I create a model with swarming
for some data, then change that data and run the model. Is this complete
nonsense, or is this used somewhere?
I would also like to ask if I can tell NuPIC to stop learning after a
certain amount of data. AFAIK NuPIC is always learning, so it might
happen that it also learns anomalies and considers them normal?
E.g. imagine the sine prediction example. At first NuPIC does not know
the data, it simply repeats what it sees, and its anomaly score is high.
Then, after a certain number of periods pass, NuPIC learns this pattern
and lowers the anomaly score. Anything close to this pattern will have a
low anomaly score. In other words, it will be able to predict what will
happen at a given time and answer questions like whether the function
will be increasing, decreasing, or unchanged in the next step. The
problem is that it is constantly learning; how can I make it stop
learning, e.g. after a certain amount of data, and just give an anomaly
score?
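A sketch of what I am hoping is possible (untested; I am assuming the
OPF model really has a disableLearning() method, and model_params /
sine_values stand in for whatever swarming and my data source provide):

```python
from nupic.frameworks.opf.modelfactory import ModelFactory

LEARN_RECORDS = 3000  # arbitrary cutoff I picked, not a NuPIC default

model = ModelFactory.create(model_params)  # model_params from swarming
model.enableInference({"predictedField": "sine"})

for i, sine_value in enumerate(sine_values):  # sine_values: my input stream
    if i == LEARN_RECORDS:
        model.disableLearning()  # freeze what was learned so far
    result = model.run({"sine": sine_value})
    # result should still contain predictions / anomaly scores after the freeze
```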
1. Is there any benefit to running swarming and running the model in
two steps?
yes, swarm = parameters, run = outputs; although swarming IMHO also
saves the output of the best model, so you can already use that.
Yes, when you run a swarm you get the best model; the only benefit of
reusing an "old" swarm is the CPU load described above. What benefit do
the swarm parameters (the model) give me when I do not use them on data?
2. Is running only swarming, without running the model, useful for
something?
yes, parameters.
Same question as above: what are the parameters themselves good for?
3. Is there any benefit to three separate steps for running the model
in Python?
which steps?
Maybe "steps" is not the right word, rather function calls. On the
command line there are two: one for creating the model and another for
running this model:
$NUPIC/scripts/run_swarm.py $PWD/search_def.json --maxWorkers=6
$NUPIC/scripts/run_opf_experiment.py $PWD/model_0/
In Python there is one for creating the model (I guess; if I am wrong,
please correct me):
model_params = swarm_over_data()
and these for running the model (again, correct me if I am wrong):
model = ModelFactory.create(model_params)
model.enableInference({"predictedField": "sine"})
result = model.run({"sine": sine_value})
4. When I have swarm data created in the past and I did not touch
the input data, how can I reuse it and run the model in Python?
call the model with the params you got from swarming; if you really did
not change the data, you could have saved the model (serialized it) and
then just restored it again.
Can you post Python code showing how to do that? If I understand
correctly, I need to somehow convince the following function,
ModelFactory.create(), to take its input from the already existing files
(the model_0 dir), and not pass it the result of swarm_over_data() as in
the previous example, because that would swarm again.
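Something like this is what I would hope for (again untested, and based
on my assumptions: that swarming writes a model_params.py under model_0/
and that it is importable as a package, and that OPF models support
save() / ModelFactory.loadFromCheckpoint() for serialization):

```python
import importlib

from nupic.frameworks.opf.modelfactory import ModelFactory

# Option A: rebuild the model from the parameters the old swarm produced,
# assuming model_0/model_params.py defines a MODEL_PARAMS dict.
params = importlib.import_module("model_0.model_params")
model = ModelFactory.create(params.MODEL_PARAMS)
model.enableInference({"predictedField": "sine"})

# Option B: serialize a live model once, restore it later without
# swarming again.
model.save("/tmp/sine_model")  # checkpoint directory (absolute path)
model = ModelFactory.loadFromCheckpoint("/tmp/sine_model")
```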
Note on swarming: of course it helps, but in NuPIC the general defaults
are usually just "good enough" and the model adjusts to the data itself,
so usually just running the model is enough.
But you must have some model (created by swarming) to be able to push
your data to NuPIC. You cannot skip the swarming process (unless you
create the model manually), or can you?
Regards
Wakan