Hello Marek,

On 05/21/2015 03:50 AM, Marek Otahal wrote:
Hi,

I'll try to follow on the swarming vs. running question:
-"run model" = model + parameters + input data ->produce outputs
(predictions, ...)
-"swarm" = model + some data (+lot of time) -> find "optimal" parameters
(for model+data)

So, below..

This is my understanding:

Pushing data to a NuPIC algorithm requires a model. The easiest way of creating this model is the swarming process. Swarming basically tries various model/parameter combinations over your data, and if a particular combination is not good, it is dropped. So after swarming you should have the best model for your data. This whole process is achieved with the following command:
$NUPIC/scripts/run_swarm.py $PWD/search_def.json --maxWorkers=6
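As I understand it, swarming is essentially a parameter search: try candidate parameter sets on the data, score each, keep the best. A toy sketch of that idea in plain Python (this is NOT NuPIC's actual swarming algorithm; the one-parameter "model" here is just an exponential smoother I made up for illustration):

```python
import math

def run_model(alpha, data):
    """Toy one-parameter "model": exponential smoothing with a
    one-step-ahead forecast; returns mean absolute prediction error."""
    pred, err = data[0], 0.0
    for x in data[1:]:
        err += abs(x - pred)                      # score this prediction
        pred = alpha * x + (1.0 - alpha) * pred   # update the forecast
    return err / (len(data) - 1)

def swarm(data, candidates):
    """Toy "swarm": try every candidate parameter, keep the best scorer."""
    return min(candidates, key=lambda a: run_model(a, data))

data = [math.sin(i / 5.0) for i in range(200)]
best_alpha = swarm(data, [0.1, 0.3, 0.5, 0.7, 0.9])
print(best_alpha)
```

Real swarming searches over far more than one parameter (encoders, model structure, etc.), but the shape of it — score candidates on your data, keep the winner — is the same.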

Q:
I do not know what exactly I should imagine under "model" — maybe it is some set of parameters — so that is another question. If swarming is the easiest process, what other methods can be used to create a model? What criteria must a model meet to be classified as good or bad during swarming?
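For what it's worth, my current guess is that a "model" before it runs is just a nested dictionary of parameters (the swarm's output, written out as a Python file). An abridged, illustrative sketch of such a structure — the field names and values here are my assumptions, not a working configuration:

```python
# Abridged sketch of what a NuPIC "model" is before it runs: a nested dict
# of parameters. Values below are illustrative only.
MODEL_PARAMS = {
    "model": "CLA",               # the HTM/CLA model type
    "modelParams": {
        "sensorParams": {         # how input fields are encoded into SDRs
            "encoders": {
                "sine": {"fieldname": "sine", "type": "ScalarEncoder",
                         "n": 100, "w": 21, "minval": -1.0, "maxval": 1.0},
            },
        },
        "spParams": {"columnCount": 2048},   # spatial pooler
        "tpParams": {"cellsPerColumn": 32},  # temporal memory
        "clParams": {"steps": "1"},          # classifier: predict 1 step ahead
    },
}
```

If that guess is right, "good or bad during swarming" would just mean how well a model built from such a dict scores on the data.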


Now, when you have the best model for your data, you need to pass this model plus your input data to NuPIC to get something useful (I know about prediction, which is basically configured with "predictionSteps"; what other useful information can be gained from your data is another question). This process is achieved with the following command:
$NUPIC/scripts/run_opf_experiment.py $PWD/model_0/

After this you will get an inference file where the predictions are stored.

The reason for two separate processes (creating the best model and running it) that comes to my mind right now is that running a model on data can also be CPU intensive (I do not know if this is true, just guessing — can someone confirm?). So as long as you did not touch your original data, you only have to run one CPU-intensive process (running the model) instead of two.

I would also like to know what would happen if I create a model with swarming for some data, then change that data and run the model. Is this complete nonsense, or is this used somewhere?

Finally, can I tell NuPIC to stop learning after a certain amount of data? AFAIK NuPIC keeps learning, so it might happen that it also learns the anomalies and considers them normal. E.g., imagine the sine prediction example. At first NuPIC does not know the data, simply repeats what it sees, and its anomaly score is high. Then, after a certain number of periods, NuPIC learns the pattern and lowers the anomaly score; anything close to this pattern will get a low anomaly score. In other words, it will be able to predict what will happen at a given time and answer questions like: will the function be increasing, decreasing, or unchanged in the next step? The problem is that it will be learning constantly. How can I make it stop learning, e.g. after a certain amount of data, and from then on just give an anomaly score?
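On the stop-learning question, I believe the OPF model object exposes a learning toggle. A sketch of what I have in mind, assuming a NuPIC install — the import path, disableLearning(), and the "anomalyScore" inference key are my assumptions from the API I have seen (the import is kept inside the function so the sketch stands alone):

```python
def sine_anomaly_scores(sine_values, model_params, training_rows=4000):
    """Sketch: learn on the first `training_rows` values, then freeze the
    model and only report anomaly scores for the rest."""
    # Assumption: import path may differ between NuPIC versions.
    from nupic.frameworks.opf.modelfactory import ModelFactory

    model = ModelFactory.create(model_params)
    model.enableInference({"predictedField": "sine"})
    scores = []
    for i, value in enumerate(sine_values):
        if i == training_rows:
            model.disableLearning()  # stop adapting; anomalies stay anomalous
        result = model.run({"sine": value})
        scores.append(result.inferences["anomalyScore"])
    return scores
```

If something like this works, it would answer my sine example: train through a few periods, freeze, and then a genuine anomaly could no longer be learned away.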






    1. Is there any benefit for running swarming and running model in
    two steps?

yes,  swarm=parameters, run=outputs; although swarming imho saves also
the output of the best model, so you can already use that.

Yes, when you run a swarm you get the best model; the only benefit of reusing an "old" swarm is saving the CPU load described above. But what benefit do the swarm parameters (the model) give me when I do not use them on data?


    2. Is running only swarming without running model useful for something?

yes, parameters.
Same question as above: what are the parameters themselves good for?



    3. Is there any benefit for three separate steps for running model
    in python?

which steps?


Maybe "steps" is not the right word — rather, function calls. On the command line there are two: one to create the model and another to run it:

$NUPIC/scripts/run_swarm.py $PWD/search_def.json --maxWorkers=6
$NUPIC/scripts/run_opf_experiment.py $PWD/model_0/

In Python there is one for creating the model (I guess — if I am wrong, please correct me):
model_params = swarm_over_data()

and these for running the model (again, correct me if I am wrong):
model = ModelFactory.create(model_params)
model.enableInference({"predictedField": "sine"})
result = model.run({"sine": sine_value})



    4. When I have swarm data created in the past and I did not touch
    the input data how can I reuse it and run model in python?

call the model with the params you got from swarming; if you really did
not change the data, you could have saved the model (serialized it) and
then just restored it again.

Can you post Python code of how to do that? If I understand it correctly, I need to somehow convince the ModelFactory.create() function to take its input from the already existing files (the model_0 dir), instead of passing it the result of swarm_over_data() as in the previous example, because that would swarm again.
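To make my question concrete, here is my guess at the code. Assumptions: a NuPIC install; that the swarm leaves a model_params.py with a MODEL_PARAMS dict inside model_0; and that save()/loadFromCheckpoint() are the serialization calls (imports are kept inside the functions so the sketch stands alone):

```python
import importlib

def model_from_swarm_output(params_package="model_0"):
    """Sketch: rebuild a model from an earlier swarm's output directory
    instead of calling swarm_over_data() again."""
    # Assumption: import path may differ between NuPIC versions.
    from nupic.frameworks.opf.modelfactory import ModelFactory
    # Assumption: the swarm wrote its winning parameters to
    # model_0/model_params.py as a MODEL_PARAMS dict.
    params = importlib.import_module(params_package + ".model_params")
    return ModelFactory.create(params.MODEL_PARAMS)

def checkpoint_roundtrip(model, checkpoint_dir):
    """Sketch: serialize a (possibly already trained) model and restore it,
    which would skip both re-swarming and re-training."""
    from nupic.frameworks.opf.modelfactory import ModelFactory
    model.save(checkpoint_dir)  # assumption: expects an absolute directory path
    return ModelFactory.loadFromCheckpoint(checkpoint_dir)
```

Is this roughly the intended usage, or am I off?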


Note on swarming: of course it helps, but in Nupic the general defaults
are just "good enough" usually and the model adjusts to the data itself,
so usually just running the model is enough.

But you must have some model (created by swarming) to be able to push your data to NuPIC at all. You cannot skip the swarming process (unless you create the model manually) — or can you?




Regards


Wakan
