Hello Marek,
On 05/21/2015 03:50 AM, Marek Otahal wrote:
Hi,
I'll try to follow up on the swarming vs. running question:
-"run model" = model + parameters + input data ->produce outputs
(predictions, ...)
-"swarm" = model + some data (+lot of time) -> find "optimal" parameters
(for model+data)
So, below..
This is my understanding:
Pushing data to a NuPIC algorithm requires a model. The easiest way of
creating this model is to use the swarming process. Swarming basically
tries various model combinations over your data, and if a particular
model is not good it is dropped. So after swarming you should have the
best model for your data. This whole process is achieved with the
following command:
$NUPIC/scripts/run_swarm.py $PWD/search_def.json --maxWorkers=6
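For context, my search_def.json looks roughly like this (a sketch from
memory of the swarm description format, if I understand it correctly;
the field name and file path are just placeholders from my setup):

```json
{
  "includedFields": [
    {"fieldName": "sine", "fieldType": "float"}
  ],
  "streamDef": {
    "info": "sine",
    "version": 1,
    "streams": [
      {"info": "sine.csv", "source": "file://sine.csv", "columns": ["*"]}
    ]
  },
  "inferenceType": "TemporalAnomaly",
  "inferenceArgs": {"predictionSteps": [1], "predictedField": "sine"},
  "iterationCount": -1,
  "swarmSize": "medium"
}
```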
Q:
I do not know what I should imagine under "model"; maybe it is some set
of parameters, so this is another question.
If swarming is the easiest process, what other methods can be used to
create a model?
What criteria should a model meet to be classified as good or bad during
swarming?
Now, when you have the best model for your data, you need to pass this
model plus your input data to NuPIC to get something useful (I know
about prediction, which is basically configured with "predictionSteps";
what other useful information can be gained from your data is another
question). This process is achieved with the following command:
$NUPIC/scripts/run_opf_experiment.py $PWD/model_0/
After this you will get an inference file where the prediction will be
stored.
The only reason for two separate processes (creating the best model and
running it) that comes to my mind right now is that running a model on
data can also be CPU intensive (I do not know if this is true, just
guessing; can someone confirm?). So, under the condition that you did
not touch your original data, you only have to run one CPU-intensive
process (running the model) instead of two.
I would like to know what would happen if I create a model with swarming
for some data, then change that data and run the model. Is this complete
nonsense, or is this used somewhere?
I would also like to ask if I can tell NuPIC to stop learning after a
certain amount of data. AFAIK NuPIC is always learning, so it might
happen that it also learns anomalies and considers them normal?
E.g. imagine the sine prediction example. At first NuPIC does not know
the data, it simply repeats what it sees, and its anomaly score is high.
Then, after a certain number of periods pass, NuPIC learns this pattern
and lowers the anomaly score. Anything close to this pattern will have a
low anomaly score. In other words, it will be able to predict what will
happen at a given time and answer questions like whether the function
will be increasing, decreasing, or unchanged in the next step. The
problem is that it is constantly learning; how can I make it stop
learning, e.g. after a certain amount of data, and just give an anomaly
score?
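A sketch of what I am hoping is possible (untested; I am assuming the
OPF model really has a disableLearning() method, and model_params /
sine_values stand in for whatever swarming and my data source provide):

```python
from nupic.frameworks.opf.modelfactory import ModelFactory

LEARN_RECORDS = 3000  # arbitrary cutoff I picked, not a NuPIC default

model = ModelFactory.create(model_params)  # model_params from swarming
model.enableInference({"predictedField": "sine"})

for i, sine_value in enumerate(sine_values):  # sine_values: my input stream
    if i == LEARN_RECORDS:
        model.disableLearning()  # freeze what was learned so far
    result = model.run({"sine": sine_value})
    # result should still contain predictions / anomaly scores after the freeze
```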
1. Is there any benefit to running swarming and running the model in
two steps?
yes, swarm = parameters, run = outputs; although swarming IMHO also
saves the output of the best model, so you can already use that.
Yes, when you run a swarm you get the best model; the only benefit of
reusing an "old" swarm is the CPU load described above. What benefit do
the swarm parameters (the model) give me when I do not use them on data?
2. Is running only swarming, without running the model, useful for
something?
yes, parameters.
Same question as above: what are the parameters themselves good for?
3. Is there any benefit to three separate steps for running the model
in Python?
which steps?
Maybe "steps" is not the right word, rather function calls. On the
command line there are two: one for creating the model and another for
running this model:
$NUPIC/scripts/run_swarm.py $PWD/search_def.json --maxWorkers=6
$NUPIC/scripts/run_opf_experiment.py $PWD/model_0/
In Python there is one for creating the model (I guess; if I am wrong,
please correct me):
model_params = swarm_over_data()
and these for running the model (again, correct me if I am wrong):
model = ModelFactory.create(model_params)
model.enableInference({"predictedField": "sine"})
result = model.run({"sine": sine_value})
4. When I have swarm data created in the past and I did not touch
the input data, how can I reuse it and run the model in Python?
call the model with the params you got from swarming; if you really did
not change the data, you could have saved the model (serialized it) and
then just restored it again.
Can you post Python code showing how to do that? If I understand
correctly, I need to somehow convince the following function,
ModelFactory.create(), to take its input from the already existing files
(the model_0 dir), and not pass it the result of swarm_over_data() as in
the previous example, because that would swarm again.
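Something like this is what I would hope for (again untested, and based
on my assumptions: that swarming writes a model_params.py under model_0/
and that it is importable as a package, and that OPF models support
save() / ModelFactory.loadFromCheckpoint() for serialization):

```python
import importlib

from nupic.frameworks.opf.modelfactory import ModelFactory

# Option A: rebuild the model from the parameters the old swarm produced,
# assuming model_0/model_params.py defines a MODEL_PARAMS dict.
params = importlib.import_module("model_0.model_params")
model = ModelFactory.create(params.MODEL_PARAMS)
model.enableInference({"predictedField": "sine"})

# Option B: serialize a live model once, restore it later without
# swarming again.
model.save("/tmp/sine_model")  # checkpoint directory (absolute path)
model = ModelFactory.loadFromCheckpoint("/tmp/sine_model")
```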
Note on swarming: of course it helps, but in NuPIC the general defaults
are usually just "good enough" and the model adjusts to the data itself,
so usually just running the model is enough.
But you must have some model (created by swarming) to be able to push
your data to NuPIC. You cannot skip the swarming process (unless you
create the model manually), or can you?
Regards
Wakan