Hello NuPIC,
I have read several docs about swarming and models, but I still have
several questions. They refer to the two sets of snippets below.
Swarm config for sine prediction:
---------------------------------
{
"includedFields": [
{
"fieldName": "sine",
"fieldType": "float",
"maxValue": 1,
"minValue": -1
}
],
"streamDef": {
"info": "sine",
"version": 1,
"streams": [
{
"info": "sine.csv",
"source": "file://sine.csv",
"columns": [
"*"
]
}
]
},
"inferenceType": "TemporalAnomaly",
"inferenceArgs": {
"predictionSteps": [
1
],
"predictedField": "sine"
},
"swarmSize": "medium"
}
model.run for sine prediction:
------------------------------
result = model.run({"sine": sine_value})
csv input data for sine prediction:
-----------------------------------
angle,sine
float,float
,
0.0,0.0
0.06283185307179587,0.06279051952931337
0.12566370614359174,0.12533323356430426
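For reference, here is a minimal sketch of how a file in this layout
could be produced. It is my own illustration, not the code from [6]:
the helper name write_sine_csv and the step of 2*pi/100 per row are
assumptions, the step being inferred from the rows above. The three
header rows are field names, field types, and special flags.

```python
import csv
import io
import math


def write_sine_csv(out, rows=100, steps_per_cycle=100):
    """Write angle/sine rows preceded by the three header rows.

    Hypothetical helper: the step size is an assumption inferred from
    the sample rows, not taken from the tutorial code.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["angle", "sine"])    # row 1: field names
    writer.writerow(["float", "float"])   # row 2: field types
    writer.writerow(["", ""])             # row 3: special flags (none here)
    for i in range(rows):
        angle = (i * 2 * math.pi) / steps_per_cycle
        writer.writerow([angle, math.sin(angle)])


# Write three data rows into an in-memory buffer and show the result.
buf = io.StringIO()
write_sine_csv(buf, rows=3)
print(buf.getvalue())
```

Passing an open file object instead of the buffer would write the same
content to sine.csv.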
################################################
################################################
################################################
Swarm config for hotgym prediction:
-----------------------------------
{
"includedFields": [
{
"fieldName": "timestamp",
"fieldType": "datetime"
},
{
"fieldName": "kw_energy_consumption",
"fieldType": "float",
"maxValue": 53.0,
"minValue": 0.0
}
],
"streamDef": {
"info": "kw_energy_consumption",
"version": 1,
"streams": [
{
"info": "Rec Center",
"source": "file://rec-center-hourly.csv",
"columns": [
"*"
]
}
]
},
"inferenceType": "TemporalMultiStep",
"inferenceArgs": {
"predictionSteps": [
1
],
"predictedField": "kw_energy_consumption"
},
"iterationCount": -1,
"swarmSize": "medium"
}
model.run for hotgym prediction/anomaly:
----------------------------------------
result = model.run({
"timestamp": timestamp,
"kw_energy_consumption": consumption
})
csv example for hotgym prediction/anomaly:
------------------------------------------
timestamp,kw_energy_consumption
datetime,float
T,
7/2/10 0:00,21.2
7/2/10 1:00,16.4
7/2/10 2:00,4.7
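To show how I get from this CSV to the model.run input above, here is
a minimal sketch using only the standard library. The helper name
iter_records and the date format string are my assumptions (the format
is inferred from rows like "7/2/10 0:00", read as month/day/year). The
model.run call is left as a comment because it needs a NuPIC model.

```python
import csv
import datetime
import io

# Assumed format, inferred from rows such as "7/2/10 0:00".
DATE_FORMAT = "%m/%d/%y %H:%M"

SAMPLE = """timestamp,kw_energy_consumption
datetime,float
T,
7/2/10 0:00,21.2
7/2/10 1:00,16.4
7/2/10 2:00,4.7
"""


def iter_records(csv_file):
    """Yield one dict per data row, shaped like the model.run input."""
    reader = csv.reader(csv_file)
    for _ in range(3):  # skip the names, types, and flags rows
        next(reader)
    for row in reader:
        yield {
            "timestamp": datetime.datetime.strptime(row[0], DATE_FORMAT),
            "kw_energy_consumption": float(row[1]),
        }


records = list(iter_records(io.StringIO(SAMPLE)))
for record in records:
    # result = model.run(record)  # requires a NuPIC model instance
    print(record)
```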
The first set of snippets is from the sine prediction tutorial [6]; the
second is from the hot gym prediction tutorial [7]. Questions:
1. Why does the 2nd swarm config list both columns ("timestamp" and
"kw_energy_consumption") under "includedFields", while the 1st lists
only one ("sine")? If I understand correctly, in the 1st example the
swarm operates only on the "sine" column (not "angle"), and in the 2nd
it operates on both columns. Is this correct? Is it worth adding
"angle" to the 1st example or, conversely, removing "timestamp" from
the 2nd? What would happen? My guess is that the 2nd example needs only
"kw_energy_consumption", because that is what we want to predict, and
that in the 1st config we want to predict "sine", so "angle" would be
meaningless. Do more columns automatically mean a better model, or
what is going on?
2. What is the relationship between includedFields and
['streamDef']['streams'][0]['columns']? Isn’t this redundant? What,
other than '*', can ['streamDef']['streams'][0]['columns'] contain, and
when should I change it?
3. Which (SDR) encoder is used by default? I guess it should be
possible to change it, because [1] says: "There are a
number of factors that swarming considers when creating potential models
to evaluate ... which model components should be used (encoders, spatial
& temporal poolers, classifier, etc.), and what parameter values should
be chosen for each component."
And [2] says: "Swarming figures out which optional components should
go into a model (encoders, spatial pooler, temporal pooler, classifier,
etc.),"
The only way I have found to change the encoder is to decipher the
JSON schema [3] and the list of available encoders [4].
4. The JSON schema [3] and [2] both show the use of custom metrics. I
guess those metrics affect which model is selected as the best during
the swarm, or am I wrong? Are there any code examples that use the
other fields mentioned in the JSON schema [3]?
5. Is it possible to have different columns under includedFields and
predictedField? In other words, does it make any sense to have the
model operate (predict or detect anomalies) on columns other than those
the swarm ran on? I guess not, but one never knows.
6. Can somebody please explain the following statement from [2]:
"Swarming also figures out which fields of the input are useful in
making good predictions. If a field is not useful, it is not included
in the final model."
I am the one who specifies what to include in swarming (under
includedFields), not some algorithm, or am I wrong?
7. Can I think of permutations.py [2] as lower-level control of the
swarm? Are there any examples?
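To make questions 1 and 2 concrete, here is a small sketch of how I
currently read the two settings. It only compares field names and does
not call NuPIC; the helper name split_columns is hypothetical. Whether
the swarm really ignores CSV columns that are missing from
includedFields is exactly what I am asking.

```python
def split_columns(swarm_config, csv_header):
    """Split CSV header names by whether includedFields lists them.

    Hypothetical helper: a plain name comparison, not NuPIC behaviour.
    """
    included = [f["fieldName"] for f in swarm_config["includedFields"]]
    listed = [name for name in csv_header if name in included]
    not_listed = [name for name in csv_header if name not in included]
    return listed, not_listed


# Trimmed copy of the sine swarm config from the top of this message.
sine_config = {
    "includedFields": [
        {"fieldName": "sine", "fieldType": "float",
         "maxValue": 1, "minValue": -1},
    ],
    "streamDef": {
        "streams": [{"source": "file://sine.csv", "columns": ["*"]}],
    },
}

listed, not_listed = split_columns(sine_config, ["angle", "sine"])
print("listed in includedFields:", listed)
print("only in the CSV:", not_listed)
```

For the hot gym config the same comparison puts both "timestamp" and
"kw_energy_consumption" in the first group.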
[1] Swarming Algorithm -
https://github.com/numenta/nupic/wiki/Swarming-Algorithm
[2] Running Swarms - https://github.com/numenta/nupic/wiki/Running-Swarms
[3] experimentDescriptionSchema.json -
https://github.com/numenta/nupic/blob/master/src/nupic/swarming/exp_generator/experimentDescriptionSchema.json
[4] encoders -
https://github.com/numenta/nupic/tree/master/src/nupic/encoders
[5] Inference Types - https://github.com/numenta/nupic/wiki/Inference-Types
[6] Sine prediction tutorial -
https://github.com/rhyolight/nupic.examples/blob/master/sine-prediction/sine_experiment.py
[7] Hot gym prediction tutorial -
https://github.com/numenta/nupic/tree/master/examples/opf/clients/hotgym/prediction/one_gym
Thank you