Hello NuPIC,

I have read several docs about swarming and models, but I still have several questions. Here are two sets of code snippets:


swarm config for sine prediction:
---------------------------------

{
  "includedFields": [
    {
      "fieldName": "sine",
      "fieldType": "float",
      "maxValue": 1,
      "minValue": -1
    }
  ],
  "streamDef": {
    "info": "sine",
    "version": 1,
    "streams": [
      {
        "info": "sine.csv",
        "source": "file://sine.csv",
        "columns": [
          "*"
        ]
      }
    ]
  },
  "inferenceType": "TemporalAnomaly",
  "inferenceArgs": {
    "predictionSteps": [
      1
    ],
    "predictedField": "sine"
  },
  "swarmSize": "medium"
}


model.run for sine prediction:
------------------------------
result = model.run({"sine": sine_value})


csv input data for sine prediction:
-----------------------------------
angle,sine
float,float
,
0.0,0.0
0.06283185307179587,0.06279051952931337
0.12566370614359174,0.12533323356430426
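
For reference, a file with this layout can be regenerated with a few lines of Python. This is only a minimal sketch, not the tutorial's own script: the step of 2*pi/100 is inferred from the angle increment in the rows above, and the function name write_sine_csv and the default row count are my own choices.

```python
import csv
import math


def write_sine_csv(path, rows=300, steps_per_period=100):
    """Write angle/sine pairs preceded by the three header rows shown above.

    steps_per_period=100 is an assumption inferred from the sample rows
    (0.0628... is 2*pi/100); rows=300 is arbitrary.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["angle", "sine"])   # row 1: field names
        writer.writerow(["float", "float"])  # row 2: field types
        writer.writerow(["", ""])            # row 3: flags row, empty here (",")
        for i in range(rows):
            angle = 2 * math.pi * i / steps_per_period
            writer.writerow([angle, math.sin(angle)])
```

Calling write_sine_csv("sine.csv") would produce the file that the "source" entry of the swarm config above points to.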


################################################
################################################
################################################


Swarm config for hotgym prediction:
-----------------------------------
{
  "includedFields": [
    {
      "fieldName": "timestamp",
      "fieldType": "datetime"
    },
    {
      "fieldName": "kw_energy_consumption",
      "fieldType": "float",
      "maxValue": 53.0,
      "minValue": 0.0
    }
  ],
  "streamDef": {
    "info": "kw_energy_consumption",
    "version": 1,
    "streams": [
      {
        "info": "Rec Center",
        "source": "file://rec-center-hourly.csv",
        "columns": [
          "*"
        ]
      }
    ]
  },

  "inferenceType": "TemporalMultiStep",
  "inferenceArgs": {
    "predictionSteps": [
      1
    ],
    "predictedField": "kw_energy_consumption"
  },
  "iterationCount": -1,
  "swarmSize": "medium"
}


model.run for hotgym prediction/anomaly:
----------------------------------------
result = model.run({
      "timestamp": timestamp,
      "kw_energy_consumption": consumption
    })



csv example for hotgym prediction/anomaly:
------------------------------------------
timestamp,kw_energy_consumption
datetime,float
T,
7/2/10 0:00,21.2
7/2/10 1:00,16.4
7/2/10 2:00,4.7
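
As a sketch of how rows like these could be turned into the dictionaries passed to model.run above: the helper name read_records is hypothetical, and the date format "%m/%d/%y %H:%M" is an assumption based on the sample rows. It uses only the standard library and does not call NuPIC.

```python
import csv
import io
from datetime import datetime

DATE_FORMAT = "%m/%d/%y %H:%M"  # assumed from rows such as "7/2/10 0:00"


def read_records(lines):
    """Yield one dict per data row, skipping the three header rows."""
    reader = csv.reader(lines)
    names = next(reader)  # row 1: field names
    next(reader)          # row 2: field types (datetime, float)
    next(reader)          # row 3: flags row ("T," in the sample)
    for row in reader:
        if not row:
            continue  # ignore blank lines
        record = dict(zip(names, row))
        yield {
            "timestamp": datetime.strptime(record["timestamp"], DATE_FORMAT),
            "kw_energy_consumption": float(record["kw_energy_consumption"]),
        }


sample = io.StringIO(
    "timestamp,kw_energy_consumption\n"
    "datetime,float\n"
    "T,\n"
    "7/2/10 0:00,21.2\n"
    "7/2/10 1:00,16.4\n"
    "7/2/10 2:00,4.7\n"
)
records = list(read_records(sample))
print(records[0])
```

Each record has the same two keys as the model.run call above, so it could be passed to it directly.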





The first set of snippets is from the sine prediction tutorial [6]; the second is from the hot gym prediction tutorial [7]. Questions:


1. Why are both columns ("timestamp" and "kw_energy_consumption") listed under "includedFields" in the 2nd swarm config, while the 1st config lists only one column ("sine")? If I understand correctly, in the 1st example the swarm operates only on the "sine" column (not "angle"), and in the 2nd example it operates on both columns ("timestamp" and "kw_energy_consumption"). Is this correct? Is it worth including "angle" in the 1st example or, conversely, removing "timestamp" from the 2nd? What would happen? My guess is that the 2nd example needs only "kw_energy_consumption", because that is what we want to predict, and that in the 1st config we want to predict "sine", so "angle" would be meaningless. Do more columns automatically mean a better model, or what is going on?


2. What is the relationship between includedFields and ['streamDef']['streams'][0]['columns']? Isn’t this redundant? What else besides '*' can ['streamDef']['streams'][0]['columns'] contain, and when should I change it?


3. Which (SDR) encoder is used by default? I guess it should be possible to change it, because [1] says: "There are a number of factors that swarming considers when creating potential models to evaluate ... which model components should be used (encoders, spatial & temporal poolers, classifier, etc.), and what parameter values should be chosen for each component." And [2] says: "Swarming figures out which optional components should go into a model (encoders, spatial pooler, temporal pooler, classifier, etc.)," The only way I have found to change the encoder is to try to decipher the JSON schema [3] and the list of available encoders [4].


4. The JSON schema description [3] and [2] show the use of custom metrics. I guess those metrics affect the selection of the best model during a swarm, or am I wrong? Are there any code examples that use the other fields mentioned in the JSON schema [3]?


5. Is it possible to have different columns under includedFields and predictedField? In other words: does it make any sense to have the model operate (predict or detect anomalies) on columns other than those the swarm was run on? I guess not, but one never knows.


6. Can somebody please explain the following statement from [2]: "Swarming also figures out which fields of the input are useful in making good predictions. If a field is not useful, it is not included in the final model." I am the one who specifies what to include in swarming (under includedFields), not some algorithm, or am I wrong?


7. Can I understand permutations.py [2] as lower-level control of the swarm? Are there any examples?
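
To make question 1 concrete, the variant I have in mind could be derived from a swarm description like this. It is only a sketch: with_included_fields is a hypothetical helper of my own, the description is trimmed to the keys it touches, and whether the variant gives a better or worse model is exactly what I am asking.

```python
import copy

# Trimmed copy of the hot gym swarm description above (only the keys used here).
hotgym_description = {
    "includedFields": [
        {"fieldName": "timestamp", "fieldType": "datetime"},
        {
            "fieldName": "kw_energy_consumption",
            "fieldType": "float",
            "maxValue": 53.0,
            "minValue": 0.0,
        },
    ],
    "inferenceArgs": {
        "predictionSteps": [1],
        "predictedField": "kw_energy_consumption",
    },
}


def with_included_fields(description, field_names):
    """Return a copy whose includedFields keeps only the named fields."""
    variant = copy.deepcopy(description)
    variant["includedFields"] = [
        field for field in variant["includedFields"]
        if field["fieldName"] in field_names
    ]
    return variant


# Variant from question 1: drop "timestamp", keep only the predicted field.
consumption_only = with_included_fields(
    hotgym_description, {"kw_energy_consumption"}
)
print([f["fieldName"] for f in consumption_only["includedFields"]])
```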


[1] Swarming Algorithm - https://github.com/numenta/nupic/wiki/Swarming-Algorithm
[2] Running Swarms - https://github.com/numenta/nupic/wiki/Running-Swarms
[3] experimentDescriptionSchema.json - https://github.com/numenta/nupic/blob/master/src/nupic/swarming/exp_generator/experimentDescriptionSchema.json
[4] encoders - https://github.com/numenta/nupic/tree/master/src/nupic/encoders
[5] Inference Types - https://github.com/numenta/nupic/wiki/Inference-Types
[6] https://github.com/rhyolight/nupic.examples/blob/master/sine-prediction/sine_experiment.py
[7] https://github.com/numenta/nupic/tree/master/examples/opf/clients/hotgym/prediction/one_gym



Thank you
