Several questions regarding swarming

Wakan Tanka Mon, 29 Feb 2016 16:05:05 -0800

Hello NuPIC,

I have read several docs about swarming and models, but I still have several questions. Here are two sets of code snippets:



swarm config for sine prediction:
---------------------------------

{
  "includedFields": [
    {
      "fieldName": "sine",
      "fieldType": "float",
      "maxValue": 1,
      "minValue": -1
    }
  ],
  "streamDef": {
    "info": "sine",
    "version": 1,
    "streams": [
      {
        "info": "sine.csv",
        "source": "file://sine.csv",
        "columns": [
          "*"
        ]
      }
    ]
  },
  "inferenceType": "TemporalAnomaly",
  "inferenceArgs": {
    "predictionSteps": [
      1
    ],
    "predictedField": "sine"
  },
  "swarmSize": "medium"
}


model.run for sine prediction:
------------------------------
result = model.run({"sine": sine_value})


csv input data for sine prediction:
-----------------------------------
angle,sine
float,float
,
0.0,0.0
0.06283185307179587,0.06279051952931337
0.12566370614359174,0.12533323356430426
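
For reference, a minimal sketch of how a file in this layout could be produced. It assumes 100 samples per full cycle (which matches the angle step in the rows above) and uses only the Python 3 standard library. The helper names sine_rows and write_sine_csv are made up for illustration and are not part of NuPIC.

```python
import csv
import io
import math


def sine_rows(n_rows):
    """Return (angle, sine) pairs, assuming 100 samples per full cycle."""
    rows = []
    for i in range(n_rows):
        angle = (i * math.pi) / 50.0  # step of 2*pi/100
        rows.append((angle, math.sin(angle)))
    return rows


def write_sine_csv(out, n_rows):
    """Write the three header rows (names, types, flags), then the data."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["angle", "sine"])   # field names
    writer.writerow(["float", "float"])  # field types
    writer.writerow(["", ""])            # flags row, empty for both fields
    writer.writerows(sine_rows(n_rows))


# Write to an in-memory buffer; pass an open file to write sine.csv instead.
buf = io.StringIO()
write_sine_csv(buf, 3)
sine_csv_text = buf.getvalue()
print(sine_csv_text)
```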


################################################
################################################
################################################


Swarm config for hotgym prediction:
-----------------------------------
{
  "includedFields": [
    {
      "fieldName": "timestamp",
      "fieldType": "datetime"
    },
    {
      "fieldName": "kw_energy_consumption",
      "fieldType": "float",
      "maxValue": 53.0,
      "minValue": 0.0
    }
  ],
  "streamDef": {
    "info": "kw_energy_consumption",
    "version": 1,
    "streams": [
      {
        "info": "Rec Center",
        "source": "file://rec-center-hourly.csv",
        "columns": [
          "*"
        ]
      }
    ]
  },

  "inferenceType": "TemporalMultiStep",
  "inferenceArgs": {
    "predictionSteps": [
      1
    ],
    "predictedField": "kw_energy_consumption"
  },
  "iterationCount": -1,
  "swarmSize": "medium"
}


model.run for hotgym prediction/anomaly:
----------------------------------------
result = model.run({
      "timestamp": timestamp,
      "kw_energy_consumption": consumption
    })



csv example for hotgym prediction/anomaly:
------------------------------------------
timestamp,kw_energy_consumption
datetime,float
T,
7/2/10 0:00,21.2
7/2/10 1:00,16.4
7/2/10 2:00,4.7
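
A minimal sketch of how rows like these could be turned into the dictionaries passed to model.run above. It assumes the timestamps are month/day/year (format string "%m/%d/%y %H:%M"), which I inferred from the sample rows. The helper read_records is made up for illustration, and the actual model.run call is left out so that the snippet runs without NuPIC (Python 3, standard library only).

```python
import csv
import datetime
import io

# Assumption: month/day/two-digit-year, as in "7/2/10 0:00".
DATE_FORMAT = "%m/%d/%y %H:%M"

SAMPLE = """timestamp,kw_energy_consumption
datetime,float
T,
7/2/10 0:00,21.2
7/2/10 1:00,16.4
7/2/10 2:00,4.7
"""


def read_records(csv_file):
    """Yield one dict per data row, skipping the three header rows."""
    reader = csv.reader(csv_file)
    names = next(reader)  # field names
    next(reader)          # field types
    next(reader)          # flags
    for row in reader:
        yield {
            names[0]: datetime.datetime.strptime(row[0], DATE_FORMAT),
            names[1]: float(row[1]),
        }


records = list(read_records(io.StringIO(SAMPLE)))
for record in records:
    # Each record has the same keys as the dict given to model.run above.
    print(record)
```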

The first set of snippets is from the sine prediction tutorial [6]; the second is from the hot gym prediction tutorial [7]. Questions:

1. Why are both columns ("timestamp" and "kw_energy_consumption") included in the 2nd swarm config, while only one column ("sine") is listed under "includedFields" in the 1st example? If I understand correctly, in the 1st example the swarm will operate only on the "sine" column (not "angle"), and in the 2nd example it will operate on both columns ("timestamp" and "kw_energy_consumption"). Is this correct? Is it worth incorporating "angle" in the 1st example or, vice versa, removing "timestamp" from the 2nd example? What would happen? I guess that in the 2nd example only "kw_energy_consumption" is needed, because this is what we want to predict, and in the 1st config we want to predict "sine", so "angle" would be meaningless. Do more columns automatically mean a better model, or what is going on?

2. What is the relationship between includedFields and ['streamDef']['streams'][0]['columns']? Isn’t this redundant? What else besides '*' can ['streamDef']['streams'][0]['columns'] contain, and when should I change it?

3. Which (SDR) encoder is used by default? I guess it should be possible to change it, because as mentioned in [1]: "There are a number of factors that swarming considers when creating potential models to evaluate ... which model components should be used (encoders, spatial & temporal poolers, classifier, etc.), and what parameter values should be chosen for each component." And also in [2]: "Swarming figures out which optional components should go into a model (encoders, spatial pooler, temporal pooler, classifier, etc.)," The only way I’ve found to change the encoder is to try to decipher the JSON schema [3] and the list of available encoders [4].

4. The JSON schema description [3] and [2] show the use of custom metrics. I guess those metrics affect the selection of the best model during the swarm, or am I wrong? Are there any code examples that use the other fields mentioned in the JSON schema [3]?

5. Is it possible to have different columns under includedFields and predictedField? In other words: does it make any sense to have the model operate (predict or detect anomalies) on columns other than those the swarm was run on? I guess not, but one never knows.

6. Can somebody please explain the following statement from [2]: "Swarming also figures out which fields of the input are useful in making good predictions. If a field is not useful, it is not included in the final model." I’m the one who specifies what to include in swarming (under includedFields), not some algorithm, or am I wrong?

7. Can I understand permutations.py [2] as lower-level control of the swarm? Are there any examples?
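
To make questions 1, 2 and 5 concrete, here is a small sketch that compares the fields named in each swarm description with the columns of the matching CSV header. The descriptions are trimmed to the keys the check needs, and the helper field_usage is made up for illustration. It only reports what the two configs above declare; it says nothing about how NuPIC itself treats these fields.

```python
def field_usage(swarm_description, csv_header):
    """Report which CSV columns are listed under includedFields."""
    included = [f["fieldName"] for f in swarm_description["includedFields"]]
    predicted = swarm_description["inferenceArgs"]["predictedField"]
    return {
        "included": included,
        "not_included": [c for c in csv_header if c not in included],
        "predicted_is_included": predicted in included,
    }


# Trimmed copies of the two swarm descriptions shown above.
sine_description = {
    "includedFields": [{"fieldName": "sine", "fieldType": "float"}],
    "inferenceArgs": {"predictionSteps": [1], "predictedField": "sine"},
}
hotgym_description = {
    "includedFields": [
        {"fieldName": "timestamp", "fieldType": "datetime"},
        {"fieldName": "kw_energy_consumption", "fieldType": "float"},
    ],
    "inferenceArgs": {
        "predictionSteps": [1],
        "predictedField": "kw_energy_consumption",
    },
}

sine_usage = field_usage(sine_description, ["angle", "sine"])
hotgym_usage = field_usage(
    hotgym_description, ["timestamp", "kw_energy_consumption"]
)
print(sine_usage)
print(hotgym_usage)
```

For the sine example this lists "angle" as present in the CSV but absent from includedFields, while for hot gym every CSV column is included; that difference is what question 1 is about.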

[1] Swarming Algorithm - https://github.com/numenta/nupic/wiki/Swarming-Algorithm

[2] Running Swarms - https://github.com/numenta/nupic/wiki/Running-Swarms

[3] experimentDescriptionSchema.json - https://github.com/numenta/nupic/blob/master/src/nupic/swarming/exp_generator/experimentDescriptionSchema.json

[4] encoders - https://github.com/numenta/nupic/tree/master/src/nupic/encoders

[5] Inference Types - https://github.com/numenta/nupic/wiki/Inference-Types

[6] https://github.com/rhyolight/nupic.examples/blob/master/sine-prediction/sine_experiment.py

[7] https://github.com/numenta/nupic/tree/master/examples/opf/clients/hotgym/prediction/one_gym




Thank you
