[nupic-dev] Some comments on

Tim McNamara Sun, 01 Sep 2013 02:22:18 -0700

Hi

I thought I would provide some thoughts as I try to progress through the
funnel outlined in the "NuPIC Consumer Engagement Strategy" at [0] (as a
sidenote, it is very interesting and quite refreshing to see a commercial
organisation be explicit with strategy documents like these).


Hopefully these notes will be of some value as Numenta refines the
strategy. I haven't quite got to the bottom of the funnel, but hopefully
I'll get there!

Watcher ⟹ Builder Obstacles
<https://github.com/numenta/nupic/wiki/NuPIC-Consumer-Engagement-Strategy#watcher--builder-obstacles>

Some things that have tripped me up which are not mentioned:


   - I need to patch the source to work around sys.platform returning
   "linux3"
   - "source env.sh" is somewhat annoying to remember (I don't want to hard
   code this in my .bashrc)
   - building - why does NuPIC try to uninstall Python modules that it can
   find? Why not use system dependencies if they are there to use?
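
To illustrate the first item: a minimal sketch of a platform check that would not need patching, assuming the build scripts compare sys.platform against an exact string such as "linux2" (the actual comparison in NuPIC is not shown here, and is_linux is a hypothetical helper name):

```python
import sys


def is_linux(platform=None):
    """Return True for any Linux platform string ("linux", "linux2", "linux3").

    Hypothetical helper: matching on the prefix avoids depending on the
    kernel major version that some Python builds append to the name.
    """
    if platform is None:
        platform = sys.platform
    return platform.startswith("linux")
```

Matching on the prefix rather than the full string means a kernel upgrade does not change the outcome of the check.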


Builder ⟹ Experimenter Obstacles
<https://github.com/numenta/nupic/wiki/NuPIC-Consumer-Engagement-Strategy#builder--experimenter-obstacles>

NuPIC's documentation is great - but after hours of reading I still don't
know how to give the CLA a CSV file (with the two extra headers) and ask it
to start predicting results.
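
For reference, a rough sketch of producing such a file in Python 3. It assumes the two extra header rows are a row of field types and a row of special flags (with "T" marking the timestamp field), as in the hotgym sample data; the helper name and the data values are made up for illustration.

```python
import csv
import io


def write_nupic_style_csv(stream, names, types, flags, rows):
    """Write field names plus the two extra header rows, then the data rows.

    Hypothetical helper: the meaning of the second and third rows
    (field types, special flags) is an assumption about the format.
    """
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(names)   # row 1: field names
    writer.writerow(types)   # row 2: field types (assumed)
    writer.writerow(flags)   # row 3: special flags (assumed)
    writer.writerows(rows)


buffer = io.StringIO()
write_nupic_style_csv(
    buffer,
    names=["timestamp", "consumption"],
    types=["datetime", "float"],
    flags=["T", ""],
    # Illustrative values only, not real measurements.
    rows=[["2010-07-02 00:00:00.0", 21.2], ["2010-07-02 01:00:00.0", 16.4]],
)
csv_text = buffer.getvalue()
```

Writing to an in-memory buffer keeps the sketch self-contained; passing an open file object instead would write the same content to disk.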

Reading through the source hasn't been as straightforward as I would have
liked, and I've ended up spending quite a bit of time on it. Rather than start with
hotgym.py, I have been searching down the rabbit hole of classes/methods
used by OpfRunExperiment.py, given that is what I was executing to run the
experiment.

[edit/update: after taking a look at hotgym.py, things actually look
relatively simple to get started with, perhaps consider this a case of RTFM]


After spending some time in the source and noticing that Grok uses a
database, rather than CSV files, I feel that Numenta has perhaps added an
obstacle here for 3rd party developers. Support for a wide variety of
input formats could make it easy for people to experiment with their own
data stored in data warehouses. (Sorry, I can't remember where in the
source tree I found this reference as I was reading today.)

Experimenter ⟹ Developer Obstacles
<https://github.com/numenta/nupic/wiki/NuPIC-Consumer-Engagement-Strategy#experimenter--developer-obstacles>

Some anecdotal evidence to support the barriers listed in this section:
results interpretation is a bit tricky. When running hotgym for the first
time, in the terminal, all I see is the result metrics:

...
<JSON>
{

"prediction:trivial:errorMetric='aae':steps=5:window=1000:field=consumption":
15.844123711340194,

"prediction:trivial:errorMetric='altMAPE':steps=5:window=1000:field=consumption":
56.13599339610908,

"multiStepBestPredictions:multiStep:errorMetric='altMAPE':steps=5:window=1000:field=consumption":
43.27516112448442,

"prediction:trivial:errorMetric='altMAPE':steps=1:window=1000:field=consumption":
20.833911019329946,

"prediction:trivial:errorMetric='aae':steps=1:window=1000:field=consumption":
5.86262833675565,

"multiStepBestPredictions:multiStep:errorMetric='aae':steps=1:window=1000:field=consumption":
5.305007751770775,

"multiStepBestPredictions:multiStep:errorMetric='altMAPE':steps=1:window=1000:field=consumption":
18.852305332800853,

"multiStepBestPredictions:multiStep:errorMetric='aae':steps=5:window=1000:field=consumption":
12.214213466328994
}
</JSON>
...
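
The metric labels above appear to be colon-separated, with some parts in key=value form. A small sketch of splitting them apart so the numbers are easier to compare, assuming that structure holds for every label (parse_metric_label and summarise are hypothetical helpers, not part of NuPIC):

```python
import json


def parse_metric_label(label):
    """Split a metric label into its plain name parts and key=value options."""
    names, options = [], {}
    for part in label.split(":"):
        if "=" in part:
            key, value = part.split("=", 1)
            options[key] = value.strip("'")  # drop quotes around e.g. 'aae'
        else:
            names.append(part)
    return names, options


def summarise(metrics_json):
    """Return (name, error metric, steps, value) tuples sorted by steps."""
    rows = []
    for label, value in json.loads(metrics_json).items():
        names, options = parse_metric_label(label)
        rows.append((
            names[0],
            options.get("errorMetric", ""),
            int(options.get("steps", 0)),
            value,
        ))
    return sorted(rows, key=lambda row: (row[2], row[1], row[0]))
```

Sorting by steps and error metric puts the "trivial" baseline next to the multi-step prediction for the same horizon, which makes the comparison easier to read.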


It has taken me a long time to figure out that the actual prediction
results are provided in /tmp/tmpXXXXX/generated_output.csv and that there
is more output saved at
/nta/logs/numenta-logs-username/OpfRunExperiment-NNNNNNNNNN-NNNN.log


The Numenta Python style guide and some of its practices do not feel
familiar to Python programmers. I assume that the style guide is designed
to help C++ programmers function efficiently in a dynamic language, by
preventing an overly large context switch. However, for me it steepens the
learning curve, as I have to learn a new style on top of the code itself.

Here are two examples that I've noticed myself wincing at:


   - "_TaskRunner.getFieldInfo" (line 606 of experiment_runner.py) is not
   Pythonic. Python programmers would tend towards defining
   "_TaskRunner.fields"
   - within logging, prefixing module paths with "com.numenta" is quite
   frustrating. I personally don't see why the namespacing is needed. The
   string is hard coded in /nta/conf/default/nupic-logging.conf, so I assume
   it doesn't distinguish between different code modules being run
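
To make the first example concrete, a sketch of the two styles side by side. TaskRunner here is a hypothetical stand-in, not the real _TaskRunner from experiment_runner.py, and the field data is invented for illustration:

```python
class TaskRunner(object):
    """Hypothetical stand-in used only to contrast the two styles."""

    def __init__(self, field_names):
        self._field_names = list(field_names)

    def getFieldInfo(self):
        """Getter-method style, as in the existing code."""
        return tuple(self._field_names)

    @property
    def fields(self):
        """Attribute style: read as runner.fields, with no call needed."""
        return tuple(self._field_names)


runner = TaskRunner(["timestamp", "consumption"])
```

Both return the same data; the property simply lets callers write runner.fields, which is what most Python programmers would expect for a cheap, read-only lookup.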


I should note that it's very useful that NuPIC's code is so very thoroughly
documented. Thank you for showing such discipline.

I hope that these notes are useful!



Tim McNamara
@timClicks <http://twitter.com/timClicks> | timmcnamara.co.nz

[0] https://github.com/numenta/nupic/wiki/NuPIC-Consumer-Engagement-Strategy




_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
