Response inline.

On Sun, Sep 1, 2013 at 2:10 AM, Tim McNamara <[email protected]> wrote:

> Hi
>
> I thought I would provide some thoughts as I try to progress through the
> funnel outlined in the "NuPIC Consumer Engagement Strategy" at [0] (as a
> sidenote, it is very interesting and quite refreshing to see a commercial
> organisation be explicit with strategy documents like these).
>
> Hopefully these notes will be of some value as Numenta refines the
> strategy. I haven't quite got to the bottom of the funnel, but hopefully
> I'll get there!
>
> Watcher ⟹ Builder Obstacles
> <https://github.com/numenta/nupic/wiki/NuPIC-Consumer-Engagement-Strategy#watcher--builder-obstacles>
> Some things that have tripped me up which are not mentioned:
>
>
>    - I need to patch the source to work around the "linux3" result from
>    sys.platform
>    - "source env.sh" is somewhat annoying to remember (I don't want to hard
>    code this in my .bashrc)
>    - building - why does NuPIC try to uninstall Python modules that it
>    can find? Why not use system dependencies if they are there to use?
>
>
> Builder ⟹ Experimenter Obstacles
> <https://github.com/numenta/nupic/wiki/NuPIC-Consumer-Engagement-Strategy#builder--experimenter-obstacles>
> NuPIC's documentation is great - but after hours of reading I still don't
> know how to give the CLA a CSV file (with the two extra headers) and ask it
> to start predicting results.
>

There's no reason you have to read in sensorRecords from a CSV file
exclusively. You can just as easily generate the data from whatever output
stream you find useful. Take a look at the code from my hackathon
experiment:

https://github.com/ravaa/nupic/blob/master/predipic/runtest.py

What's fed into the model instance is just a dict of the record headers and
the data. I could have been generating those records from a Python generator
fed from sensors (something I'm poking at now), a socket, etc., and would
have if I hadn't spent the time getting a web console running.
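
To make that concrete, here is a minimal sketch of the idea, not the code
from runtest.py. It turns any iterable of rows into dict records and hands
them to a model one at a time. EchoModel, its run() method, and the field
names are hypothetical stand-ins chosen for illustration; substitute
whatever model object and fields your experiment actually uses.

```python
import csv
import io


def records_from_rows(rows, headers):
    """Yield one dict per row, keyed by the record headers."""
    for row in rows:
        yield dict(zip(headers, row))


class EchoModel(object):
    """Hypothetical stand-in for a model: counts records and echoes them."""

    def __init__(self):
        self.seen = 0

    def run(self, record):
        self.seen += 1
        return record


# Any iterable works as the source (sensor readings, a socket, ...).
# An in-memory CSV is used here only so the sketch is self-contained.
source = io.StringIO("2010-07-02 00:00,21.2\n2010-07-02 01:00,16.4\n")
headers = ["timestamp", "consumption"]

model = EchoModel()
results = []
for record in records_from_rows(csv.reader(source), headers):
    # Convert the numeric field before handing the record to the model.
    record["consumption"] = float(record["consumption"])
    results.append(model.run(record))
```

Because the loop only depends on an iterable of rows, swapping the CSV for
a live source should not require touching the code that feeds the model.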




> Reading through the source hasn't been as straightforward as I would have
> liked. I've ended up spending quite a bit of time. Rather than start with
> hotgym.py, I have been searching down the rabbit hole of classes/methods
> used by OpfRunExperiment.py, given that is what I was executing to run the
> experiment.
>
> [edit/update: after taking a look at hotgym.py, things actually look
> relatively simple to get started with, perhaps consider this a case of RTFM]
>
>
> After spending some time in the source and noticing that Grok uses a
> database, rather than CSV files, I feel that Numenta has perhaps added an
> obstacle for 3rd party developers here. Support for a wide variety of
> input formats could make it easy for people to experiment with their own
> data stored in data warehouses. (sorry, I can't remember where in the
> source tree I found this reference as I was reading today)
>
> Experimenter ⟹ Developer Obstacles
> <https://github.com/numenta/nupic/wiki/NuPIC-Consumer-Engagement-Strategy#experimenter--developer-obstacles>
> Some anecdotal evidence to support the barriers
> listed in this section. Results interpretation is a bit tricky. When
> running hotgym for the first time, in the terminal, all I see is the result
> metrics:
>
> ...
> <JSON>
> {
>   "prediction:trivial:errorMetric='aae':steps=5:window=1000:field=consumption": 15.844123711340194,
>   "prediction:trivial:errorMetric='altMAPE':steps=5:window=1000:field=consumption": 56.13599339610908,
>   "multiStepBestPredictions:multiStep:errorMetric='altMAPE':steps=5:window=1000:field=consumption": 43.27516112448442,
>   "prediction:trivial:errorMetric='altMAPE':steps=1:window=1000:field=consumption": 20.833911019329946,
>   "prediction:trivial:errorMetric='aae':steps=1:window=1000:field=consumption": 5.86262833675565,
>   "multiStepBestPredictions:multiStep:errorMetric='aae':steps=1:window=1000:field=consumption": 5.305007751770775,
>   "multiStepBestPredictions:multiStep:errorMetric='altMAPE':steps=1:window=1000:field=consumption": 18.852305332800853,
>   "multiStepBestPredictions:multiStep:errorMetric='aae':steps=5:window=1000:field=consumption": 12.214213466328994
> }
> </JSON>
> ...
>
>
> It has taken me a long time to figure out that the actual prediction
> results are provided in /tmp/tmpXXXXX/generated_output.csv and that there
> is more output saved at
> /nta/logs/numenta-logs-username/OpfRunExperiment-NNNNNNNNNN-NNNN.log
>
>
> The Numenta Python style guide and some of the practices do not lend
> themselves to being familiar to Python programmers. I assume that the style
> guide is designed to help C++ programmers function efficiently in a
> dynamic language, by preventing an overly large context switch. However, to
> me it presents a steeper learning curve as I attempt to
> learn the style.
>
> Here are two examples that I've noticed myself wincing at:
>
>
>    - "_TaskRunner.getFieldInfo" (line 606 of experiment_runner.py) is not
>    Pythonic. Python programmers would tend towards defining
>    "_TaskRunner.fields"
>    - within logging, prefixing module paths with "com.numenta" is quite
>    frustrating. I personally don't see why the namespacing is needed. The
>    string is hard coded in /nta/conf/default/nupic-logging.conf, so I assume
>    it doesn't distinguish between different code modules being run
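
For readers unfamiliar with the distinction in the first bullet, a minimal
sketch of the two styles follows. Both classes are hypothetical stand-ins
written for illustration; neither is the real _TaskRunner, and the field
names are made up.

```python
class TaskRunnerGetter(object):
    """Getter-method style, as in getFieldInfo()."""

    def __init__(self, fields):
        self._fields = list(fields)

    def getFieldInfo(self):
        return self._fields


class TaskRunnerProperty(object):
    """Attribute style: the same data exposed as a read-only property."""

    def __init__(self, fields):
        self._fields = list(fields)

    @property
    def fields(self):
        return self._fields


getter_style = TaskRunnerGetter(["timestamp", "consumption"]).getFieldInfo()
property_style = TaskRunnerProperty(["timestamp", "consumption"]).fields
```

Both calls return the same list; the property form simply reads like plain
attribute access at the call site.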
>
>
> I should note that it's very useful that NuPIC's code is so very
> thoroughly documented. Thank you for showing such discipline.
>
> I hope that these notes are useful!
>
>
>
> Tim McNamara
> @timClicks <http://twitter.com/timClicks> | timmcnamara.co.nz
>
> [0]
> https://github.com/numenta/nupic/wiki/NuPIC-Consumer-Engagement-Strategy
>
>
>
> <http://timmcnamara.co.nz/>
>
> _______________________________________________
> nupic mailing list
> [email protected]
> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>
>