You are spot on: nupic_forwarder.py is insufficient for my particular use
case because it is stateless and re-runs over the entire, growing volume of
data on every iteration. For now this is more of a proof of concept to
demonstrate a starting point for what could potentially be accomplished.

For the time being I am going to mess around, most likely for the remainder
of this week, with keeping track of position and making use of checkpointing
and model serialization to preserve the model's state.
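
As a rough illustration of that bookkeeping (not code from nupic.rogue), the
sketch below keeps a ".pos"-style file with the last processed timestamp and
a saved state file, so each run only feeds new rows to a restored model. The
names forward_new_rows, load_position, save_atomically and the RunningMean
class are all hypothetical; RunningMean is only a stand-in for a real model,
and NuPIC's own checkpointing would replace the JSON state file.

```python
import json
import os
import tempfile


def load_position(pos_path):
    """Return the last processed timestamp, or None on the first run."""
    try:
        with open(pos_path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def save_atomically(path, text):
    """Write to a temp file, then rename, so a crash never leaves a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write(text)
    os.replace(tmp_path, path)


class RunningMean(object):
    """Hypothetical stand-in for a model; it exists only to carry state."""

    def __init__(self, count=0, total=0.0):
        self.count = count
        self.total = total

    def run(self, value):
        self.count += 1
        self.total += value
        return self.total / self.count


def forward_new_rows(rows, pos_path, state_path):
    """Feed only rows newer than the saved position to the restored model."""
    last = load_position(pos_path)
    if os.path.exists(state_path):
        with open(state_path) as f:
            model = RunningMean(**json.load(f))
    else:
        model = RunningMean()

    results = []
    for timestamp, value in rows:
        if last is not None and timestamp <= last:
            continue  # already processed in an earlier run
        results.append((timestamp, model.run(value)))
        last = timestamp

    if results:
        # State is written first, then the position. A crash between the two
        # writes would re-feed those rows to a model that has already seen them.
        state = {"count": model.count, "total": model.total}
        save_atomically(state_path, json.dumps(state))
        save_atomically(pos_path, str(last))
    return results
```

Calling forward_new_rows repeatedly with the full exported data returns
results only for rows that arrived since the previous call. A long-running
process could keep the model in memory and write the state file far less
often, which avoids most of the checkpoint i/o.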

After this week I might swap servers to get something on digitalocean and
re-create what I have been messing around with. My expertise is with
kafka/spark-streaming and mllib, but you have a lot more knowledge of the
nupic model and potential integration scenarios, so I can certainly see some
collaboration there. I will keep you updated on what I can get going.

On Thu, Jun 4, 2015 at 2:36 PM, Austin Marshall <[email protected]> wrote:

> Based on my understanding, I suspect nupic_forwarder.py as-is will be
> insufficient for your use-case.  You will need to implement some
> bookkeeping in order to keep track of position as has been done with the
> grok forwarder, and also make use of model serialization/checkpointing to
> preserve the state between runs, which is not currently done in
> nupic.rogue.  However, since you're running it every 15 seconds, I think
> it's best to keep a process running rather than use cron since there's
> significant i/o overhead in constantly checkpointing and reloading the
> model.
>
> That said, I'm interested in a nupic, spark, and kafka integration
> scenario and can help swap out the backend.  Once you have the agents
> forwarding to kafka (rather than rrdtool), it may simplify the nupic
> integration.  I'm happy to help in any way I can.
>
> On Thu, Jun 4, 2015 at 11:10 AM, Michael Parco <
> [email protected]> wrote:
>
>> By deleting those variables and running the script nupic_forwarder.py
>> again (I eventually figured out I needed to add the
>> --prefix=/path/to/rrddb) the model ran over the data and produced results
>> for each metric. My goal is to set this up temporarily in cron, changing
>> the RRA measures to every 15 seconds and running a script for the nupic
>> model every 15 seconds. I have pretty beefy servers so hopefully cpu/memory
>> intensive processes should be no issue. Ultimately my goal is to integrate
>> this in with kafka and spark, so I will post about it if I can accomplish
>> this.
>>
>> On Thu, Jun 4, 2015 at 2:01 PM, Michael Parco <
>> [email protected]> wrote:
>>
>>> It appears there may be a few more keys that were either renamed or
>>> deleted from model_params: randomSP gave me an error (I just deleted it
>>> and re-ran), as did useHighTier (I also just deleted it and re-ran). If
>>> they have been renamed, let me know; I looked through the post talking
>>> about renames but did not see these particular variables.
>>>
>>> On Thu, Jun 4, 2015 at 12:08 PM, Austin Marshall <[email protected]>
>>> wrote:
>>>
>>>> Ah, yes!  We renamed some of the keys in model params in
>>>> https://github.com/numenta/nupic/pull/1872
>>>>
>>>> "coincInputPoolPct" is now "potentialPct", for example.
>>>>
>>>> I've updated master in nupic.rogue with the updated params in
>>>> https://github.com/numenta/nupic.rogue/pull/2/files and you should be
>>>> able to pull in the latest to fix your specific problem.
>>>>
>>>> On Thu, Jun 4, 2015 at 7:52 AM, Michael Parco <
>>>> [email protected]> wrote:
>>>>
>>>>> Austin, this was a great rundown on the ins and outs of nupic rogue.
>>>>> I've done a lot of work with rrdtool and used such agents previously to
>>>>> stream metrics data collected by ganglia agents to some streaming
>>>>> analytics. Although I think rrdtool is great for temporary local storage,
>>>>> I am looking to possibly replace it with a different backend that I can
>>>>> better communicate with.
>>>>>
>>>>> I think the issue that I have seen thus far is that the nupic
>>>>> forwarder has been giving me errors when I attempt to forward data to it.
>>>>> "RuntimeError: Unknown parameter 'coincInputPoolPct' for region 'SP' of
>>>>> type 'py.SPRegion'" and then gives me a list of valid parameters. This
>>>>> seems to be an error from nupic within python itself and not anything to
>>>>> do with nupic.rogue.
>>>>>
>>>>> On Wed, Jun 3, 2015 at 11:12 PM, Austin Marshall <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> There are a few options, depending on what you’re trying to do.
>>>>>>
>>>>>> The application is structured such that a set of agents (
>>>>>> https://github.com/numenta/nupic.rogue/blob/master/avogadro/__init__.py#L39-L60)
>>>>>> runs constantly in the background, periodically polling for metrics and
>>>>>> saving them to a local rrdtool database.  rrdtool is essentially a flat
>>>>>> file-based time series database with some interesting properties.  In the
>>>>>> case of nupic.rogue, it’s used as a buffer between the agents and either
>>>>>> a grok instance (
>>>>>> https://aws.amazon.com/marketplace/pp/B00I18SNQ6/ref=srh_res_product_title?ie=UTF8&sr=0-2&qid=1433386525659)
>>>>>> if using the grok forwarder, or a nupic model if using the nupic forwarder.
>>>>>> Then, the data is forwarded for analysis in a separate process.
>>>>>>
>>>>>> There is one major difference between the grok forwarder and nupic
>>>>>> forwarder: the grok forwarder is meant to be run regularly with cron.
>>>>>> The grok forwarder maintains a set of “.pos” files to keep track of
>>>>>> position between runs so that it can send everything since the last run
>>>>>> to a running grok instance.  The nupic forwarder has no such bookkeeping
>>>>>> and sends the entire batch to a freshly created model in a one-off sort
>>>>>> of way, and saves the results to a csv file locally.
>>>>>>
>>>>>> If you want to set it up in a streaming fashion, imagine replacing
>>>>>> the rrdtool component with some sort of queue implementation (say,
>>>>>> rabbitmq or redis pubsub).  You could even do simple communication over a
>>>>>> socket.  The code is structured such that one back end can be swapped out
>>>>>> for another (there only happens to be one back end right now, “rrdtool”).
>>>>>> For example, each of the agents is a subclass of AvogadroAgent, which
>>>>>> itself is a subclass of RRDToolClient.  You could create an alternate back
>>>>>> end implementation that writes to a queue, and change AvogadroAgent to be
>>>>>> a subclass of your new class rather than RRDToolClient.  If that’s what
>>>>>> you’d like to do, I suggest starting with a copy of
>>>>>> https://github.com/numenta/nupic.rogue/blob/master/avogadro/rrdtool.py,
>>>>>> removing the methods prefixed with “_”, and re-implementing __init__(),
>>>>>> createParams(), addParseOptions(), and store() to suit your needs.  Then,
>>>>>> you need only write a simple script which reads from the queue and feeds
>>>>>> the samples to a model you’ve created.
>>>>>>
>>>>>> You could also create a new forwarder which is sort of a hybrid
>>>>>> between the grok and nupic forwarders.  For example, use the “.pos” file
>>>>>> approach of grok forwarder, and keep the script running rather than
>>>>>> scheduled by cron periodically.
>>>>>>
>>>>>> On Jun 3, 2015, at 2:28 PM, Michael Parco <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>> I am attempting to set up a system similar to the way grok would
>>>>>> operate, using nupic algorithms. So far I have cloned the nupic.rogue
>>>>>> github repository and built nupic.rogue using the setup python scripts.
>>>>>> I have also built nupic and run some of the test examples such as hot gym
>>>>>> and cpu predictions.
>>>>>>
>>>>>> I have the nupic.rogue agent running and collecting cpu, I/O, network,
>>>>>> and memory data as rrds, and I am able to execute rogue-export
>>>>>> --prefix=var/db to obtain the .csv conversions of the rrd files. My next
>>>>>> step is to feed the .csv files or .rrd files into a nupic model running
>>>>>> HTM for predictions and anomaly scores. Ideally I would like to set this
>>>>>> up in a streaming fashion on a local box. I came across the
>>>>>> nupic_forwarder.py script within nupic.rogue, but I have been unable to
>>>>>> feed in the collected data... any ideas?
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
