Hi Fergal,

It's not that big of a deal; I just haven't done a round of profiling yet,
so there is still plenty of room for improvement in memory handling. Plenty
of JVM applications run really data-heavy workloads, and the state-of-the-art
JVM GC is fully capable of handling those loads. I did a preliminary
profiling session back in January and found some places where memory
consumption could be optimized - I meant to get back to it after the Network
API was finished, because I didn't want to optimize before seeing some
typical usage patterns. If GC were an inescapable problem, you wouldn't have
the tons of mission-critical apps in the financial industry that are running
today.

I am only one person, and I will get around to it. HTM.java is technically
a pre-release (alpha) version for this reason.

Cheers,
David

On Tue, Dec 8, 2015 at 4:59 AM, Fergal Byrne <fergalbyrnedub...@gmail.com>
wrote:

> Hi Matt,
>
> As Stuart Halloway explains here [1], on the JVM, it's always GC. I can
> barely run a single 2048x16 HTM model on my 8 GB laptop on the hotgym
> hourly data - it slows to a crawl after 1000 rows because the GC is
> thrashing, trying to free space on the heap (setting -Xmx2800m in the JVM
> params helps). Good luck trying to keep more than one model per JVM up for
> any length of time.
>
> If you run htm-moclu in a single JVM, something somewhere will have live
> references leading to every Network you have loaded. So your live heap is
> going to be at least N models x heap per model. This is not a big problem
> until you start growing distal segments, which are Java objects on the
> heap. In HTM.java this happens in the TM, which grows as it learns.
>
> GC will detect an impending OOM condition, stop the world, and mark all
> these live references, traversing your millions of objects. Finding
> nothing to free, the JVM will eventually fail at some unpredictable and
> unrelated point in the code.
>
> To check this, run this function every few rows:
>
> void mem() {
>     final long mb = 1024 * 1024;
>     Runtime rt = Runtime.getRuntime();
>
>     // print heap utilization figures in MB
>     System.out.println("Total: " + rt.totalMemory() / mb
>             + "\tFree: " + rt.freeMemory() / mb
>             + "\tUsed: " + (rt.totalMemory() - rt.freeMemory()) / mb
>             + "\tMax: " + rt.maxMemory() / mb);
> }
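> As a self-contained harness, something like the sketch below shows the
> sampler called every few rows. The class name and the allocation loop are
> my own illustration, not HTM.java code - the loop just simulates a model
> retaining objects on the heap, the way a growing TM would:

```java
import java.util.ArrayList;
import java.util.List;

public class HeapStats {
    static final long MB = 1024 * 1024;

    // Same idea as the mem() function above: print heap figures in MB
    static void mem() {
        Runtime rt = Runtime.getRuntime();
        System.out.println("Total: " + rt.totalMemory() / MB
                + "\tFree: " + rt.freeMemory() / MB
                + "\tUsed: " + (rt.totalMemory() - rt.freeMemory()) / MB
                + "\tMax: " + rt.maxMemory() / MB);
    }

    public static void main(String[] args) {
        // Live references in this list keep every chunk reachable, so GC
        // cannot free them and "Used" should climb between samples - the
        // same effect as distal segments accumulating on the heap
        List<byte[]> retained = new ArrayList<>();
        for (int row = 0; row < 1000; row++) {
            retained.add(new byte[64 * 1024]);   // ~64 KB per simulated row
            if (row % 250 == 0) {
                mem();
            }
        }
    }
}
```

> If "Used" keeps climbing toward "Max" while full GCs recover little, that
> is the thrashing described above.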
>
> Regards,
>
> Fergal Byrne
>
> [1] https://youtu.be/FihU5JxmnBg?t=38m6s
>
> On Tue, Dec 8, 2015 at 9:21 AM, cogmission (David Ray) <
> cognitionmiss...@gmail.com> wrote:
>
>> Hey Matt, did you try ramping up from 1 model to see if it was a capacity
>> issue? I would be interested to see how the system responds as more models
>> are added. Anyway, I can't really comment on moclu, as I don't know what's
>> happening there and I don't have time these days to help investigate; I am
>> stretched a bit thin at the moment.
>>
>> @antidata, if you could explain what you mean by "renders the JVM
>> unresponsive", it would help me attend to any issue there might be in the
>> Network API, though I have never had any problems with unresponsiveness.
>> Thanks...
>>
>> Cheers,
>> David
>>
>> On Mon, Dec 7, 2015 at 9:30 PM, Matthew Taylor <m...@numenta.org> wrote:
>>
>>> David, BTW the failure in the video is at 4m:
>>> https://youtu.be/DnKxrd4TLT8?t=4m
>>> ---------
>>> Matt Taylor
>>> OS Community Flag-Bearer
>>> Numenta
>>>
>>>
>>> On Mon, Dec 7, 2015 at 7:24 PM, Matthew Taylor <m...@numenta.org> wrote:
>>> > David and Mike,
>>> >
>>> > I've moved this to another topic to discuss.
>>> >
>>> > So what I tried with moclu was to take the HTM engine traffic app as
>>> shown here:
>>> >
>>> >
>>> https://github.com/nupic-community/htmengine-traffic-tutorial/blob/master/images/HTM-Traffic-Architecture.jpg
>>> >
>>> > And I swapped out the entire green python box containing the HTM
>>> > Engine and replaced it with a local instance of moclu. When the
>>> > traffic app starts up, it creates 153 models immediately and then
>>> > starts pushing data into all of them at once:
>>> >
>>> > https://youtu.be/lzJd_a6y6-E?t=15m
>>> >
>>> > This caused dramatic failure in HTM Moclu, and I think that is what
>>> > Mike's talking about. I recorded it for Mike here:
>>> > https://www.youtube.com/watch?v=DnKxrd4TLT8
>>> >
>>> > I hope that explains some things.
>>> >
>>> > ---------
>>> > Matt Taylor
>>> > OS Community Flag-Bearer
>>> > Numenta
>>> >
>>> >
>>> > On Mon, Dec 7, 2015 at 9:15 AM, cogmission (David Ray)
>>> > <cognitionmiss...@gmail.com> wrote:
>>> >>> the issue you faced is that it can't create hundreds of models at
>>> >>> the same time (as is done by the traffic example), because
>>> >>> instantiating a Network object from HTM.java is an expensive
>>> >>> operation that renders the JVM unresponsive.
>>> >>
>>> >> What is being implied here? Are you saying that instantiating
>>> >> HTM.java is any more expensive than instantiating any other
>>> >> medium-weight application?
>>> >>
>>> >> Cheers,
>>> >> David
>>> >>
>>> >> On Mon, Dec 7, 2015 at 11:05 AM, M.Lucchetta <mmlucche...@gmail.com>
>>> wrote:
>>> >>>
>>> >>> Hello Matt, folks
>>> >>>
>>> >>> You can currently use Htm-MoClu on just one computer; the issue you
>>> >>> faced is that it can't create hundreds of models at the same time (as
>>> >>> the traffic example does), because instantiating a Network object
>>> >>> from HTM.java is an expensive operation that renders the JVM
>>> >>> unresponsive.
>>> >>>
>>> >>> I'm currently working on the Release Candidate (v 1.0.0) and the only
>>> >>> thing missing from your specs is:
>>> >>>
>>> >>> `allows POST of full model params`
>>> >>>
>>> >>> Will chat over Gitter to get more details on this.
>>> >>>
>>> >>> You can find an example of its usage at
>>> >>> https://github.com/antidata/ATAD; it uses the Lift Web Framework
>>> >>> (Comet Actors) to push updates to the browser in real time (similar
>>> >>> to the web sockets proposition) and saves the requests + results into
>>> >>> MongoDB, so you can query both the data coming from outside and the
>>> >>> data generated by HTM (anomaly score + predictions).
>>> >>> One last comment: Htm-Moclu is web-framework agnostic; you can use
>>> >>> any web framework that runs on the JVM.
>>> >>>
>>> >>> Feel free to ping me if any of you would like to contribute to this
>>> >>> project.
>>> >>>
>>> >>> Thanks!
>>> >>>
>>> >>> On 7 December 2015 at 08:36, Matthew Taylor <m...@numenta.org>
>>> wrote:
>>> >>>>
>>> >>>> Ok folks, let's move discussion of the implementation to Github.
>>> First
>>> >>>> question to answer is which HTM implementation to use:
>>> >>>> https://github.com/nupic-community/htm-over-http/issues/2
>>> >>>>
>>> >>>> Anyone else reading this is free to jump in and help out, but I want
>>> >>>> to define our work properly using Github issues so we all know what
>>> is
>>> >>>> happening and who is working on what.
>>> >>>> ---------
>>> >>>> Matt Taylor
>>> >>>> OS Community Flag-Bearer
>>> >>>> Numenta
>>> >>>>
>>> >>>>
>>> >>>> On Sun, Dec 6, 2015 at 10:25 PM, Jonathan Mackenzie <
>>> jonm...@gmail.com>
>>> >>>> wrote:
>>> >>>> > Sounds like a good app Matt, I can help out. Personally, for
>>> >>>> > getting a web app off the ground quickly in Python, I recommend
>>> >>>> > Pyramid: http://www.pylonsproject.org/
>>> >>>> >
>>> >>>> > On 7 December 2015 at 03:31, Matthew Taylor <m...@numenta.org>
>>> wrote:
>>> >>>> >>
>>> >>>> >> Thanks for the interest! I'll try to respond to everyone in this
>>> >>>> >> email. But first, who reading this would want to use an HTM over
>>> HTTP
>>> >>>> >> service like this? It means that you won't need to have HTM
>>> running on
>>> >>>> >> the same system that is generating the data. It's basically HTM
>>> in the
>>> >>>> >> Cloud. :)
>>> >>>> >>
>>> >>>> >> On Sat, Dec 5, 2015 at 12:16 PM, Marcus Lewis <mrcs...@gmail.com
>>> >
>>> >>>> >> wrote:
>>> >>>> >> > I'm interested in HTTP GET, inspecting models.
>>> >>>> >>
>>> >>>> >> Great feature to add after a minimum viable product has been
>>> created,
>>> >>>> >> but this adds the complexity of either caching or persistence
>>> >>>> >> (depending on how much history you want).
>>> >>>> >>
>>> >>>> >> On Sat, Dec 5, 2015 at 2:03 PM, cogmission (David Ray)
>>> >>>> >> <cognitionmiss...@gmail.com> wrote:
>>> >>>> >> > One thing I am concerned about is the call/answer nature of the
>>> >>>> >> > interface you describe, because of the latency involved in a
>>> >>>> >> > submit-one-row-per-call methodology. Should it not be able to
>>> >>>> >> > "batch" process rows of data instead? (Batches could contain
>>> >>>> >> > one row if you were dedicated to being a masochist.)
>>> >>>> >>
>>> >>>> >> Yes, we will eventually need that, but I don't need it in the
>>> >>>> >> prototype. Let's focus on one row at a time and expand to
>>> batching
>>> >>>> >> later.
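>>> >>>> >> For the one-row prototype, a call could be something like `POST
>>> >>>> >> /models/<id>/data` with a body such as this (the endpoint and
>>> >>>> >> field names are only illustrative, not a committed API):

```json
{
  "timestamp": "2015-12-07 09:00:00",
  "value": 21.3
}
```

>>> >>>> >> A batch version could later accept a JSON array of such rows at
>>> >>>> >> the same endpoint.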
>>> >>>> >>
>>> >>>> >> > Next, at Cortical we use a technology called Dropwizard, which
>>> >>>> >> > makes it very easy to deploy an HTTP server capable of RESTful
>>> >>>> >> > queries (I have done this for Twitter processing involving
>>> >>>> >> > HTM.java).
>>> >>>> >>
>>> >>>> >> If this is going to use NuPIC and Python, I have found that it's
>>> >>>> >> super easy to set up REST with web.py [1] - just a matter of
>>> >>>> >> writing a class and a few functions. For REST on the JVM, I am
>>> >>>> >> open to suggestions.
>>> >>>> >>
>>> >>>> >> On Sat, Dec 5, 2015 at 5:50 PM, Pascal Weinberger
>>> >>>> >> <passiweinber...@gmail.com> wrote:
>>> >>>> >> > Like an extended version of the HTM Engine?
>>> >>>> >> > This would be the solution to the htmengine prediction issue :)
>>> >>>> >>
>>> >>>> >> If we choose the HTM Engine option, then yes, we would need to
>>> >>>> >> add some features to HTM Engine, especially prediction and
>>> >>>> >> user-defined model params. This is not a small job, but it would
>>> >>>> >> be great to have a scaling platform already built into the HTTP
>>> >>>> >> server. I would be happy even if we just started with an attempt
>>> >>>> >> to make HTM Engine (and the HTTP server in the skeleton app)
>>> >>>> >> deployable to the cloud. Even with its current capabilities, I
>>> >>>> >> could start using it immediately, and we could add features over
>>> >>>> >> time.
>>> >>>> >>
>>> >>>> >> > Will you set up a repo in the community? :)
>>> >>>> >>
>>> >>>> >> Placeholder: https://github.com/nupic-community/htm-over-http
>>> >>>> >>
>>> >>>> >> Let's continue discussion on Gitter [2]. Our first task is to
>>> >>>> >> decide which HTM implementation to use. I am leaning towards HTM
>>> >>>> >> Engine because it would take the smallest amount of effort to do
>>> >>>> >> the deployment configuration around it and get an MVP running the
>>> >>>> >> fastest (even if it doesn't do prediction or custom model params
>>> >>>> >> out of the box).
>>> >>>> >>
>>> >>>> >> IMO the best way to attack this is to get something minimal
>>> running
>>> >>>> >> ASAP and add features as required.
>>> >>>> >>
>>> >>>> >> [1] http://webpy.org/
>>> >>>> >> [2] https://gitter.im/nupic-community/htm-over-http
>>> >>>> >> ---------
>>> >>>> >> Matt Taylor
>>> >>>> >> OS Community Flag-Bearer
>>> >>>> >> Numenta
>>> >>>> >>
>>> >>>> >
>>> >>>> >
>>> >>>> >
>>> >>>> > --
>>> >>>> > Jonathan Mackenzie
>>> >>>> > BEng (Software) Hons
>>> >>>> > PhD Candidate, Flinders University
>>> >>>>
>>> >>>
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> With kind regards,
>>> >>
>>> >> David Ray
>>> >> Java Solutions Architect
>>> >>
>>> >> Cortical.io
>>> >> Sponsor of:  HTM.java
>>> >>
>>> >> d....@cortical.io
>>> >> http://cortical.io
>>>
>>>
>>
>>
>>
>
>
>
> --
>
> Fergal Byrne, Brenter IT @fergbyrne
>
> http://inbits.com - Better Living through Thoughtful Technology
> http://ie.linkedin.com/in/fergbyrne/ - https://github.com/fergalbyrne
>
> Founder of Clortex: HTM in Clojure -
> https://github.com/nupic-community/clortex
> Co-creator @OccupyStartups Time-Bombed Open License
> http://occupystartups.me
>
> Author, Real Machine Intelligence with Clortex and NuPIC
> Read for free or buy the book at https://leanpub.com/realsmartmachines
>
> e:fergalbyrnedub...@gmail.com t:+353 83 4214179
> Join the quest for Machine Intelligence at http://numenta.org
> Formerly of Adnet edi...@adnet.ie http://www.adnet.ie
>



-- 
*With kind regards,*

David Ray
Java Solutions Architect

*Cortical.io <http://cortical.io/>*
Sponsor of:  HTM.java <https://github.com/numenta/htm.java>

d....@cortical.io
http://cortical.io
