Hi Sato,

Thanks for mentioning that. Yes, that is how you get high performance out of Java - essentially replacing objects with the unboxed arrays NuPIC uses. You'd have to redesign HTM.java to do that, and I'm not sure that's a great idea - you'd lose the flexibility to innovate on the algorithms.
Regards,

Fergal Byrne

On Wed, Dec 9, 2015 at 12:50 AM, Takenori Sato <ts...@cloudian.com> wrote:

Hi David and Fergal,

> NuPIC can do this by storing all the state in a set of C++ unboxed arrays, and it can persist that, free the memory, and load a different model. That's how HTM Engine (and Grok) can run lots of models. You can't do this in a single JVM if you have live refs to your Networks, and rebuilding a Network from disk is a big cost.

In Java, MappedByteBuffer is often used for this purpose - for example, by Apache Cassandra in NoSQL and by Elasticsearch/Lucene in search. The principles behind the design are:

1. less heap for better throughput (depending on the number of cores, but no more than 8 GB)
2. the OS is best placed to manage memory (through virtual memory management and the file cache)

With MappedByteBuffer, you can read any portion of bytes from your own binary file to instantiate an object. You can let that object be GCed as soon as you complete your operation; its bytes stay managed in the file cache, freed and read back in as needed.

Thus you can achieve both low Java heap usage and the best possible performance.

Thanks,
Sato
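To make the pattern Sato describes concrete, here is a minimal sketch (not taken from NuPIC, Cassandra, or HTM.java; the file name, array size, and layout are made-up assumptions) of keeping model state in a memory-mapped file rather than on the Java heap:

import java.io.IOException;
import java.nio.DoubleBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MappedStateSketch {
    public static void main(String[] args) throws IOException {
        int cells = 2048 * 32;                       // illustrative size, not an HTM.java figure
        long sizeInBytes = (long) cells * Double.BYTES;

        try (FileChannel channel = FileChannel.open(Paths.get("model-state.bin"),
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {

            // Map the file into memory; the OS page cache decides which pages stay resident.
            MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_WRITE, 0, sizeInBytes);
            DoubleBuffer state = mapped.asDoubleBuffer();  // view the mapping as doubles, no copying

            state.put(0, 0.42);                // write state straight into the mapping
            double value = state.get(0);       // read it back; only touched pages occupy RAM

            mapped.force();                    // flush dirty pages to the file for persistence
            System.out.println("state[0] = " + value);
        }
    }
}

The small wrapper objects here can be collected cheaply; the bulk of the state lives in the page cache, which is the point about keeping the Java heap small.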
On Tue, Dec 8, 2015 at 8:34 PM, Fergal Byrne <fergalbyrnedub...@gmail.com> wrote:

Hi David,

No, I'm afraid this is an insurmountable problem if you use objects instead of unboxed arrays. NuPIC can do this by storing all the state in a set of C++ unboxed arrays, and it can persist that, free the memory, and load a different model. That's how HTM Engine (and Grok) can run lots of models. You can't do this in a single JVM if you have live refs to your Networks, and rebuilding a Network from disk is a big cost.

The right way to do this is to have one, or a very small number of, Networks in each JVM, and manage them using something like htm-moclu.

(By the way, this means you can't run NuPIC on GAE, because they prohibit user-provided C/C++. Just Compute Engine or Container Engine.)

Regards,
Fergal

On Tue, Dec 8, 2015 at 11:21 AM, cogmission (David Ray) <cognitionmiss...@gmail.com> wrote:

Hi Fergal,

It's not that big of a deal. I just haven't done a round of profiling yet, so there is lots of room for improvement in terms of memory handling. There are lots of JVM applications running really data-heavy workloads, and the state-of-the-art JVM GC is fully capable of handling these loads. I did a preliminary profiling session back in January and found some places where memory consumption could be optimized; I meant to get back to it after the Network API was finished, because I didn't want to optimize things before I had seen some typical usage patterns. If GC were an inescapable problem, you wouldn't have the many mission-critical apps running in the financial industry today.

I am only one person, and I will get around to it. HTM.java is technically a pre-release (alpha) version for this reason.

Cheers,
David

On Tue, Dec 8, 2015 at 4:59 AM, Fergal Byrne <fergalbyrnedub...@gmail.com> wrote:

Hi Matt,

As Stuart Halloway explains here [1], on the JVM it's always GC. I can barely run a single 2048x16 HTM model on my 8 GB laptop on the hotgym hourly data - it slows to a crawl after 1000 rows because it's thrashing the GC trying to free space on the heap (setting -Xmx2800m in the JVM params helps). Good luck trying to keep more than one model per JVM up for any length of time.

If you run htm-moclu in a single JVM, something somewhere will have live references leading to every Network you have loaded, so your live heap is going to be at least N models x heap per model. This is not a big problem until you start growing distal segments, which are Java objects on the heap. In HTM.java this happens in the TM, which grows as it learns.

GC will detect an impending OOM condition, then stop the world and mark all these live references, traversing your millions of objects. Finding nothing to free, the JVM will eventually fail at some unpredictable and unrelated point in the code.

To check this, run this function every few rows:

void mem() {
    int mb = 1024 * 1024;
    Runtime runtime = Runtime.getRuntime();

    // heap utilization statistics, in MB
    System.out.println("Total: " + runtime.totalMemory() / mb
        + "\tFree: " + runtime.freeMemory() / mb
        + "\tUsed Memory: " + (runtime.totalMemory() - runtime.freeMemory()) / mb
        + "\tMax Memory: " + runtime.maxMemory() / mb);
}

Regards,

Fergal Byrne

[1] https://youtu.be/FihU5JxmnBg?t=38m6s

On Tue, Dec 8, 2015 at 9:21 AM, cogmission (David Ray) <cognitionmiss...@gmail.com> wrote:

Hey Matt, did you try ramping up from 1 model to see if it was a capacity issue? I would be interested to see how the system responds as an increasing number of models are added. Anyway, I can't really comment on moclu, as I don't know what's happening there and I don't have time these days to help investigate; I am stretched a bit thin at the moment.

@antidata, if you could explain what you mean by "renders the JVM unresponsive", it would help me attend to any issue there might be in the Network API, though I have never had any problems with unresponsiveness myself. Thanks.

Cheers,
David

On Mon, Dec 7, 2015 at 9:30 PM, Matthew Taylor <m...@numenta.org> wrote:

David, BTW the failure in the video is at 4m: https://youtu.be/DnKxrd4TLT8?t=4m

On Mon, Dec 7, 2015 at 7:24 PM, Matthew Taylor <m...@numenta.org> wrote:

David and Mike,

I've moved this to another topic to discuss.

So what I tried with moclu was to take the HTM Engine traffic app as shown here:

https://github.com/nupic-community/htmengine-traffic-tutorial/blob/master/images/HTM-Traffic-Architecture.jpg

And I swapped out the entire green Python box containing the HTM Engine and replaced it with a local instance of moclu. When the traffic app starts up, it creates 153 models immediately and then starts pushing data into all of them at once:

https://youtu.be/lzJd_a6y6-E?t=15m

This caused dramatic failure in HTM Moclu, and I think that is what Mike's talking about. I recorded it for Mike here:
https://www.youtube.com/watch?v=DnKxrd4TLT8

I hope that explains some things.
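To connect Fergal's "N models x heap per model" point to the 153-model workload Matt describes, a rough back-of-envelope (the per-model figure is an assumption for illustration, not a measurement): if each live Network pinned, say, 50 MB of segment and synapse objects once the TM has grown, then 153 models would hold roughly 153 x 50 MB, or about 7.5 GB, of live heap - well past the -Xmx2800m setting mentioned earlier and close to the whole 8 GB laptop, which is consistent with the failure described above.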
On Mon, Dec 7, 2015 at 9:15 AM, cogmission (David Ray) <cognitionmiss...@gmail.com> wrote:

> the issue you faced is that it can't create hundreds of models at the same time (like it's done by the traffic example) because instantiating a Network object from Htm.java is an expensive operation that turns the JVM unresponsive.

What is being implied here? Are you saying that instantiating HTM.java is any more expensive than instantiating any other medium-weight application?

Cheers,
David

On Mon, Dec 7, 2015 at 11:05 AM, M.Lucchetta <mmlucche...@gmail.com> wrote:

Hello Matt, folks,

You can currently use Htm-MoClu on just one computer. The issue you faced is that it can't create hundreds of models at the same time (like it's done by the traffic example), because instantiating a Network object from Htm.java is an expensive operation that turns the JVM unresponsive.

I'm currently working on the Release Candidate (v1.0.0), and the only thing missing from your specs is:

`allows POST of full model params`

Will chat over Gitter to get more details on this.

You can find an example of its usage at https://github.com/antidata/ATAD. It uses the Lift Web Framework (Comet actors) to push updates to the browser in real time (similar to the WebSockets proposition) and saves the requests and results into MongoDB, so you can query both the data coming from outside and the data generated by HTM (anomaly score plus predictions).

One last comment: Htm-Moclu is web-framework agnostic; you can use any web framework that works on the JVM.

Feel free to ping me if any of you would like to contribute to this project.

Thanks!
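One generic JVM pattern that can keep a service responsive while many expensive objects are being built is to construct them on a bounded background executor and hand out futures. This is only a sketch of that general idea; buildNetwork() is a hypothetical placeholder, not an HTM.java or Htm-MoClu API, and the pool size and model count are assumptions:

import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ModelPoolSketch {
    private final ExecutorService builders = Executors.newFixedThreadPool(4);  // bounded build concurrency
    private final Map<String, CompletableFuture<Object>> models = new ConcurrentHashMap<>();

    // Hypothetical stand-in for whatever expensive construction the real library performs.
    private Object buildNetwork(String modelId) {
        return new Object();
    }

    // Returns immediately; the caller (e.g. an HTTP handler) is never blocked by construction.
    public CompletableFuture<Object> getOrCreate(String modelId) {
        return models.computeIfAbsent(modelId,
                id -> CompletableFuture.supplyAsync(() -> buildNetwork(id), builders));
    }

    public static void main(String[] args) {
        ModelPoolSketch pool = new ModelPoolSketch();
        for (int i = 0; i < 153; i++) {                 // e.g. the 153 models from the traffic app
            pool.getOrCreate("model-" + i)
                .thenAccept(n -> System.out.println("model ready: " + n));
        }
        pool.builders.shutdown();                       // finish queued builds, then exit
    }
}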
On 7 December 2015 at 08:36, Matthew Taylor <m...@numenta.org> wrote:

Ok folks, let's move discussion of the implementation to GitHub. The first question to answer is which HTM implementation to use:
https://github.com/nupic-community/htm-over-http/issues/2

Anyone else reading this is free to jump in and help out, but I want to define our work properly using GitHub issues so we all know what is happening and who is working on what.

On Sun, Dec 6, 2015 at 10:25 PM, Jonathan Mackenzie <jonm...@gmail.com> wrote:

Sounds like a good app, Matt; I can help out. Personally, for getting a web app off the ground quickly in Python I recommend Pyramid:
http://www.pylonsproject.org/

--
Jonathan Mackenzie
BEng (Software) Hons
PhD Candidate, Flinders University

On 7 December 2015 at 03:31, Matthew Taylor <m...@numenta.org> wrote:

Thanks for the interest! I'll try to respond to everyone in this email. But first, who reading this would want to use an HTM-over-HTTP service like this? It means you won't need to have HTM running on the same system that is generating the data. It's basically HTM in the Cloud. :)

On Sat, Dec 5, 2015 at 12:16 PM, Marcus Lewis <mrcs...@gmail.com> wrote:

> I'm interested in HTTP GET, inspecting models.

Great feature to add after a minimum viable product has been created, but it adds the complexity of either caching or persistence (depending on how much history you want).

On Sat, Dec 5, 2015 at 2:03 PM, cogmission (David Ray) <cognitionmiss...@gmail.com> wrote:

> One thing I am concerned about is the call/answer nature of the interface you describe, because of the latency involved in a submit-one-row-per-call methodology. Should it not be able to "batch" process rows of data instead? (Batches could contain one row if you were dedicated to being a masochist.)

Yes, we will eventually need that, but I don't need it in the prototype. Let's focus on one row at a time and expand to batching later.

> Next, at Cortical.io we use a technology called Dropwizard, which makes it very easy to deploy an HTTP server capable of RESTful queries (I have done this for Twitter processing involving HTM.java).

If this is going to use NuPIC and Python, I have found that it's super easy to set up REST with web.py [1]; it's just a matter of writing a class and a few functions. For REST on the JVM, I am open to suggestions.

On Sat, Dec 5, 2015 at 5:50 PM, Pascal Weinberger <passiweinber...@gmail.com> wrote:

> Like an extended version of HTM Engine? This would be the solution to the htmengine prediction issue :)

If we chose the HTM Engine option, then yes, we would need to add some features to HTM Engine, especially prediction and user-defined model params. This is not a small job, but it would be great to have a scaling platform already built into the HTTP server. I would be happy even if we just started with an attempt to make HTM Engine (and the HTTP server in the skeleton app) deployable to the cloud. Even with its current capabilities, I could start using it immediately, and we could add features over time.

> Will you set up a repo in the community? :)

Placeholder: https://github.com/nupic-community/htm-over-http

Let's continue discussion on Gitter [2]. Our first decision is which HTM implementation to use.
I am leaning towards HTM Engine, because it would take the least effort to do the deployment configuration around it and get an MVP running the fastest (even if it doesn't do prediction or custom model params out of the box).

IMO the best way to attack this is to get something minimal running ASAP and add features as required.

[1] http://webpy.org/
[2] https://gitter.im/nupic-community/htm-over-http
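On Matt's open question above about REST on the JVM, here is one hedged illustration of how small a single-row POST endpoint can be, using only the JDK's built-in com.sun.net.httpserver rather than Dropwizard, web.py, or any HTM API; the /models/demo/data path and the canned JSON response are assumptions:

import com.sun.net.httpserver.HttpServer;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

public class HtmOverHttpSketch {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);

        // POST one CSV row per request, e.g. "2015-12-07 02:00:00,21.3".
        server.createContext("/models/demo/data", exchange -> {
            String row = new BufferedReader(new InputStreamReader(
                    exchange.getRequestBody(), StandardCharsets.UTF_8))
                    .lines().collect(Collectors.joining("\n"));

            // A real service would feed the row to a model here; this just echoes a canned result.
            String json = "{\"row\":\"" + row.trim() + "\",\"anomalyScore\":0.0}";

            byte[] body = json.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });

        server.start();
        System.out.println("listening on http://localhost:8080/models/demo/data");
    }
}

A call like curl -X POST --data '2015-12-07 02:00:00,21.3' http://localhost:8080/models/demo/data would exercise it, and accepting several lines per POST would be one way to add the batching David raised.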
--

Fergal Byrne, Brenter IT @fergbyrne

http://inbits.com - Better Living through Thoughtful Technology
http://ie.linkedin.com/in/fergbyrne/ - https://github.com/fergalbyrne

Founder of Clortex: HTM in Clojure - https://github.com/nupic-community/clortex
Co-creator @OccupyStartups Time-Bombed Open License http://occupystartups.me

Author, Real Machine Intelligence with Clortex and NuPIC
Read for free or buy the book at https://leanpub.com/realsmartmachines

e: fergalbyrnedub...@gmail.com  t: +353 83 4214179
Join the quest for Machine Intelligence at http://numenta.org
Formerly of Adnet edi...@adnet.ie http://www.adnet.ie