Hi Brian,

Thanks! This has been (it's not over yet) both a labor of love and a severe push, largely for some of the same reasons you pointed out. This is my first foray into Python, and so I had all the same feelings of disorientation and intimidation. I (as well as Numenta's flag bearer, Matt Taylor) could also foresee, as you do, the enormous opportunity to introduce HTM theory to a significantly larger group of developers by providing a Java version.

I have to admit, most of my effort and focus have been on getting a thorough and tested Java version of NuPIC up and running as quickly as possible. As a result, the only comparisons I have made are to the Python version. As one would expect, the Java version is magnitudes faster than the Python version (which mostly exists as a research platform and as a knowledge transfer platform for new users, due to the ease with which new ideas can be implemented quickly). I have not (yet) had a chance to make any comparisons between the Java and C++ versions - however, it is my goal to make sure the Java version is at least competitive with the C++ version (if not exceeding it - as it could very well do in a long-running, primed JVM). The emphasis, however, is to augment the utility of NuPIC in general and to introduce as many people as possible to these technologies, because they are a unique and important contribution to the field of machine learning - which is why I'm doing this!

Regards,
David

On Sun, Oct 19, 2014 at 11:31 AM, Brian Eppert <[email protected]> wrote:

> Very impressive, must have taken a lot of determination, nice work!
>
> It’s great to see the Java port is more strongly typed. One of the scariest parts for me looking at the Python code was the wealth of configuration parameters as (mis-typable, unconstrained) strings and arrays. It seems more surmountable as a neophyte to use an IDE that can compile and flag bad values, and provide code completions, in-place documentation or “go to definition” capabilities.
>
> Another win is that having this in Java allows for native use by the other JVM-hosted languages like Groovy, Scala, Clojure, JRuby, etc. That’s accessible to quite a few more developers, and with Java’s strong cross-platform-ness a ton of avenues of use open up.
>
> That is all wonderful, but I’m bracing myself as I ask this: what have you seen as far as performance compared to the NuPIC Python and C++ code?
>
> On Oct 18, 2014, at 10:37 AM, cogmission1 . <[email protected]> wrote:
>
> Hi Everybody,
>
> After 2 (looooooooong) months we finally have usable NuPIC functionality in Java!
>
> Repo: https://github.com/numenta/htm.java
> Wiki: https://github.com/numenta/htm.java/wiki
> Twitter: https://twitter.com/search?q=%23HtmJavaDevUpdates&src=typd
>
> Here's a blurb describing the goals and future plans for the project:
>
> ======
>
> Throughout the development of the TemporalMemory and the SpatialPooler, there was an emphasis on keeping a 1-to-1 correlation between the methods and functions implementing each algorithm in the Java and Python versions. To this end, I would say that 98% of the Python tests in each module have the *exact* same output produced within the Java unit tests and integration tests. The only place where they differ is where calls to an underlying RandomNumberGenerator have a significant impact - however, even in those places, every other aspect of the code output is carefully monitored to ensure that, had certain initial parameters been the same, the two versions (Python and Java) would produce the exact same output. This was achieved by temporarily altering the Python tests to be initialized with the same values that the Java version was initialized with - and making sure the output produced was the same!
>
> Additionally, a utility object (ArrayUtils) was created to bridge the gap between functionality native to Python which doesn't exist in Java, and the SparseMatrix (and its subclasses, SparseBinaryMatrix and SparseObjectMatrix) was created to handle array shaping and vector math operations.
>
> There are a few architectural differences in the Java version. One is the abstraction of objects represented in the Python version as arrays and array containers into formal Objects in the Java version.
> Another is that all methods in the Java version are "functional" in that the data they operate on is passed in, and no state is kept in either the TemporalMemory or the SpatialPooler classes. The "Connections" class (inspired by Chetan's Connections object) acts like an isolated memory - containing all state. This means that two distinct Connections objects (memories) could be passed to the TM or SP, manipulating two entirely different layers *concurrently* or in parallel.
>
> Roadmap:
>
> At this point the SpatialPooler can be connected to the TemporalMemory to produce output within a given Java project - since those two classes represent the major inference functionality of NuPIC. However, in order to exactly reproduce the convenience of the Online Prediction Framework, other structures would need to be implemented - and so those are next on the list. The anticipated roadmap is as follows:
>
> 1.) Create the BaseEncoder and derivative encoders which are currently relevant (since one or two may have become obsolete). The culmination of which should be the GEOSpatialEncoder, I assume.
>
> 2.) Classifiers will then be next on the list, which will complete the current hierarchy of functionality.
>
> 3.) Following this, Layer and Regional constructs will be created to coordinate and manage data flow in this hierarchy.
>
> 4.) Then we'll loop around and take a look at what new "Research" sensorimotor-based development can be formally pulled in to guide the reshaping of the Java version into a form that reflects the most current theory.
>
> 5.) Then we'll do an optimization/performance pass over the entire codebase to make it at least as fast as whatever C++ version is available. (*wink*)
>
> --
> *We find it hard to hear what another is saying because of how loudly "who one is", speaks...*

--
*We find it hard to hear what another is saying because of how loudly "who one is", speaks...*
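
For readers who want to picture the "functional" design described in the quoted announcement (algorithm classes that keep no state, with a separate memory object holding all of it), here is a minimal sketch. It is not the htm.java API: ToyConnections, ToyPooler, StatelessSketch and the constants are hypothetical names and values invented for this illustration, and the "learning" rule is a deliberately trivial stand-in rather than the real SpatialPooler logic.

```java
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for the "Connections" idea: it owns ALL mutable state.
final class ToyConnections {
    final double[] permanences; // one toy permanence value per input bit
    int iterations;             // number of compute() calls applied to this memory

    ToyConnections(int inputSize, double initialPermanence) {
        permanences = new double[inputSize];
        Arrays.fill(permanences, initialPermanence);
    }
}

// Hypothetical stateless algorithm class: it has no instance fields, so one
// instance can operate on any number of independent ToyConnections objects.
final class ToyPooler {
    static final double INCREMENT = 0.1;  // assumed value, illustration only
    static final double DECREMENT = 0.05; // assumed value, illustration only
    static final double CONNECTED = 0.5;  // assumed threshold, illustration only

    // Strengthens permanences for active bits, weakens the rest (when learning),
    // and returns how many permanences are at or above the threshold.
    int compute(ToyConnections c, int[] input, boolean learn) {
        if (input.length != c.permanences.length) {
            throw new IllegalArgumentException("input size does not match memory size");
        }
        int connected = 0;
        for (int i = 0; i < input.length; i++) {
            if (learn) {
                double delta = (input[i] == 1) ? INCREMENT : -DECREMENT;
                c.permanences[i] = Math.max(0.0, Math.min(1.0, c.permanences[i] + delta));
            }
            if (c.permanences[i] >= CONNECTED) {
                connected++;
            }
        }
        c.iterations++;
        return connected;
    }
}

final class StatelessSketch {
    // Drives two independent memories through the SAME pooler on two threads.
    static int[] runConcurrently() {
        ToyPooler pooler = new ToyPooler();
        ToyConnections layerA = new ToyConnections(4, 0.45);
        ToyConnections layerB = new ToyConnections(4, 0.45);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<Integer> a = pool.submit(() -> pooler.compute(layerA, new int[] {1, 1, 0, 0}, true));
            Future<Integer> b = pool.submit(() -> pooler.compute(layerB, new int[] {0, 0, 0, 1}, true));
            return new int[] {a.get(), b.get()};
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(runConcurrently())); // prints [2, 1]
    }
}
```

Because each task touches only its own ToyConnections object and the pooler has no fields, the two calls need no locking; that is the property the announcement attributes to passing distinct Connections objects to the TM or SP.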
