[nupic-discuss] HTM parallelization/scaling questions

Alex Davos Sun, 22 Jun 2014 13:16:24 -0700

Hello,


We are a team of astrophysicists interested in applying HTM/CLA to pattern
recognition problems in our datasets, mostly time series from high-energy
physics and related phenomena. These datasets possess characteristics
that seem to fit HTM's capabilities well, so we thought we should give it
a shot.


However, because of the size and complexity of the datasets, the current
size of HTM/CLA is not adequate to capture the structure available. We
intend to parallelize the code in order to increase the capabilities of
the model as much as possible, so we have a few questions before we
attempt to do so:


First of all, most references to the size of HTM/CLA so far have been
limited to 2048 columns. Jeff Hawkins has mentioned in a presentation that
he could probably do "3X" that or so. However, we haven't been able to
locate the exact specification of the hardware used in either of these
cases.


To put it differently, what exactly is the hardware on which the 3X figure
was determined? Is it a run-of-the-mill i7 with 32 GB of RAM, or something
a bit more powerful?


Is this a strictly serial implementation?


What are the main bottlenecks at the moment? Our experience with CLA so far
points to a CPU-RAM bus bottleneck rather than computational or RAM size
limits. Do your observations match ours?



For example, we have about 10 nodes available to us in the lab (each has an
8-socket motherboard with a 10-core Xeon per socket and 512 GB of RAM per
CPU, 4 TB of RAM per node), connected to each other with InfiniBand and
high-speed switches. Can you give us a ballpark of what is achievable with
that hardware?


Second, which parallelization strategy is preferable, from a theoretical
standpoint at least: a column-level one or a hierarchical one? To clarify,
is it better to build a few big regions, each containing a very large
number of columns, or should we instead create many smaller regions and
connect them to each other hierarchically?


What are the pros and cons of each approach, and which datasets would
benefit more from a few bigger regions than from many smaller ones, and
vice versa?



Thank you in advance for taking the time to answer our questions.
