Wow, the mailing list has really exploded over the past few days! A bunch of
us were very busy at AWS re:Invent this week, and personally I have only been
able to sample the emails.

There has been a bunch of good discussion about the NuPIC API. I agree with
Matt, Fergal, Doug, and others about the need for defining this. There is
also some (understandable) confusion about the various components of NuPIC,
so I wanted to put some of this down and clarify the language.

There are three main areas of NuPIC: the Algorithms, the Network/Regions
API, and the OPF. These three have very distinct purposes. Below is a
detailed description of each component. I apologize for the wordiness, but
hopefully this is helpful in understanding the various pieces. I have also
included a proposal for the “NuPIC API” below.

The concepts underlying the CLA are extremely powerful. The potential
applications are vast but there is still much research left to do. It is
not easy to have a one-size-fits-all approach, and we will be stretched in
many different directions. A discussion of the API, release process, etc.
is great to have now - thanks to Matt for kicking this off.

—Subutai

*Algorithms* - This level contains implementations of the Spatial and
Temporal Poolers. If you just want to work with the raw algorithms, this is
the easiest level to use. The file spatial_pooler.py contains a clean
implementation of the spatial pooler that can be used directly. See
hello_tp.py for how to use the temporal pooler directly. Matt used this for
his nupic_nlp implementation.

*Network API* - This level formalizes the concept of “Networks” and
“Regions”. This API allows you to string together multiple Regions,
including hierarchies. You can send the output of N Regions into
higher-level Regions. It formalizes initialization, input/output vectors, a
unified mechanism for setting/getting parameters, serialization, and the
order in which compute is called on individual Regions (this is very
important for hierarchies). It is one level above the algorithms and is
agnostic to the specific algorithm. For example, a Region can have a CLA
implementation or a KNN implementation. It is written 100% in C++, and it is
very small and clean. It can support multiple language bindings, with Python
being the main one currently implemented.
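
To make the Network/Regions idea concrete, here is a toy sketch in plain
Python. To be clear, this is not the NuPIC Network API: the Region and
Network classes, their method names, and the two stand-in algorithms
(threshold and union) are all hypothetical and invented for illustration. It
only tries to show the ideas described above: Regions that are agnostic to
the algorithm inside them, a uniform way to set/get parameters, links
between Regions, and a compute order in which lower-level Regions run before
the Regions they feed. It assumes the links form no cycles.

```python
# Toy illustration only. These classes are hypothetical stand-ins and are
# NOT the real NuPIC Network API.

class Region(object):
    """A named node whose algorithm is pluggable (any callable)."""

    def __init__(self, name, algorithm, **params):
        self.name = name
        self.algorithm = algorithm   # callable(list_of_input_vectors, params)
        self.params = dict(params)
        self.output = []

    def set_parameter(self, key, value):
        self.params[key] = value

    def get_parameter(self, key):
        return self.params[key]

    def compute(self, inputs):
        self.output = self.algorithm(inputs, self.params)


class Network(object):
    """Holds Regions and links, and decides the order of compute calls."""

    def __init__(self):
        self.regions = {}
        self.names = []    # insertion order of Region names
        self.links = {}    # destination name -> list of source names

    def add_region(self, region):
        self.regions[region.name] = region
        self.names.append(region.name)
        self.links[region.name] = []

    def link(self, src, dest):
        self.links[dest].append(src)

    def compute_order(self):
        # Depth-first topological sort: every source comes before the
        # Region it feeds. Assumes the links contain no cycles.
        order, seen = [], set()

        def visit(name):
            if name in seen:
                return
            seen.add(name)
            for src in self.links[name]:
                visit(src)
            order.append(name)

        for name in self.names:
            visit(name)
        return order

    def run(self, sensor_inputs):
        order = self.compute_order()
        for name in order:
            sources = self.links[name]
            if sources:
                inputs = [self.regions[s].output for s in sources]
            else:
                inputs = [sensor_inputs[name]]   # bottom-level Region
            self.regions[name].compute(inputs)
        return order


def threshold(inputs, params):
    # Stand-in for one kind of algorithm: binarize every input value.
    t = params["threshold"]
    return [1 if x >= t else 0 for vec in inputs for x in vec]


def union(inputs, params):
    # Stand-in for a different algorithm: bitwise OR of the input vectors.
    return [max(bits) for bits in zip(*inputs)]


net = Network()
net.add_region(Region("top", union))   # added first on purpose
net.add_region(Region("left", threshold, threshold=0.5))
net.add_region(Region("right", threshold, threshold=0.5))
net.link("left", "top")
net.link("right", "top")

order = net.run({"left": [0.9, 0.1, 0.2], "right": [0.0, 0.7, 0.3]})
# order is ['left', 'right', 'top'] even though "top" was added first
```

The top Region is added first deliberately: the Network, not the caller,
works out that the two lower Regions must compute before the Region that
consumes their output. That is the part that matters for hierarchies.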

*OPF* - The Online Prediction Framework is a client of the Network API and
is used in Grok’s commercial product. It is designed for a very specific use
case: small streaming data applications. The OPF reflects three years of
exploring dozens of different industries and business models while we
searched for commercial applications of the CLA. At least 40% of this code
is not used anywhere anymore. We haven’t had time to clean it up. I would
characterize this code as very powerful, very messy, and not easy to
understand. The OPF includes encoders, the classifier, swarming, and the
description file format. All this stuff is pretty specific to streaming
data applications such as energy, IT data, etc. It does not support
hierarchies, vision, and so on. The hotgym sample is an example of using
the OPF.

*Support* - I include this because there is also a grab bag of other
components that are used in various places. This would include the Sparse
math library, test routines, the build system, etc.

So what is the NuPIC API? *My strawman proposal: the main API for NuPIC
should be the Network/Regions API*. It is very generic and clean. It is
independent of specific algorithm implementations, and can support a very
wide range of use cases including streaming data, vision, audio, and
hierarchies. It can also easily support other languages. It can support
both experimentation and commercial uses.

Doug: the Network API does not yet support distributing CLA Regions across
servers, but it could be extended to do so. It does support the idea of
having pre-trained Regions that you can hook together.