Wow, the mailing list has really exploded over the past few days! A bunch of us were very busy at AWS re:Invent this week, and personally I have only been able to sample the emails.
There has been a bunch of good discussion about the NuPIC API. I agree with Matt, Fergal, Doug, and others about the need for defining this. There is also some (understandable) confusion about the various components of NuPIC, so I wanted to put some of this down and clarify the language.

There are three main areas of NuPIC: the Algorithms, the Network/Regions API, and the OPF. These three have very distinct purposes. Below is a detailed description of each component. I apologize for the wordiness, but hopefully this is helpful in understanding the various pieces. I have also included a proposal for the “NuPIC API” below.

The concepts underlying the CLA are extremely powerful. The potential applications are vast, but there is still much research left to do. A one-size-fits-all approach is not easy, and we will be stretched in many different directions. A discussion of the API, release process, etc. is great to have now - thanks to Matt for kicking this off.

--Subutai

*Algorithms* - This level contains implementations of the Spatial and Temporal Poolers. If you just want to work with the raw algorithms, this is the easiest level to use. The file spatial_pooler.py contains a clean implementation of the spatial pooler that can be used directly. See hello_tp.py for how to use the temporal pooler directly. Matt used this for his nupic_nlp implementation.

*Network API* - This level formalizes the concept of “Networks” and “Regions”. This API allows you to string together multiple regions, including hierarchies. You can send the output of N Regions into higher-level Regions. It formalizes initialization, input/output vectors, a unified mechanism for setting/getting parameters, serialization, and the order in which compute is called on individual regions (this is very important for hierarchies). It is one level above the algorithms and is agnostic to the specific algorithm. For example, a Region can have a CLA implementation or a KNN implementation.
It is 100% written in C++, and it is very small and clean. It can support multiple language bindings, with Python being the main one currently implemented.

*OPF* - The Online Prediction Framework is a client of the Network API and is used in Grok’s commercial product. It is designed for a very specific use case: small streaming data applications. The OPF reflects three years of exploring dozens of different industries and business models while we searched for commercial applications of the CLA. At least 40% of this code is not used anywhere anymore. We haven’t had time to clean it up. I would characterize this code as very powerful, very messy, and not easy to understand. The OPF includes encoders, the classifier, swarming, and the description file format. All this stuff is pretty specific to streaming data applications such as energy, IT data, etc. It does not support hierarchies, vision, and so on. The hotgym sample is an example of using the OPF.

*Support* - I include this because there is also a grab bag of other components that are used in various places. This would include the Sparse math library, test routines, the build system, etc.

So what is the NuPIC API? *My strawman proposal: the main API for NuPIC should be the Network/Regions API*. It is very generic and clean. It is independent of specific algorithm implementations, and can support a very wide range of use cases including streaming data, vision, audio, and hierarchies. It can also easily support other languages. It can support both experimentation and commercial uses.

Doug: the Network API does not yet support distributing CLA Regions across servers, but it could be extended to do so. It does support the idea of having pre-trained Regions that you can hook together.
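
To make the Network/Regions idea concrete, here is a toy sketch in Python. It is not the NuPIC API: the classes `Region` and `Network` and the methods `add_region`, `link`, `compute_order`, and `run` are hypothetical names made up for illustration. It only shows the shape of the concept described above: regions hide their algorithm behind a single compute call, links wire N lower regions into a higher one, and the network decides the order in which compute is called. The topological sort used for ordering is an assumption of this sketch, not a statement about how NuPIC schedules regions.

```python
from collections import deque


class Region:
    """Toy region: wraps any algorithm behind one compute() method."""

    def __init__(self, name, fn):
        self.name = name
        self.fn = fn          # algorithm-specific logic (could be CLA, KNN, ...)
        self.output = None

    def compute(self, inputs):
        self.output = self.fn(inputs)
        return self.output


class Network:
    """Toy network: owns the regions, the links, and the compute order."""

    def __init__(self):
        self.regions = {}
        self.links = []       # list of (source name, destination name)

    def add_region(self, name, fn):
        self.regions[name] = Region(name, fn)

    def link(self, src, dst):
        self.links.append((src, dst))

    def compute_order(self):
        # Kahn's algorithm: a region runs only after every region feeding it.
        indegree = {name: 0 for name in self.regions}
        for _, dst in self.links:
            indegree[dst] += 1
        ready = deque(n for n, d in indegree.items() if d == 0)
        order = []
        while ready:
            name = ready.popleft()
            order.append(name)
            for src, dst in self.links:
                if src == name:
                    indegree[dst] -= 1
                    if indegree[dst] == 0:
                        ready.append(dst)
        if len(order) != len(self.regions):
            raise ValueError("links contain a cycle")
        return order

    def run(self, external_input):
        for name in self.compute_order():
            sources = [s for s, d in self.links if d == name]
            if sources:
                # Higher-level region: gather outputs of the regions below it.
                inputs = [self.regions[s].output for s in sources]
            else:
                # Bottom-level region: read the external input directly.
                inputs = [external_input]
            self.regions[name].compute(inputs)
        return {name: r.output for name, r in self.regions.items()}


net = Network()
# Two lower-level regions feed one higher-level region.
net.add_region("low_a", lambda xs: [v * 2 for v in xs[0]])
net.add_region("low_b", lambda xs: [v + 1 for v in xs[0]])
net.add_region("top", lambda xs: [sum(pair) for pair in zip(*xs)])
net.link("low_a", "top")
net.link("low_b", "top")

outputs = net.run([1, 2, 3])
```

The lambdas stand in for real algorithms. Swapping one out changes nothing in `Network`, which is the sense in which this level is agnostic to the specific algorithm.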
