Hi Nicholas,

These are some really good questions.
On Sat, Aug 2, 2014 at 1:50 PM, Nicholas Mitri <[email protected]> wrote:

> 1- Are there any specific properties that encoders need to have when
> designing one? What’s the rationale behind them if they exist?

Yes, there are a couple of important properties which encodings must have. The most important is that if you have a notion of "semantic closeness" (or "distance") in the data, then close values should have overlapping bits in their encodings and distant values should not. An example is scalar values (which may be arbitrarily close, of course), where you choose the encoding so that values within some range (or radius) r have the same encoding, those more than r and less than 2r apart differ by a single bit, and so on. The "traditional" scalar encoder produces encodings such as:

111100000000
011110000000
001111000000
000111100000
000111100000
000011110000
000001111000
000000111100
000000111100
000000011110
000000001111
000000001111

I've written up a discussion of this and Chetan's newer Random Distributed Scalar Encoder if you want more detail [1].

Sometimes you have data which does not have such distance semantics. An example is categorical data, where values are either members of a set or not, the sets are disjoint, and there is no ordering semantics (this is common, but does not apply to all categorical data). In this case you could either divide the encoding width into n blocks and assign a block to each category, or choose "random" encodings for each category.

The rationale behind this is that most columns which activate on an input will also do so on "nearby" inputs, since they subsample the bits, and so the SDR in the layer will vary little when the inputs change a little. This provides stability in the face of noise, and allows the CLA to form a stable representation of the inputs.

The other primary property is sparseness, which I'll explain in more detail in response to the next question.
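As an aside, the sliding-window scalar encoding shown above can be sketched in a few lines of Python. This is a minimal illustration, not NuPIC's actual ScalarEncoder API; the parameter names (min_val, max_val, n, w) are my own, and the real encoder has more options (periodic ranges, clipping, etc.):

```python
def encode_scalar(value, min_val=0.0, max_val=10.0, n=12, w=4):
    """Encode a scalar as n bits with a contiguous run of w on-bits.

    Values within one bucket share an encoding; values one bucket
    apart differ by a single bit, so overlap falls off with distance.
    """
    if not (min_val <= value <= max_val):
        raise ValueError("value out of range")
    n_buckets = n - w + 1  # number of distinct encodings
    bucket = int((value - min_val) / (max_val - min_val) * n_buckets)
    bucket = min(bucket, n_buckets - 1)  # top of range -> last bucket
    bits = ["0"] * n
    for i in range(bucket, bucket + w):
        bits[i] = "1"
    return "".join(bits)

print(encode_scalar(0.0))   # "111100000000"
print(encode_scalar(2.0))   # "011110000000"
print(encode_scalar(10.0))  # "000000001111"
```

Nearby values share most of their on-bits, while values far apart share none, which is exactly the distance-semantics property described above.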
> 2- The wiki refers to encoder outputs as SDRs. Is that necessarily the
> case and if so, to what properties of encoder design is that requirement
> attributed to? (i.e. why do I need an SDR to be the output of the encoder
> as opposed to a binary vector unconstrained in density?)

The sparseness is necessary to take advantage of a) the improbability of two substantially overlapping encodings (when subsampled) being a false match, and b) false matches representing only mild semantic errors. This is a statistical property of sparse representations, and it's used in techniques such as locality-sensitive hashing [2]. Essentially, sparse binary vectors with most bits differing are very far apart in the high-dimensional space compared with those which share many bits.

Inter-layer communication uses SDRs, of course, so genuine SDRs (at ~2% on-bits) are the "best" encodings of data, but the CLA will work fine with merely "quite sparse" inputs on the order of 10-15% on-bits. The CLA will learn faster if the encoding is less sparse, and the number of on-bits relates to discriminatory resolution, so we'll often (or even usually) use this less-sparse encoding regime.

> 3- Is there a biological counterpart for encoders in the general sense?

Yes, all input to the neocortex is composed of trains of spikes, which is a digital encoding scheme. The brain truly receives streams of bits and generates the illusion that we "see" or "hear" directly.

> 4- Encoders perform quantization on the input stream by binning similar
> input patterns into hypercubes in feature space and assigning a single
> label (SDR or binary representation) to each bin. The encoder resolution
> determines the size of the hypercube. The SP essentially performs a very
> similar task by binning the outputs of the encoder in a binary feature
> space instead. City block distance determined by a threshold parameter
> controls the size of the hypercubes/bins.
> Why is this not viewed as a redundant operation by 2 consecutive modules
> of the HTM design? Is there a strong case for allowing for it?

That's a very good question. There are a few parts to the answer.

Firstly, independently encoded inputs are often fed into an HTM system, which will extract correlative or causal structure between or among the inputs (this is how Layer 4 combines sensory and motor data in the recent version of Jeff's theory). Secondly, an HTM hierarchy will extract a hierarchy of feature structure by repeating the same algorithm at each level (and this hierarchy cannot be represented in a single encoding). Thirdly, HTM will extract temporal structure from a series of "independently" encoded inputs, which again cannot be represented in any single encoding. Fourthly, the sparseness of the output of each layer in HTM is a property of that layer, independent of the input sparseness, so there is a sparseness transformation which alters the dimensionality of the output compared with the input.

You need to view HTM as a system which extracts structure that is latent in the encoded data; if your encoding is so clever that it exposes all this structure directly, then you're right, you don't need HTM at all!

> 5- Finally, is there any benefit to designing an encoding scheme that bins
> inputs into hyperspheres instead of hypercubes? Would the resulting
> combination of bins produce decision boundaries that might possibly allow
> for better binary classification performance for example?

The noise levels in the brain are of the order of a bit per bit. Implementations of HTM use drastic simplifications (such as binary encodings, binary synapses, global inhibition, etc.) and distributed representations to model this, so the answer is "probably", but it seems to make little engineering sense unless hyperspheres are easier to implement or have some other cost advantage.

Thanks again for the great questions.
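To put a rough number on the false-match improbability mentioned in my answer to question 2, here is a small numerical sketch (my own illustration, with typical SDR dimensions of 40 on-bits out of 2048, which are not taken from the original discussion): two random SDRs almost never share more than a handful of bits, so a large overlap is overwhelming evidence of a genuine match.

```python
import random

def random_sdr(n=2048, w=40, rng=random):
    """Return a set of w on-bit indices chosen from n positions."""
    return set(rng.sample(range(n), w))

def overlap(a, b):
    """Number of on-bits two SDRs have in common."""
    return len(a & b)

rng = random.Random(42)
trials = 10000
max_overlap = max(overlap(random_sdr(rng=rng), random_sdr(rng=rng))
                  for _ in range(trials))

# The expected overlap of two random 40/2048 SDRs is 40*40/2048,
# under one bit; even the maximum over many trials stays far below a
# matching threshold such as 20 bits.
print("max overlap over", trials, "random pairs:", max_overlap)
```

This is the statistical property that lets the CLA subsample bits and still distinguish genuine matches from chance overlaps.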
Regards,
Fergal Byrne

[1] http://fergalbyrne.github.io/rdse.html
[2] http://en.wikipedia.org/wiki/Locality-sensitive_hashing

--
Fergal Byrne, Brenter IT
Author, Real Machine Intelligence with Clortex and NuPIC
https://leanpub.com/realsmartmachines

Speaking on Clortex and HTM/CLA at euroClojure Krakow, June 2014:
http://euroclojure.com/2014/
and at LambdaJam Chicago, July 2014: http://www.lambdajam.com

http://inbits.com - Better Living through Thoughtful Technology
http://ie.linkedin.com/in/fergbyrne/ - https://github.com/fergalbyrne
e:[email protected] t:+353 83 4214179

Join the quest for Machine Intelligence at http://numenta.org

Formerly of Adnet [email protected] http://www.adnet.ie
