Hi,

With regard to the following question:
2- The wiki refers to encoder outputs as SDRs. Is that necessarily the
case, and if so, to what properties of encoder design is that requirement
attributed? (i.e. why do I need an SDR to be the output of the encoder, as
opposed to a binary vector unconstrained in density?)

...I happened upon another, not often cited, advantageous property of
SDRs: their quantum-like simultaneity when considering the efficiency of
searching over a vast store of representations. By taking the union of all
candidates, one can instantly yield the matching semantic characteristics
relevant to a given search parameter. Here is an interesting link which
parallels Jeff's thinking on this topic:
http://people.brandeis.edu/~grinkus/SDR_and_QC.html

David Ray

On Sat, Aug 2, 2014 at 10:44 AM, Fergal Byrne <[email protected]> wrote:

> Hi Nicholas,
>
> Those are some really good questions.
>
> On Sat, Aug 2, 2014 at 1:50 PM, Nicholas Mitri <[email protected]>
> wrote:
>
>> 1- Are there any specific properties that encoders need to have when
>> designing one? What's the rationale behind them, if they exist?
>
> Yes, there are a couple of important properties which encodings must
> have.
>
> The most important one is that if you have a meaning for "semantic
> closeness" (or "distance") in the data, then close values should have
> overlapping bits in their encodings and distant values should not.
>
> An example is for scalar values (which may be arbitrarily close, of
> course), where you choose the encoding so that values within some range
> (or radius) r have the same encoding, those more than r and less than 2r
> differ by a single bit, and so on.
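As an aside, the radius scheme just described can be sketched in a few lines of Python. This is only an illustration under assumed parameters (the function name and arguments are mine, not NuPIC's API): a contiguous run of on-bits slides along the vector with the value, so nearby values overlap heavily and distant values not at all.

```python
# Illustrative sketch only -- not NuPIC's ScalarEncoder API. A contiguous
# run of `on_bits` slides along a `width`-bit vector with the value.
def encode_scalar(value, min_val=0.0, max_val=100.0, width=12, on_bits=4):
    buckets = width - on_bits + 1                      # distinct encodings
    fraction = (value - min_val) / (max_val - min_val)
    start = min(buckets - 1, int(fraction * buckets))  # first on-bit
    return [1 if start <= i < start + on_bits else 0 for i in range(width)]

a = encode_scalar(10.0)
b = encode_scalar(12.0)   # a nearby value
c = encode_scalar(90.0)   # a distant value
overlap_ab = sum(x & y for x, y in zip(a, b))   # nearby: large overlap
overlap_ac = sum(x & y for x, y in zip(a, c))   # distant: no overlap
```

With these parameters, a and b share 3 of their 4 on-bits, while a and c share none.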
> The "traditional" scalar encoder produces encodings such as:
>
> "111100000000"
> "011110000000"
> "001111000000"
> "000111100000"
> "000111100000"
> "000011110000"
> "000001111000"
> "000000111100"
> "000000111100"
> "000000011110"
> "000000001111"
> "000000001111"
>
> I've written a discussion of this and Chetan's newer Random Distributed
> Scalar Encoder if you want more detail [1].
>
> Sometimes you have data which does not have such distance semantics. An
> example is categorical data, where values are either members of a set or
> not, the sets are disjoint, and there are no ordering semantics (this
> would be common, but not applicable to all categorical data). In this
> case you could either divide the encoding width into n blocks and assign
> a block to each category, or choose "random" encodings for each
> category.
>
> The rationale behind this is that most columns which activate on an
> input will also do so on "nearby" inputs, since they subsample the bits,
> and so the SDR on the layer will vary little when the inputs change a
> little. This provides stability in the face of noise, and allows the CLA
> to form a stable representation of the inputs.
>
> The other primary property is sparseness, which I'll explain in more
> detail in response to the next question.
>
>> 2- The wiki refers to encoder outputs as SDRs. Is that necessarily the
>> case, and if so, to what properties of encoder design is that
>> requirement attributed? (i.e. why do I need an SDR to be the output of
>> the encoder, as opposed to a binary vector unconstrained in density?)
>
> The sparseness is necessary to take advantage of a) the improbability of
> two substantially overlapping encodings (when subsampled) being a false
> match, and b) false matches representing mild semantic errors. This is a
> statistical property of sparse representations, and it's used in
> techniques such as locality-sensitive hashing [2].
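To make that statistical point concrete, here is a small sketch (my own illustration, not NuPIC code): two unrelated sparse random vectors share almost no bits, while a copy of a vector with a quarter of its bits corrupted still shares most of them.

```python
import random

random.seed(42)
N, W = 2048, 40                 # 2048-bit vectors, 40 on-bits (~2% sparse)

def random_sdr():
    return set(random.sample(range(N), W))

a = random_sdr()
unrelated = random_sdr()

# Corrupt a copy of `a`: move a quarter of its on-bits elsewhere.
noisy = set(a)
for bit in random.sample(sorted(a), W // 4):
    noisy.discard(bit)
    noisy.add(random.randrange(N))

overlap_unrelated = len(a & unrelated)   # expected around W*W/N, under 1 bit
overlap_noisy = len(a & noisy)           # at least 30 of the 40 bits survive
```

This is the same intuition locality-sensitive hashing relies on: in a high-dimensional sparse space, a large overlap is overwhelmingly unlikely to occur by chance.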
> Essentially, sparse binary vectors with most bits differing are very far
> apart in the high-dimensional space compared with those which share many
> bits.
>
> Inter-layer communication uses SDRs of course, so genuine SDRs (at ~2%
> on-bits) are the "best" encodings of data, but the CLA will work fine
> with only "quite sparse" inputs of the order of 10-15% on-bits. The CLA
> will learn faster if the encoding is less sparse, and the number of
> on-bits relates to discriminatory resolution, so we'll often (or even
> usually) use this less-sparse encoding regime.
>
>> 3- Is there a biological counterpart for encoders in the general sense?
>
> Yes, all input to the neocortex is composed of trains of spikes, which
> is a digital encoding scheme. The brain truly receives streams of bits
> and generates the illusion that we "see" or "hear" directly.
>
>> 4- Encoders perform quantization on the input stream by binning similar
>> input patterns into hypercubes in feature space and assigning a single
>> label (SDR or binary representation) to each bin. The encoder
>> resolution determines the size of the hypercube. The SP essentially
>> performs a very similar task by binning the outputs of the encoder in a
>> binary feature space instead. City-block distance, determined by a
>> threshold parameter, controls the size of the hypercubes/bins. Why is
>> this not viewed as a redundant operation by two consecutive modules of
>> the HTM design? Is there a strong case for allowing for it?
>
> That's a very good question. There are a few parts to the answer.
>
> Firstly, independently encoded inputs are often fed into an HTM system
> which will extract correlative or causal structure between or among the
> inputs (this is how Layer 4 combines sensory and motor data in the
> recent version of Jeff's theory).
> Secondly, an HTM hierarchy will extract a hierarchy of feature structure
> by repeating the same algorithm at each level (and this hierarchy cannot
> be represented in a single encoding).
>
> Thirdly, HTM will extract temporal structure from a series of
> "independently" encoded inputs, which again cannot be represented in any
> single encoding.
>
> Fourthly, the sparseness of the output of each layer in HTM is a
> property of that layer, independent of the input sparseness, so there is
> a sparseness transformation which alters the dimensionality of the
> output compared with the input.
>
> You need to view HTM as a system which extracts structure that is latent
> in the encoded data; if your encoding is so clever that it exposes all
> this structure directly, then you're right, you don't need HTM at all!
>
>> 5- Finally, is there any benefit to designing an encoding scheme that
>> bins inputs into hyperspheres instead of hypercubes? Would the
>> resulting combination of bins produce decision boundaries that might
>> allow for better binary classification performance, for example?
>
> The noise levels in the brain are of the order of a bit per bit.
> Implementations of HTM use drastic simplifications (such as binary
> encodings, binary synapses, global inhibition, etc.) and distributed
> representations to model this, so the answer is "probably", but it seems
> to make little engineering sense unless hyperspheres are easier to
> implement or have some other cost advantage.
>
> Thanks again for the great questions.
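Coming back to the union property mentioned at the top of this mail, here is a minimal sketch (again my own illustration, not NuPIC code) of screening a query pattern against many stored SDRs simultaneously via a single precomputed union:

```python
import random

random.seed(7)
N, W = 2048, 20                 # 2048-bit SDRs with 20 on-bits

def random_sdr():
    return frozenset(random.sample(range(N), W))

stored = [random_sdr() for _ in range(50)]

# Union all stored patterns once; a query can then be screened against
# every stored pattern at the same time with one set intersection.
union = set().union(*stored)

def matches_union(sdr, union, threshold=0.9):
    """True if nearly all of sdr's on-bits are present in the union."""
    return len(sdr & union) >= threshold * len(sdr)

query = stored[17]      # a stored pattern: every one of its bits is in union
novel = random_sdr()    # an unseen pattern: very unlikely to pass the test
```

Because the vectors are sparse, the union of 50 patterns stays far from saturated (well under half the bits set here), so an unseen pattern almost never has 90% of its bits covered by accident.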
> Regards
>
> Fergal Byrne
>
> [1] http://fergalbyrne.github.io/rdse.html
> [2] http://en.wikipedia.org/wiki/Locality-sensitive_hashing
>
> --
> Fergal Byrne, Brenter IT
>
> Author, Real Machine Intelligence with Clortex and NuPIC
> https://leanpub.com/realsmartmachines
>
> Speaking on Clortex and HTM/CLA at euroClojure Krakow, June 2014:
> http://euroclojure.com/2014/
> and at LambdaJam Chicago, July 2014: http://www.lambdajam.com
>
> http://inbits.com - Better Living through Thoughtful Technology
> http://ie.linkedin.com/in/fergbyrne/ - https://github.com/fergalbyrne
>
> e:[email protected] t:+353 83 4214179
> Join the quest for Machine Intelligence at http://numenta.org
> Formerly of Adnet [email protected] http://www.adnet.ie
>
> _______________________________________________
> nupic mailing list
> [email protected]
> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
