Hi Nicholas,

Those are some really good questions.

On Sat, Aug 2, 2014 at 1:50 PM, Nicholas Mitri <[email protected]> wrote:

> 1- Are there any specific properties that encoders need to have when
> designing one? What’s the rationale behind them if they exist?
>

Yes, there are a couple of important properties which encodings must have.

The most important one is that if you have a meaning for "semantic
closeness" (or "distance") in the data, then close values should have
overlapping bits in their encodings and distant values should not.

An example is scalar values (which may be arbitrarily close, of course),
where you choose the encoding so that values within some range (or radius) r
have the same encoding, those more than r and less than 2r apart differ by a
single bit, and so on. The "traditional" scalar encoder produces encodings
such as:

    "111100000000"
    "011110000000"
    "001111000000"
    "000111100000"
    "000111100000"
    "000011110000"
    "000001111000"
    "000000111100"
    "000000111100"
    "000000011110"
    "000000001111"
    "000000001111"

I've written up a discussion of this, and of Chetan's newer Random
Distributed Scalar Encoder, if you want more detail [1].
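To make the idea concrete, here's a minimal sketch of such a "traditional" scalar encoder in Python. The function name and parameters are my own for illustration (this is not the NuPIC ScalarEncoder API): n total bits, w contiguous on-bits, values clamped to [min_val, max_val] and binned into buckets.

```python
def encode_scalar(value, min_val=0.0, max_val=100.0, n=12, w=4):
    """Encode a scalar as n bits with a block of w contiguous on-bits.

    Values in the same bucket share all w bits; values one bucket apart
    overlap in w - 1 bits, and so on, so overlap falls off with distance.
    """
    n_buckets = n - w + 1
    # Clamp the value to the range, then map it to a bucket index.
    fraction = (min(max(value, min_val), max_val) - min_val) / (max_val - min_val)
    bucket = min(int(fraction * n_buckets), n_buckets - 1)
    return [1 if bucket <= i < bucket + w else 0 for i in range(n)]

# Nearby values produce heavily overlapping encodings:
a = encode_scalar(10.0)   # one bucket away from b
b = encode_scalar(20.0)
overlap = sum(x & y for x, y in zip(a, b))  # shares w - 1 = 3 bits
```

Note how this gives exactly the "distance maps to shared bits" property: the further apart two values are, the fewer bits their encodings share, until the overlap reaches zero.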

Sometimes you have data which does not have such distance semantics. An
example is categorical data, where values are either members of a set or
not, the sets are disjoint, and there is no ordering semantics (this is
common, though it does not apply to all categorical data). In this case you
could either divide the encoding width into n blocks and assign a block to
each category, or choose "random" encodings for each category.

The rationale behind this is that most columns which activate on an input
will also do so on "nearby" inputs, since they subsample the bits, and so
the SDR on the layer will vary little when the inputs change a little. This
provides stability in the face of noise, and allows the CLA to form a
stable representation of the inputs.

The other primary property is sparseness, which I'll explain in more detail
in response to the next question.


>  2- The wiki refers to encoder outputs as SDRs. Is that necessarily the
> case and if so, to what properties of encoder design is that requirement
> attributed to? (i.e. why do I need an SDR to be the output of the encoder
> as opposed to a binary vector unconstrained in density?)
>

The sparseness is necessary to take advantage of a) the improbability of
two substantially overlapping encodings (when subsampled) being a false
match, and b) false matches representing mild semantic errors. This is a
statistical property of sparse representations, and it's used in techniques
such as locality sensitive hashing [2]. Essentially, sparse binary vectors
with most bits differing are very far apart in the high-dimensional space
compared with those which share many bits.
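The statistics behind point (a) are easy to check directly. Here is a sketch (my own numbers, for illustration) computing the probability that two *random* w-of-n sparse binary vectors overlap in at least theta bits, which is a hypergeometric tail:

```python
from math import comb

def p_false_match(n, w, theta):
    """P(overlap >= theta) for two random sparse vectors,
    each with w on-bits out of n total bits (hypergeometric tail)."""
    total = comb(n, w)
    return sum(comb(w, k) * comb(n - w, w - k)
               for k in range(theta, w + 1)) / total

# With SDR-like sparsity (2048 bits, 40 on, ~2%), even a modest 10-bit
# overlap is astronomically unlikely to occur by chance, so a matching
# subsample of 10 bits is almost certainly a true match.
p = p_false_match(2048, 40, 10)
```

This is why subsampling works: at these sparsity levels, any substantial overlap is overwhelming evidence of a genuine match rather than a collision.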

Inter-layer communication uses SDRs of course, so genuine SDRs (at ~2%
on-bits) are the "best" encodings of data, but the CLA will work fine with
only "quite sparse" inputs on the order of 10-15% on-bits. The CLA will
learn faster if the encoding is less sparse, and the number of on-bits
relates to discriminatory resolution, so we'll often (or even usually) use
this less-sparse encoding regime.



> 3- Is there a biological counterpart for encoders in the general sense?
>

Yes, all input to the neocortex is composed of trains of spikes, which is a
digital encoding scheme. The brain truly receives streams of bits and
generates the illusion that we "see" or "hear" directly.


> 4- Encoders perform quantization on the input stream by binning similar
> input patterns into hypercubes in feature space and assigning a single
> label (SDR or binary representation) to each bin. The encoder resolution
> determines the size of the hypercube. The SP essentially performs a very
> similar task by binning the outputs of the encoder in a binary feature
> space instead. City block distance determined by a threshold parameter
> controls the size of the hypercubes/bins. Why is this not viewed as a
> redundant operation by 2 consecutive modules of the HTM design? Is there a
> strong case for allowing for it?
>

That's a very good question. There are a few parts to the answer.

Firstly, independently encoded inputs are often fed into an HTM system,
which will extract correlative or causal structure between or among the
inputs (this is how Layer 4 combines sensory and motor data in the recent
version of Jeff's theory).

Secondly, an HTM hierarchy will extract a hierarchy of feature structure by
repeating the same algorithm at each level (and this hierarchy cannot be
represented in a single encoding).

Thirdly, HTM will extract temporal structure from a series of
"independently" encoded inputs, which again cannot be represented in each
single encoding.

Fourthly, the sparseness of the output of each layer in HTM is a property
of that layer, and independent of the input sparseness, so there is a
sparseness transformation which alters the dimensionality of the output
compared with the input.

You need to view HTM as a system which extracts structure which is latent
in the encoded data; if your encoding is so clever that it exposes all this
structure directly, then you're right, you don't need HTM at all!


> 5- Finally, is there any benefit to designing an encoding scheme that bins
> inputs into hyperspheres instead of hypercubes? Would the resulting
> combination of bins produce decision boundaries that might possibly allow
> for better binary classification performance for example?
>

The noise levels in the brain are on the order of a bit per bit.
Implementations of HTM use drastic simplifications (such as binary
encodings, binary synapses, and global inhibition) and distributed
representations to model this, so the answer is "probably", but it seems to
make little engineering sense unless hyperspheres are easier to implement
or have some other cost advantage.

Thanks again for the great questions.

Regards

Fergal Byrne

[1] http://fergalbyrne.github.io/rdse.html
[2] http://en.wikipedia.org/wiki/Locality-sensitive_hashing


-- 

Fergal Byrne, Brenter IT

Author, Real Machine Intelligence with Clortex and NuPIC
https://leanpub.com/realsmartmachines

Speaking on Clortex and HTM/CLA at euroClojure Krakow, June 2014:
http://euroclojure.com/2014/
and at LambdaJam Chicago, July 2014: http://www.lambdajam.com

http://inbits.com - Better Living through Thoughtful Technology
http://ie.linkedin.com/in/fergbyrne/ - https://github.com/fergalbyrne

e:[email protected] t:+353 83 4214179
Join the quest for Machine Intelligence at http://numenta.org
Formerly of Adnet [email protected] http://www.adnet.ie
_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org