WHAT PORTION OF CORTICAL PROCESSES ARE BOUND BY "THE BINDING PROBLEM"?

 

Here is an important practical, conceptual problem I am having trouble with.


 

In an article entitled "Are Cortical Models Really Bound by the 'Binding
Problem'?" Tomaso Poggio's group at MIT takes the position that there is no
need for special mechanisms to deal with the famous "binding problem" --- at
least in certain contexts, such as 150 msec feedforward visual object
recognition.  The article implies that a properly designed hierarchy of
patterns with both compositional and max-pooling layers (I call them
"gen/comp hierarchies") automatically handles the problem of which
sub-elements are connected with which others, obviating the need for
techniques like synchrony.
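To make the idea concrete, here is a minimal sketch of one composition
("S") plus max-pooling ("C") stage of such a gen/comp hierarchy, in the
spirit of Riesenhuber and Poggio's HMAX.  All sizes, the random input, and
the random template are illustrative assumptions of mine, not parameters
from their model:

```python
import numpy as np

# Toy sketch of one composition ("S") + max-pooling ("C") stage of a
# gen/comp hierarchy.  Sizes and the random template are assumptions
# for illustration only.

rng = np.random.default_rng(0)
image = rng.random((8, 8))        # toy "retinal" input
template = rng.random((3, 3))     # one learned local feature

def s_layer(img, tmpl):
    """Composition layer: respond to a local conjunction of inputs by
    template matching (negative squared error) over every 3x3 patch."""
    h, w = img.shape
    th, tw = tmpl.shape
    out = np.zeros((h - th + 1, w - tw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = img[i:i + th, j:j + tw]
            out[i, j] = -np.sum((patch - tmpl) ** 2)
    return out

def c_layer(s, pool=2):
    """Max-pooling layer: keep only the best response in each
    neighbourhood, discarding exact position -- this is what buys
    invariance without any explicit binding mechanism."""
    h, w = s.shape
    return np.array([[s[i:i + pool, j:j + pool].max()
                      for j in range(0, w - pool + 1, pool)]
                     for i in range(0, h - pool + 1, pool)])

s = s_layer(image, template)   # (6, 6) map of local feature responses
c = c_layer(s)                 # (3, 3) position-tolerant responses
```

Note how the C layer throws away exact position while the S layer's local
conjunctions still constrain which sub-features co-occur --- the trade-off
the paper argues makes explicit binding unnecessary in this setting.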

 

Poggio's group has achieved impressive results without any special
mechanisms for binding in this type of visual recognition, as is indicated
by the two papers by Serre cited below (the latter of which summarizes much
of what is in the first, an excellent, detailed PhD thesis).

 

The two works by Geoffrey Hinton cited below describe Hinton's hierarchical
feedforward neural net recognition system (which, when run backwards,
generates patterns similar to those it has been trained on).  These works
show impressive results in handwritten digit recognition without any
explicit mechanism for binding.  In particular, watch the portion of the
Hinton YouTube video from 21:35 to 26:39, where Hinton shows his system
alternating between recognizing a pattern and then stochastically
generating a similar pattern from the higher-level activations that
resulted from the previous recognition.  Note how amazingly well his system
seems to capture the many varied forms in which the parts and sub-shapes of
handwritten digits are related.
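The mechanics of that recognize-then-generate alternation can be sketched
with a single RBM-style layer: an upward (recognition) pass samples hidden
activations, and a downward (generation) pass stochastically produces a
visible pattern from them.  The layer sizes and the random, untrained
weights below are my own assumptions for illustration --- the point is only
the up/down mechanics, not Hinton's trained network:

```python
import numpy as np

# Minimal sketch of an RBM-style up/down pass.  Sizes and the random,
# untrained weights are assumptions for illustration only.

rng = np.random.default_rng(1)
n_visible, n_hidden = 784, 500        # e.g. a 28x28 digit image
W = rng.normal(0.0, 0.01, (n_visible, n_hidden))
b_vis = np.zeros(n_visible)
b_hid = np.zeros(n_hidden)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def recognize(v):
    """Upward pass: sample binary hidden units given a visible pattern."""
    p_h = sigmoid(v @ W + b_hid)
    return (rng.random(n_hidden) < p_h).astype(float)

def generate(h):
    """Downward pass: stochastically generate a visible pattern from the
    higher-level (hidden) activations."""
    p_v = sigmoid(h @ W.T + b_vis)
    return (rng.random(n_visible) < p_v).astype(float)

v0 = (rng.random(n_visible) < 0.1).astype(float)   # stand-in "digit"
h = recognize(v0)        # recognition: image -> hidden code
v1 = generate(h)         # generation: hidden code -> similar pattern
```

In a trained network the generated `v1` resembles the recognized input;
here, with random weights, only the alternation itself is being shown.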

 

So my question is this: HOW BROADLY DOES THE IMPLICATION THAT THE BINDING
PROBLEM CAN BE AUTOMATICALLY HANDLED BY A GEN/COMP HIERARCHY OR A
HINTON-LIKE HIERARCHY APPLY TO THE MANY TYPES OF PROBLEMS A BRAIN-LEVEL
ARTIFICIAL GENERAL INTELLIGENCE WOULD BE EXPECTED TO HANDLE?  In particular,
HOW APPLICABLE IS IT TO SEMANTIC PATTERN RECOGNITION AND GENERATION --- WITH
ITS COMPLEX AND HIGHLY VARIED RELATIONS --- SUCH AS IS COMMONLY INVOLVED IN
HUMAN-LEVEL NATURAL LANGUAGE UNDERSTANDING AND GENERATION?

 

The paper "Are Cortical Models Really Bound by the 'Binding Problem'?"
suggests, in the first full paragraph on its second page, that gen/comp
hierarchies avoid the "binding problem" by

 

"coding an object through a set of intermediate features made up of local
arrangements of simpler features [that] sufficiently constrain the
representation to uniquely code complex objects without retaining global
positional information."

 

For example, in the context of speech recognition,

 

"...rather than using individual letters to code words, letter pairs or
higher-order combinations of letters can be used --- i.e., although the word
"tomaso" might be confused with the word "somato" if both were coded by the
sets of letters they are made up of, this ambiguity is resolved if both are
represented through letter pairs."
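The quoted example can be checked in a few lines.  The helper names
`letter_set` and `bigrams` below are my own, chosen for illustration:

```python
# Coding a word by its *set* of letters vs. its ordered letter *pairs*
# (bigrams).  Anagrams like "tomaso" and "somato" collide under set
# coding but are told apart by pair coding, as the quoted passage notes.

def letter_set(word):
    return frozenset(word)

def bigrams(word):
    return frozenset(word[i:i + 2] for i in range(len(word) - 1))

print(letter_set("tomaso") == letter_set("somato"))  # True: ambiguous
print(bigrams("tomaso") == bigrams("somato"))        # False: disambiguated
```

The bigram sets differ because "tomaso" contains the pair "as" while
"somato" contains "at" --- the local pairwise arrangement, not global
position, does the disambiguating.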

 

The issue then becomes: WHAT SUBSET OF THE TYPES OF TASKS THE HUMAN BRAIN
HAS TO PERFORM CAN BE CARRIED OUT IN A MANNER THAT AVOIDS THE BINDING
PROBLEM JUST BY USING A GEN/COMP HIERARCHY WITH SUCH "A SET OF SIMPLER
FEATURES [THAT] SUFFICIENTLY CONSTRAIN THE REPRESENTATION TO UNIQUELY CODE"
THE TYPE OF PATTERNS SUCH TASKS REQUIRE?

 

There is substantial evidence that the brain does require synchrony for some
of its tasks --- as indicated by the work of people like Wolf Singer ---
suggesting that binding may well be a problem that cannot be handled, for
all mental tasks, by the specificity of the brain's gen/comp hierarchies
alone.

 

The table at the top of page 75 of Serre's impressive PhD thesis suggests
that his system --- which performs very quick feedforward object recognition
roughly as well as a human --- has an input of 160 x 160 pixels and requires
23 million pattern models.  Such a large number of patterns helps provide
the "simpler features [that] sufficiently constrain the representation to
uniquely code complex objects without retaining global positional
information."

 

But it should be noted --- as Serre's paper recognizes --- that the very
rapid 150 msec feedforward recognition described in that paper is far from
all of human vision.  Such rapid recognition --- although surprisingly
accurate given how fast it is --- is normally supplemented by more top-down
visual processes that confirm its best guesses.  For example, if a human is
shown a photograph of a face, the eyes will normally saccade over it, with
multiple fixation points, often on key features such as the eyes, nose,
corners of the mouth, and points on the outline of the face --- all
indicating that recognition of a face normally involves much more than one
rapid feedforward pass.  It is possible that synchronies, attention
focusing, or other binding processes are involved in these further steps of
visual recognition.

 

One of my questions is: if such a relatively small (i.e., 160 x 160 pixel),
low-dimensional (i.e., two-dimensional) input space as that in Serre's
system requires 23 million models to sufficiently constrain the
representation so that it can uniquely recognize high-level visual objects
without the need for any additional binding mechanism, HOW MANY MODELS
WOULD BE REQUIRED TO PROPERLY CONSTRAIN RECOGNITION OF PATTERNS IN THE MUCH
LARGER, AND MUCH HIGHER-DIMENSIONAL, SPACE IN WHICH SEMANTIC PATTERNS ---
SUCH AS THOSE INVOLVED IN HUMAN-LEVEL LANGUAGE UNDERSTANDING --- ARE
REPRESENTED?

 

It is my hunch that, in such a large, high-dimensional space, it is not
possible for the brain to have enough models to provide the type of
constraint required, and that it must therefore use additional mechanisms,
such as synchrony or its equivalent, to deal with the binding problem.  IS
THIS CORRECT?

 

But it is also my hunch that --- in the more complex representational spaces
required for more abstract levels of human thought --- gen/comp model
networks of the general type described by Poggio and Serre greatly reduce
the amount of binding that has to be handled by additional methods such as
synchrony and/or sequential widespread activation of conscious and
near-conscious concepts at gamma-wave frequencies.  This decrease in the
amount of binding that has to be dealt with by methods other than the
relatively high resolution of the gen/comp hierarchy itself would greatly
increase the amount of parallel processing that can be performed in the
subconscious.  IS THIS CORRECT?

 

I would be grateful for any intelligent answers to any of the questions I
have posed in ALL CAPS, or for any feedback on my other comments in this
email.

 

These questions appear to be very important for the design of artificial
general intelligence (AGI), because AGI methods for dealing with binding
are considerably more computationally expensive than the relatively simple
feedforward computations used in systems like that in Serre's PhD thesis or
in Hinton's RBMs: the additional binding methods tend to require spreading
activation that stores more complicated state information at each activated
node, and they tend to require matching between the stored activation
states at those nodes.  I have not figured out any way to do
massively-parallel, complex, context-sensitive semantic pattern recognition
and generation without some form of additional processing to handle binding
--- unless one were to use many more models than there are neurons in the
human brain.  IS THIS CORRECT?

 

I would appreciate very much any guidance those who might be more
knowledgeable on this subject could give to me and to the AGI list.

 

Sincerely,

Ed Porter

 

References

 

1. Are Cortical Models Really Bound by the "Binding Problem"?, by
Maximilian Riesenhuber and Tomaso Poggio, at
http://cbcl.mit.edu/projects/cbcl/publications/ps/riesenhuber-neuron-1999.pdf

 

2. Learning a Dictionary of Shape-Components in Visual Cortex: Comparison
with Neurons, Humans and Machines, PhD thesis by Thomas Serre, at
http://cbcl.mit.edu/projects/cbcl/publications/ps/MIT-CSAIL-TR-2006-028.pdf


 

3. Robust Object Recognition with Cortex-Like Mechanisms, Thomas Serre, Lior
Wolf, Stanley Bileschi, Maximilian Riesenhuber, and Tomaso Poggio, at
http://web.mit.edu/serre/www/publications/Serre_etal_PAMI07.pdf   

 

4. Learning multiple layers of representation, Geoffrey E. Hinton, at
http://www.csri.utoronto.ca/~hinton/absps/tics.pdf  

 

5. The Next Generation of Neural Networks, Google Tech Talk by Geoffrey E.
Hinton, 11/29/07, on YouTube, at http://www.youtube.com/watch?v=AyzOUbkUf3M 
