Ed Porter wrote:
WHAT PORTION OF CORTICAL PROCESSES ARE BOUND BY "THE BINDING PROBLEM"?
Here is an important practical, conceptual problem I am having trouble
with.
In an article entitled “Are Cortical Models Really Bound by the ‘Binding
Problem’?”, Tomaso Poggio’s group at MIT takes the position that there
is no need for special mechanisms to deal with the famous “binding
problem” --- at least in certain contexts, such as 150 msec feed forward
visual object recognition. This article implies that a properly
designed hierarchy of patterns that has both compositional and
max-pooling layers (I call them “gen/comp hierarchies”) automatically
handles the problem of what sub-elements are connected with which
others, preventing the need for techniques like synchrony to handle this
problem.
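The alternation that does the work in such a gen/comp hierarchy can be sketched in a few lines: a compositional layer that responds to local arrangements of features, followed by a max-pooling layer that keeps only the strongest nearby response. The toy sketch below illustrates only the mechanism; the template sizes, Gaussian similarity, and pooling width are my own placeholders, not the actual HMAX parameters:

```python
import numpy as np

def compose(image, templates):
    # Compositional ("S") layer: respond wherever a local patch
    # resembles one of the stored feature templates.
    th, tw = templates.shape[1:]
    h, w = image.shape
    out = np.zeros((len(templates), h - th + 1, w - tw + 1))
    for k, t in enumerate(templates):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                patch = image[i:i + th, j:j + tw]
                out[k, i, j] = np.exp(-np.sum((patch - t) ** 2))
    return out

def max_pool(maps, size=2):
    # Max-pooling ("C") layer: keep only the strongest response in each
    # neighbourhood, discarding exact position (tolerance to shift).
    k, h, w = maps.shape
    maps = maps[:, :h - h % size, :w - w % size]
    return maps.reshape(k, maps.shape[1] // size, size,
                        maps.shape[2] // size, size).max(axis=(2, 4))

rng = np.random.default_rng(0)
image = rng.random((8, 8))
templates = rng.random((3, 3, 3))   # three hypothetical 3x3 feature templates
s1 = compose(image, templates)      # "which feature occurs where" maps
c1 = max_pool(s1)                   # "which feature occurs nearby" maps
```

Because pooling throws away exact position while composition keeps local arrangement, "what is connected to what" is carried implicitly by which composite features fire, rather than by a separate binding mechanism.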
Poggio’s group has achieved impressive results without the need for
special mechanisms to deal with binding in this type of visual
recognition, as is indicated by the two papers below by Serre (the later
of which summarizes much of what is in the first, which is an excellent,
detailed PhD thesis.)
The two works by Geoffrey Hinton cited below are descriptions of
Hinton’s hierarchical feed-forward neural net recognition system (which,
when run backwards, generates patterns similar to those it has been
trained on). These two works by Hinton show impressive results in
handwritten digit recognition without any explicit mechanism for
binding. In particular, watch the portion of the Hinton YouTube video
starting at 21:35 - 26:39 where Hinton shows his system alternating
between recognizing a pattern and then generating a similar pattern
stochastically from the higher level activations that have resulted from
the previous recognition. See how amazingly well his system seems to
capture the many varied forms in which the various parts and sub-shapes
of numerical handwritten digits are related.
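The recognize-then-generate alternation in that demo can be illustrated with a toy RBM: the same weight matrix is used bottom-up for recognition and top-down for stochastic generation. The weights below are random and untrained, so this sketch shows only the mechanism, not digit-like output; the layer sizes are chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Toy restricted Boltzmann machine. The same weight matrix W serves
    bottom-up recognition and top-down generation, which is what lets the
    network alternate between the two."""

    def __init__(self, n_visible, n_hidden):
        # Small random, untrained weights -- a placeholder, not Hinton's net.
        self.W = rng.normal(0.0, 0.1, (n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)

    def recognize(self, v):
        # Bottom-up: sample binary hidden units given a visible pattern.
        p = sigmoid(v @ self.W + self.b_h)
        return (rng.random(p.shape) < p).astype(float)

    def generate(self, h):
        # Top-down: stochastically regenerate a visible pattern from the
        # hidden activations ("running the net backwards").
        p = sigmoid(h @ self.W.T + self.b_v)
        return (rng.random(p.shape) < p).astype(float)

rbm = RBM(n_visible=784, n_hidden=500)
v = (rng.random(784) < 0.5).astype(float)
for _ in range(3):
    h = rbm.recognize(v)   # infer hidden causes of the current pattern
    v = rbm.generate(h)    # fantasize a similar pattern from those causes
```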
So my question is this: HOW BROADLY DOES THE IMPLICATION THAT THE
BINDING PROBLEM CAN BE AUTOMATICALLY HANDLED BY A GEN/COMP HIERARCHY OR
A HINTON-LIKE HIERARCHY APPLY TO THE MANY TYPES OF PROBLEMS A BRAIN
LEVEL ARTIFICIAL GENERAL INTELLIGENCE WOULD BE EXPECTED TO HANDLE? In
particular HOW APPLICABLE IS IT TO SEMANTIC PATTERN RECOGNITION AND
GENERATION --- WITH ITS COMPLEX AND HIGHLY VARIED RELATIONS --- SUCH AS
IS COMMONLY INVOLVED IN HUMAN LEVEL NATURAL LANGUAGE UNDERSTANDING AND
GENERATION?
The answer lies in the confusion over what the "binding problem"
actually is. There are many studies out there that misunderstand the
problem in such a substantial way that their conclusions are
meaningless. I refer, for example, to the seminal paper by Shastri and
Ajjanagadde, which I remember discussing with a colleague (Janet Vousden)
back in the early 90s. We both went into that paper in great depth, and
independently came to the conclusion that S & A had their causality so
completely screwed up that the paper said nothing at all: they claimed
to be able to explain binding by showing that synchronized firing could
make it happen, but they completely failed to show how the RELEVANT
neurons would become synchronized.
Distressingly, the Shastri and Ajjanagadde paper then went on to become,
as I say, seminal, and there has been a lot of research on something
that these people call "the binding problem", but which seems (from my
limited coverage of that area) to be about getting various things to
connect using synchronized signals, but without any explanation of how
the things that are semantically required to connect actually connect.
So, to be able to answer your question, you have to disentangle that
entire mess and become clear about what the real binding problem is,
what the fake binding problem is, and whether the new idea makes any
difference to one or the other of these.
In my opinion, it sounds like Poggio is correct in making the claim that
he does, but that Janet Vousden and I already understood that general
point back in 1994, just by using general principles. And, most
probably, the solution Poggio refers to DOES apply as well to what you
are calling the semantic level.
The paper “Are Cortical Models Really Bound by the ‘Binding Problem’?”,
suggests in the first full paragraph on its second page that gen/comp
hierarchies avoid the “binding problem” by
“coding an object through a set of intermediate features made up of
local arrangements of simpler features [that] sufficiently constrain the
representation to uniquely code complex objects without retaining global
positional information."
This is exactly the position that I took a couple of decades ago. You
will recall that I am always talking about doing this with CONSTRAINTS,
and using those constraints at many different levels of the hierarchy.
For example, in the context of speech recognition,
"...rather than using individual letters to code words, letter pairs or
higher-order combinations of letters can be used—i.e., although the word
“tomaso” might be confused with the word “somato” if both were coded by
the sets of letters they are made up of, this ambiguity is resolved if
both are represented through letter pairs.”
A strangely trivial point for them to make: this was the basis for all
the "triplet" representations that were in widespread use for NN
simulations back in the late 1980s.
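The tomaso/somato point is easy to verify directly. A minimal sketch, comparing an unordered bag-of-letters code with a bag-of-letter-pairs code:

```python
def letters(word):
    # Unordered "bag of letters" code.
    return sorted(word)

def bigrams(word):
    # "Bag of letter pairs" code: each pair constrains local adjacency.
    return sorted(word[i:i + 2] for i in range(len(word) - 1))

# The two words collide under the bag-of-letters code...
assert letters("tomaso") == letters("somato")
# ...but the letter-pair code distinguishes them, because each pair
# locally records which letters are adjacent.
assert bigrams("tomaso") != bigrams("somato")
```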
The issue then becomes, WHAT SUB-SETS OF THE TYPES OF PROBLEMS THE HUMAN
BRAIN HAS TO PERFORM CAN BE PERFORMED IN A MANNER THAT AVOIDS THE
BINDING PROBLEM JUST BY USING A GEN/COMP HIERARCHY WITH SUCH “A SET OF
SIMPLER FEATURES [THAT] SUFFICIENTLY CONSTRAIN THE REPRESENTATION TO
UNIQUELY CODE” THE TYPE OF PATTERNS SUCH TASKS REQUIRE?
All of them.
There is substantial evidence that the brain does require synchrony for
some of its tasks --- as has been indicated by the work of people like
Wolf Singer --- suggesting that binding may well be a problem that
cannot be handled alone by the specificity of the brain’s gen/comp
hierarchies for all mental tasks.
No. The brain uses synchrony, but what relationship this has to binding
is unclear. I suspect these are processes happening at completely
different levels of description, and that the connection is therefore
nonexistent.
The table at the top of page 75 of Serre’s impressive PhD thesis
suggests that his system --- which performs very quick feedforward object
recognition roughly as well as a human --- has an input of 160 x 160
pixels, and requires 23 million pattern models. Such a large number of
patterns helps provide the “simpler features [that] sufficiently
constrain the representation to uniquely code complex objects without
retaining global positional information.”
But, it should be noted --- as is recognized in Serre’s paper --- that
the very rapid 150 msec feed forward recognition described in that paper
is far from all of human vision. Such rapid recognition --- although
surprisingly accurate given how fast it is --- is normally supplemented
by more top down vision processes to confirm its best guesses. For
example, if a human is shown a photograph of a face, his eyes will
normally saccade over it, with multiple fixation points, often on key
features such as eyes, nose, corners of mouth, points on the outline of
the face, all indicating that the recognition of the face is normally much
more than one rapid feed forward process. It is possible that
synchronies, attention focusing, or other binding processes are involved
in these further steps of visual recognition.
One of my questions is: if such a relatively small (i.e., 160 x 160
pixel), low-dimensional (i.e., 2-dimensional) input space as that in
Serre’s system requires 23 million models so that it can sufficiently
constrain the representation to uniquely recognize high-level visual
objects without the need for any additional mechanism for binding, HOW
MANY MODELS WOULD BE REQUIRED TO PROPERLY CONSTRAIN RECOGNITION OF
PATTERNS IN THE MUCH LARGER, AND MUCH HIGHER-DIMENSIONAL SPACE IN WHICH
SEMANTIC PATTERNS --- SUCH AS THOSE INVOLVED IN HUMAN LEVEL LANGUAGE
UNDERSTANDING --- ARE REPRESENTED?
You may recall in my previous response that I did ask how these models
scaled up. If his "models" are what I think they are, and if they scale
as the square of the linear array size, then this approach would be
useless at these higher levels. That was what I suspected before: the
approach might well be progress in the lower reaches of the visual
system, but other - completely different - mechanisms are probably at
work higher up.
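A back-of-envelope sketch makes the scaling worry concrete. The 160-pixel side and ~23 million models are the figures cited from the thesis; the quadratic growth law is pure speculation on my part, not anything measured:

```python
# Figures cited from the thesis (table on p. 75); the growth law is assumed.
BASE_SIDE = 160      # 160 x 160 pixel input
BASE_MODELS = 23e6   # ~23 million pattern models

def models_needed(side):
    # Hypothetical: model count grows as the square of the linear array size.
    return BASE_MODELS * (side / BASE_SIDE) ** 2

for side in (160, 320, 1280):
    print(f"{side:>5} px side -> {models_needed(side):.2e} models")
```

Even a modest increase in linear input size multiplies the model count under this assumed law, which is why the approach looks unpromising for higher, semantic levels if it scales that way.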
It is my hunch that in such a large, high-dimensional space, it is not
possible for the brain to have enough models to provide the
type of constraint required --- without using additional mechanisms,
such as synchrony, or its equivalent --- to deal with the binding
problem. IS THIS CORRECT?
Not correct. Or rather, you are correct to say that the model does not
apply, but I see no reason to deduce that the binding problem per se has
any relevance to the problem of dealing with higher level processes. As
far as I can see, this is just a non sequitur.
Rather, we need to understand the basic nature of those higher processes.
But it is also my hunch that --- in the more complex representational
spaces required for more abstract levels of human thought --- gen/comp
model networks of the general type described by Poggio and Serre greatly
reduce the amount of binding that has to be handled by additional
methods such as synchrony and/or sequential wide-spread activation of
conscious and near-conscious concepts at gamma-wave frequencies. This
decrease in the amount of binding that has to be dealt with by methods
other than the relatively high resolution of the gen/comp hierarchy,
itself, would greatly increase the amount of parallel processing that
can be performed in the sub-conscious. IS THIS CORRECT?
Cannot answer this question: do not understand it.
I would be grateful for any intelligent answers to any of the questions
I have posed in ALL CAPS or any feedback about any of my other comments
in this email.
These questions appear to be very important for the design of artificial
general intelligence (AGI), because AGI methods for dealing with binding
are considerably more computationally expensive than the relatively
simple feed-forward computations used in systems like the one in Serre's
PhD thesis, or in Hinton's RBMs: the additional binding methods tend to
require spreading activation that stores more complicated state
information at each activated node, and matching between the stored
activation states at those nodes. I have not figured out any way to do
massively-parallel,
complex, context-sensitive semantic pattern recognition and generation
without some form of additional processing to handle binding, unless one
were to use many more models than there are neurons in the human brain.
IS THIS CORRECT?
I would appreciate very much any guidance those who might be more
knowledgeable on this subject could give to me and to the AGI list.
Stepping back from the details, I think that what is happening here is
that if you take an approach to AGI that emphasizes the "Standard Model"
of the sort found in Novamente (a broad class of systems that is hard to
define compactly, but which has to do with the fact that there are
passive symbols being manipulated by supposedly smart mechanisms), then
you tend to get sucked into the idea that binding the right things
together is a crucial problem. To put it crudely, getting the system to
work depends crucially on who decides to talk to whom in your system.
It is very difficult to describe why this happens. Basically, this
style of AGI commits itself to specific architectures very early, and
then later on the researcher wonders why, with so many degrees of
freedom nailed down, the last few pieces of the puzzle do not fit. Then
the researcher needs a name for the main culprit that does not fit. In
this case, you are calling it the binding problem .... getting the right
things to hook up together.
Problem is, you see, that getting the right things to hook up together
is the WHOLE STORY.
Richard Loosemore
Sincerely,
Ed Porter
References
1. Are Cortical Models Really Bound by the “Binding Problem”?, by
Maximilian Riesenhuber and Tomaso Poggio, at
http://cbcl.mit.edu/projects/cbcl/publications/ps/riesenhuber-neuron-1999.pdf
2. Learning a Dictionary of Shape-Components in Visual Cortex:
Comparison with Neurons, Humans and Machines, PhD thesis by Thomas
Serre, at
http://cbcl.mit.edu/projects/cbcl/publications/ps/MIT-CSAIL-TR-2006-028.pdf
3. Robust Object Recognition with Cortex-Like Mechanisms, Thomas Serre,
Lior Wolf, Stanley Bileschi, Maximilian Riesenhuber, and Tomaso Poggio,
at http://web.mit.edu/serre/www/publications/Serre_etal_PAMI07.pdf
4. Learning multiple layers of representation, Geoffrey E. Hinton, at
http://www.csri.utoronto.ca/~hinton/absps/tics.pdf
5. The Next Generation of Neural Networks, Google Tech Talk by Geoffrey
E. Hinton, 11/29/07, on YouTube, at
http://www.youtube.com/watch?v=AyzOUbkUf3M
------------------------------------------------------------------------
*agi* | Archives <http://www.listbox.com/member/archive/303/=now>
<http://www.listbox.com/member/archive/rss/303/> | Modify
<http://www.listbox.com/member/?&>
Your Subscription [Powered by Listbox] <http://www.listbox.com>