Ed Porter wrote:
WHAT PORTION OF CORTICAL PROCESSES ARE BOUND BY "THE BINDING PROBLEM"?

Here is an important practical, conceptual problem I am having trouble with.

In an article entitled “Are Cortical Models Really Bound by the ‘Binding Problem’?”, Tomaso Poggio’s group at MIT takes the position that there is no need for special mechanisms to deal with the famous “binding problem” --- at least in certain contexts, such as 150 msec feed-forward visual object recognition. This article implies that a properly designed hierarchy of patterns that has both compositional and max-pooling layers (I call them “gen/comp hierarchies”) automatically handles the problem of which sub-elements are connected with which others, obviating the need for techniques like synchrony to handle this problem.
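To make the alternating composition/max-pooling idea concrete, here is a minimal toy sketch (my own illustration, not code from Poggio's group --- the 1-D "image", template, and window sizes are all made up). A compositional layer detects a local arrangement of simpler features; a max-pooling layer then discards exact position, so the same high-level code results wherever the feature sits within a pooling window:

```python
# Toy 1-D sketch of a "gen/comp" hierarchy: a compositional (template-
# matching) layer followed by a max-pooling layer. Illustrative only.

def compose(signal, template):
    """Compositional layer: respond wherever the local window of the
    signal exactly matches the template of simpler features."""
    w = len(template)
    return [1 if signal[i:i + w] == template else 0
            for i in range(len(signal) - w + 1)]

def max_pool(responses, pool_size):
    """Max-pooling layer: keep only the max over each pooling window,
    discarding exact position -- this buys tolerance to translation."""
    return [max(responses[i:i + pool_size])
            for i in range(0, len(responses), pool_size)]

def recognize(signal, template, pool_size=4):
    return max_pool(compose(signal, template), pool_size)

# The feature [1, 0, 1] yields the same pooled code at either position:
a = recognize([1, 0, 1, 0, 0, 0, 0, 0, 0], [1, 0, 1])
b = recognize([0, 1, 0, 1, 0, 0, 0, 0, 0], [1, 0, 1])
# a == b -- position within the pooling window has been abstracted away.
```

Because each compositional unit only responds to a *local arrangement* of sub-features, the "which part belongs to which whole" information is carried implicitly in which units fire --- which is the sense in which such a hierarchy is claimed to sidestep explicit binding machinery.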

Poggio’s group has achieved impressive results without the need for special mechanisms to deal with binding in this type of visual recognition, as is indicated by the two papers below by Serre (the latter of which summarizes much of what is in the first, an excellent, detailed PhD thesis). The two works by Geoffrey Hinton cited below are descriptions of Hinton’s hierarchical feed-forward neural net recognition system (which, when run backwards, generates patterns similar to those it has been trained on). These two works by Hinton show impressive results in handwritten digit recognition without any explicit mechanism for binding. In particular, watch the portion of the Hinton YouTube video from 21:35 to 26:39, where Hinton shows his system alternating between recognizing a pattern and then generating a similar pattern stochastically from the higher-level activations that resulted from the previous recognition. See how amazingly well his system seems to capture the many varied forms in which the various parts and sub-shapes of handwritten numerical digits are related.

So my question is this: HOW BROADLY DOES THE IMPLICATION THAT THE BINDING PROBLEM CAN BE AUTOMATICALLY HANDLED BY A GEN/COMP HIERARCHY OR A HINTON-LIKE HIERARCHY APPLY TO THE MANY TYPES OF PROBLEMS A BRAIN LEVEL ARTIFICIAL GENERAL INTELLIGENCE WOULD BE EXPECTED TO HANDLE? In particular HOW APPLICABLE IS IT TO SEMANTIC PATTERN RECOGNITION AND GENERATION --- WITH ITS COMPLEX AND HIGHLY VARIED RELATIONS --- SUCH AS IS COMMONLY INVOLVED IN HUMAN LEVEL NATURAL LANGUAGE UNDERSTANDING AND GENERATION?

The answer lies in the confusion over what the "binding problem" actually is. There are many studies out there that misunderstand the problem in such a substantial way that their conclusions are meaningless. I refer, for example, to the seminal paper by Shastri and Ajjanagadde, which I remember discussing with a colleague (Janet Vousden) back in the early 90s. We both went into that paper in great depth, and independently came to the conclusion that S & A had their causality so completely screwed up that the paper said nothing at all: they claimed to be able to explain binding by showing that synchronized firing could make it happen, but they completely failed to show how the RELEVANT neurons would become synchronized.

Distressingly, the Shastri and Ajjanagadde paper then went on to become, as I say, seminal, and there has been a lot of research on something that these people call "the binding problem", but which seems (from my limited coverage of that area) to be about getting various things to connect using synchronized signals, without any explanation of how the things that are semantically required to connect actually connect.

So, to be able to answer your question, you have to disentangle that entire mess and become clear about what the real binding problem is, what the fake binding problem is, and whether the new idea makes any difference to either of them.

In my opinion, it sounds like Poggio is correct in making the claim that he does, but that Janet Vousden and I already understood that general point back in 1994, just by using general principles. And, most probably, the solution Poggio refers to DOES apply as well to what you are calling the semantic level.

The paper “Are Cortical Models Really Bound by the ‘Binding Problem’?” suggests, in the first full paragraph on its second page, that gen/comp hierarchies avoid the “binding problem” by

“coding an object through a set of intermediate features made up of local arrangements of simpler features [that] sufficiently constrain the representation to uniquely code complex objects without retaining global positional information."

This is exactly the position that I took a couple of decades ago. You will recall that I am always talking about doing this with CONSTRAINTS, and using those constraints at many different levels of the hierarchy.

For example, in the context of speech recognition,

"...rather than using individual letters to code words, letter pairs or higher-order combinations of letters can be used—i.e., although the word “tomaso” might be confused with the word “somato” if both were coded by the sets of letters they are made up of, this ambiguity is resolved if both are represented through letter pairs.”

A strangely trivial point for them to make: this was the basis for all the "triplet" representations that were in widespread use for NN simulations back in the late 1980s.
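The point can be shown in a few lines (a sketch of my own, using the tomaso/somato example from the paper): a position-free bag of single letters confuses anagrams, but a bag of letter pairs --- local arrangements of simpler features --- already disambiguates them without any global positional information.

```python
# Bigram ("letter pair") coding vs. single-letter coding, as in the
# paper's tomaso/somato example. Illustrative sketch only.

def letters(word):
    """Position-free bag of single letters."""
    return set(word)

def bigrams(word):
    """Position-free bag of adjacent letter pairs (local arrangements)."""
    return {word[i:i + 2] for i in range(len(word) - 1)}

# Bags of single letters cannot tell the anagrams apart...
assert letters("tomaso") == letters("somato")
# ...but the sets of letter pairs differ ("as" vs. "at"), so the two
# words get distinct codes with no global position information retained.
assert bigrams("tomaso") != bigrams("somato")
```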

The issue then becomes, WHAT SUB-SETS OF THE TYPES OF PROBLEMS THE HUMAN BRAIN HAS TO PERFORM CAN BE PERFORMED IN A MANNER THAT AVOIDS THE BINDING PROBLEM JUST BY USING A GEN/COMP HIERARCHY WITH SUCH “A SET OF SIMPLER FEATURES [THAT] SUFFICIENTLY CONSTRAIN THE REPRESENTATION TO UNIQUELY CODE” THE TYPE OF PATTERNS SUCH TASKS REQUIRE?

All of them.


There is substantial evidence that the brain does require synchrony for some of its tasks --- as has been indicated by the work of people like Wolf Singer --- suggesting that binding may well be a problem that cannot be handled by the specificity of the brain’s gen/comp hierarchies alone for all mental tasks.

No. The brain uses synchrony, but what relationship this has to binding is unclear. I suspect these are processes happening at completely different levels of description, therefore the connection is nonexistent.


The table at the top of page 75 of Serre’s impressive PhD thesis suggests that his system --- which performs very quick feedforward object recognition roughly as well as a human --- has an input of 160 x 160 pixels and requires 23 million pattern models. Such a large number of patterns helps provide the “simpler features [that] sufficiently constrain the representation to uniquely code complex objects without retaining global positional information.”

But, it should be noted --- as is recognized in Serre’s paper --- that the very rapid 150 msec feed forward recognition described in that paper is far from all of human vision. Such rapid recognition --- although surprisingly accurate given how fast it is --- is normally supplemented by more top-down vision processes to confirm its best guesses. For example, if a human is shown a photograph of a face, his eyes will normally saccade over it, with multiple fixation points, often on key features such as eyes, nose, corners of the mouth, and points on the outline of the face, all indicating that the recognition of the face is normally much more than one rapid feed forward process. It is possible that synchronies, attention focusing, or other binding processes are involved in these further steps of visual recognition.

One of my questions is: if such a relatively small (i.e., 160 x 160 pixel), low-dimensional (i.e., 2 dimensional) input space as that in Serre’s system requires 23 million models so that it can sufficiently constrain the representation to uniquely recognize high-level visual objects without the need for any additional mechanism for binding, HOW MANY MODELS WOULD BE REQUIRED TO PROPERLY CONSTRAIN RECOGNITION OF PATTERNS IN THE MUCH LARGER, AND MUCH HIGHER-DIMENSIONAL SPACE IN WHICH SEMANTIC PATTERNS --- SUCH AS THOSE INVOLVED IN HUMAN LEVEL LANGUAGE UNDERSTANDING --- ARE REPRESENTED?

You may recall in my previous response that I did ask how these models scaled up. If his "models" are what I think they are, and if they scale as the square of the linear array size, then this approach would be useless at these higher levels. That was what I suspected before: the approach might well be progress in the lower reaches of the visual system, but other - completely different - mechanisms are probably at work higher up.
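A back-of-envelope sketch of that scaling worry (my own arithmetic, not anything from Serre's thesis --- the quadratic growth is pure conjecture here, and only the 23 million figure for a 160 x 160 input comes from the table cited above):

```python
# IF model count grew as (linear_size / 160) ** exponent -- an assumed
# scaling law, not a measured one -- the projection would look like this.

MODELS_AT_160 = 23_000_000  # figure quoted from the table in Serre's thesis

def projected_models(linear_size, exponent=2):
    """Project the model count under the assumed power-law scaling."""
    return int(MODELS_AT_160 * (linear_size / 160) ** exponent)

# Under the quadratic assumption, doubling the linear input size
# quadruples the model count:
projected_models(320)  # -> 92,000,000
```

The sketch only makes the shape of the argument explicit: whether any such power law (or any fixed exponent at all) applies to high-dimensional semantic spaces is exactly what is in dispute.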

It is my hunch that in such a large, high-dimensional space, it is not possible for the brain to have enough models to provide the type of constraint required --- without using additional mechanisms, such as synchrony, or its equivalent --- to deal with the binding problem. IS THIS CORRECT?

Not correct. Or rather, you are correct to say that the model does not apply, but I see no reason to deduce that the binding problem per se has any relevance to the problem of dealing with higher level processes. As far as I can see, this is just a non sequitur.

Rather, we need to understand the basic nature of those higher processes.


But it is also my hunch that --- in the more complex representational spaces required for more abstract levels of human thought --- gen/comp model networks of the general type described by Poggio and Serre greatly reduce the amount of binding that has to be handled by additional methods such as synchrony and/or sequential wide-spread activation of conscious and near-conscious concepts at gamma-wave frequencies. This decrease in the amount of binding that has to be dealt with by methods other than the relatively high resolution of the gen/comp hierarchy itself would greatly increase the amount of parallel processing that can be performed in the sub-conscious. IS THIS CORRECT?

Cannot answer this question: do not understand it.


I would be grateful for any intelligent answers to any of the questions I have posed in ALL CAPS, or any feedback about any of my other comments in this email. These questions appear to be very important for the design of artificial general intelligence (AGI), because AGI methods for dealing with binding are considerably more computationally expensive than the relatively simple feed-forward computations of the type used in systems like that shown in Serre’s PhD thesis, or in Hinton's RBMs: the additional binding methods tend to require spreading activation that stores more complicated state information at each activated node, and they tend to require matching between the stored activation states at those nodes. I have not figured out any way to do massively-parallel, complex, context-sensitive semantic pattern recognition and generation without some form of additional processing to handle binding, unless one were to use many more models than there are neurons in the human brain. IS THIS CORRECT?

I would appreciate very much any guidance those who might be more knowledgeable on this subject could give to me and to the AGI list.


Stepping back from the details, I think that what is happening here is that if you take an approach to AGI that emphasizes the "Standard Model" of the sort found in Novamente (a broad class of systems that is hard to define compactly, but which has to do with the fact that there are passive symbols being manipulated by supposedly smart mechanisms), then you tend to get sucked into the idea that binding the right things together is a crucial problem. To put it crudely, getting the system to work depends crucially on who decides to talk to whom in your system.

It is very difficult to describe why this happens. Basically, this style of AGI commits itself to specific architectures very early, and then later on the researcher wonders why, with so many degrees of freedom nailed down, the last few pieces of the puzzle do not fit. Then the researcher needs a name for the main culprit that does not fit. In this case, you are calling it the binding problem .... getting the right things to hook up together.

Problem is, you see, that getting the right things to hook up together is the WHOLE STORY.



Richard Loosemore



Sincerely,

Ed Porter

References

1. Are Cortical Models Really Bound by the “Binding Problem”?, Maximilian Riesenhuber and Tomaso Poggio, at http://cbcl.mit.edu/projects/cbcl/publications/ps/riesenhuber-neuron-1999.pdf
2. Learning a Dictionary of Shape-Components in Visual Cortex: Comparison with Neurons, Humans and Machines, PhD thesis by Thomas Serre, at http://cbcl.mit.edu/projects/cbcl/publications/ps/MIT-CSAIL-TR-2006-028.pdf
3. Robust Object Recognition with Cortex-Like Mechanisms, Thomas Serre, Lior Wolf, Stanley Bileschi, Maximilian Riesenhuber, and Tomaso Poggio, at http://web.mit.edu/serre/www/publications/Serre_etal_PAMI07.pdf
4. Learning multiple layers of representation, Geoffrey E. Hinton, at http://www.csri.utoronto.ca/~hinton/absps/tics.pdf
5. The Next Generation of Neural Networks, Google Tech Talk by Geoffrey E. Hinton, 11/29/07, on YouTube, at http://www.youtube.com/watch?v=AyzOUbkUf3M

------------------------------------------------------------------------
*agi* | Archives <http://www.listbox.com/member/archive/303/=now> <http://www.listbox.com/member/archive/rss/303/> | Modify <http://www.listbox.com/member/?&;> Your Subscription [Powered by Listbox] <http://www.listbox.com>



