Dear James,

Thanks for your question.

Adaptive Resonance Theory, or ART, does not face a problem of overfitting. 
Please keep in mind that it can autonomously and incrementally learn large 
nonstationary databases on the fly.

The bottom-up connections in ART are adaptive. They learn recognition 
categories. They use a combination of a bottom-up adaptive filter followed, in 
the classifying layer, by recurrent competition that chooses the best cell 
population with which to characterize the incoming bottom-up input patterns in 
real time.

This combination of processes tends to sparsify the data.
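
The filter-then-compete step described above can be sketched in a few lines. 
This is a generic winner-take-all competitive learning illustration, not the 
equations of the 1976 model: the function names, the learning rate lr, and the 
hand-picked initial weights are all assumptions made for the example.

```python
import numpy as np

def competitive_learning(patterns, initial_weights, lr=0.5, epochs=3):
    """Generic winner-take-all competitive learning (illustrative sketch)."""
    # One row of bottom-up adaptive weights per category cell (hypothetical setup).
    W = np.array(initial_weights, dtype=float)
    W /= np.linalg.norm(W, axis=1, keepdims=True)
    for _ in range(epochs):
        for x in patterns:
            x = np.asarray(x, dtype=float)
            x = x / np.linalg.norm(x)
            # Bottom-up adaptive filter (W @ x), then competition picks one winner.
            winner = int(np.argmax(W @ x))
            # Only the winning cell's weights move toward the input pattern.
            W[winner] += lr * (x - W[winner])
    return W

def classify(W, x):
    """Index of the category cell that wins the competition for x."""
    x = np.asarray(x, dtype=float)
    return int(np.argmax(W @ (x / np.linalg.norm(x))))

# Two well-separated clusters, i.e. inputs that are not dense in pattern space.
patterns = [[1.0, 0.1, 0.0], [0.0, 0.1, 1.0], [0.9, 0.2, 0.0], [0.0, 0.2, 0.9]]
W = competitive_learning(patterns, initial_weights=[[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])
print([classify(W, p) for p in patterns])
```

Each input activates a single winning cell, which is one simple sense in which 
the code becomes sparse.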

I proved a theorem in 1976 which showed that, in response to a series of input 
patterns that are not too dense in pattern space, the competitive learning 
combination of bottom-up adaptive filtering and competition can stably learn 
recognition categories. I also showed that the adaptive weights oscillate at 
most once before they converge to their final values.

See

Grossberg, S. (1976). Adaptive pattern classification and universal recoding, 
I: Parallel development and coding of neural feature detectors. Biological 
Cybernetics, 23, 121-134.
https://sites.bu.edu/steveg/files/2016/06/Gro1976BiolCyb_I.pdf

Also in 1976, I introduced ART to show how this neural network could stably 
learn to classify and remember arbitrarily long finite sequences of input 
patterns that may overlap and be dense in input space:

Grossberg, S. (1976). Adaptive pattern classification and universal recoding, 
II: Feedback, expectation, olfaction, and illusions. Biological Cybernetics, 
23, 187-202.
https://sites.bu.edu/steveg/files/2016/06/Gro1976BiolCyb_II.pdf

For this to be possible, I modeled how activated recognition categories read 
out learned top-down expectations that are matched against bottom-up input 
patterns. A good enough match leads to resonance between the feature and 
category levels via both bottom-up and top-down excitatory signaling. This 
resonance state persists long enough to drive learning in both bottom-up and 
top-down adaptive weights. Hence the name ADAPTIVE Resonance Theory.

If a poor match occurs, it drives hypothesis testing and memory search for a 
better-matching category, or selection of an uncommitted cell population with 
which to categorize novel information.

I also proved that this learning process is dynamically stabilized by the 
top-down attentive matching process. It does not experience catastrophic 
forgetting.

Overfitting does not occur in the dense input space because the same 
competitive learning combination of bottom-up adaptive filtering and 
competition chooses different cell populations with which to classify input 
patterns.

In addition, with Gail Carpenter and John Reynolds, in two different articles, 
we introduced the concept of vigilance control that determines how concrete or 
abstract the separated categories will become due to learning. E.g.,

Carpenter, G.A., and Grossberg, S. (1987). A massively parallel architecture 
for a self-organizing neural pattern recognition machine. Computer Vision, 
Graphics, and Image Processing, 37, 54-115.
https://sites.bu.edu/steveg/files/2016/06/CarGro1987CVGIP.pdf

As noted in the Abstract:

“Attentional vigilance determines how fine the learned categories will be. If 
vigilance increases due to an environmental disconfirmation, then the system 
automatically searches for and learns finer recognition categories.”

High vigilance leads to learning of concrete and specific categories, like a 
frontal view of your mother’s face.

Low vigilance leads to learning of abstract and general categories, like the 
fact that everyone has a face.

The 1987 ART model uses unsupervised learning that classifies input patterns 
together based upon their similarity.
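
A minimal sketch of the match, reset, and search cycle with vigilance control, 
for binary input patterns and fast learning. It follows the commonly used 
simplified form of ART 1 rather than the full network dynamics of the 1987 
article; the function name art1_train, the choice parameter beta, and the toy 
patterns are assumptions made for illustration.

```python
import numpy as np

def art1_train(patterns, vigilance, beta=0.5, epochs=3):
    """Simplified ART 1-style clustering of non-empty binary patterns."""
    templates = []  # learned top-down expectations, one per committed category
    for _ in range(epochs):
        for I in patterns:
            I = np.asarray(I, dtype=bool)
            overlaps = [np.sum(I & w) for w in templates]
            # Bottom-up choice: favor templates that overlap the input strongly.
            scores = [ov / (beta + np.sum(w)) for ov, w in zip(overlaps, templates)]
            for j in np.argsort(scores)[::-1]:      # memory search, best first
                # Top-down match: fraction of the input confirmed by the template.
                if overlaps[j] / np.sum(I) >= vigilance:
                    templates[j] = I & templates[j]  # resonance: fast learning
                    break
                # Otherwise the category is reset and the search continues.
            else:
                templates.append(I.copy())           # commit an uncommitted cell
    return templates

patterns = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 1, 1, 1],
]
general = art1_train(patterns, vigilance=0.7)   # lower vigilance
specific = art1_train(patterns, vigilance=0.9)  # higher vigilance
print(len(general), len(specific))
```

On these toy patterns the lower vigilance value groups the inputs into two 
general categories, while the higher value gives each pattern its own, more 
specific category.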

Incorporating supervision that can regulate learning by predictive success or 
failure required supervised ARTMAP networks. These ARTMAP models could be 
trained by an arbitrary combination of unsupervised and supervised learning. 
E.g.,

Carpenter, G.A., Grossberg, S., Markuzon, N., Reynolds, J.H., and Rosen, D.B. 
(1992). Fuzzy ARTMAP: A neural network architecture for incremental supervised 
learning of analog multidimensional maps. IEEE Transactions on Neural Networks, 
3, 698-713.
https://sites.bu.edu/steveg/files/2016/06/CarGroMarRey1992IEEETransNN.pdf

Supervision takes the form of a predictive match or mismatch with the 
environment.

A good enough match again resonates and learns.

A bad enough mismatch triggers hypothesis testing that can discover and learn a 
recognition category that predicts the desired outcome well enough to satisfy 
vigilance.

In a supervised ARTMAP, hypothesis testing drives a memory search that 
conjointly maximizes category generality while minimizing predictive errors.

This minimax learning rule is an emergent property of network dynamics. It is 
not built into the model.

Carpenter and I call it match tracking because it causes vigilance to increase 
just above the criterion that will drive a memory search.

Match tracking sacrifices the minimum amount of generalization that is needed 
to correct the predictive error.
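
Match tracking can be sketched by adding labels to a simplified binary ART 
module. This is an illustration under stated assumptions, not the Fuzzy ARTMAP 
architecture itself: it omits complement coding and the map field, and the 
names artmap_train, artmap_predict, base_vigilance, and eps are hypothetical.

```python
import numpy as np

def artmap_train(patterns, labels, base_vigilance=0.4, beta=0.5, eps=1e-6, epochs=3):
    """Simplified supervised ART with match tracking (illustrative sketch)."""
    templates, cat_labels = [], []
    for _ in range(epochs):
        for I, y in zip(patterns, labels):
            I = np.asarray(I, dtype=bool)
            rho = base_vigilance  # vigilance starts at its baseline for each input
            overlaps = [np.sum(I & w) for w in templates]
            scores = [ov / (beta + np.sum(w)) for ov, w in zip(overlaps, templates)]
            for j in np.argsort(scores)[::-1]:
                match = overlaps[j] / np.sum(I)
                if match < rho:
                    continue                         # mismatch: reset, keep searching
                if cat_labels[j] == y:
                    templates[j] = I & templates[j]  # correct prediction: learn
                    break
                # Predictive error: raise vigilance just above the current match,
                # which forces the search to continue past this category.
                rho = match + eps
            else:
                templates.append(I.copy())           # commit a new category
                cat_labels.append(y)
    return templates, cat_labels

def artmap_predict(templates, cat_labels, I, beta=0.5):
    """Label of the category with the largest bottom-up choice score."""
    I = np.asarray(I, dtype=bool)
    scores = [np.sum(I & w) / (beta + np.sum(w)) for w in templates]
    return cat_labels[int(np.argmax(scores))]

patterns = [[1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 1, 1], [0, 1, 0, 0, 1, 1]]
labels = ["A", "A", "B", "B"]
templates, cat_labels = artmap_train(patterns, labels)
print(cat_labels, [artmap_predict(templates, cat_labels, p) for p in patterns])
```

In this toy run the third pattern matches the "A" category well enough to pass 
baseline vigilance, but the label disagrees, so vigilance is raised just above 
that match value and a separate "B" category is committed.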

Later articles showed that vigilance control exists in our brains and modeled 
its anatomy, neurophysiology, biophysics, and biochemistry. E.g.,

Grossberg, S. and Versace, M. (2008). Spikes, synchrony, and attentive learning 
by laminar thalamocortical circuits. Brain Research, 1218, 278-312.
https://sites.bu.edu/steveg/files/2016/06/GroVer2008BR.pdf

See also

Grossberg, S. (2017). Acetylcholine neuromodulation in normal and abnormal 
learning and memory: Vigilance control in waking, sleep, autism, amnesia, and 
Alzheimer’s disease. Frontiers in Neural Circuits, November 2, 2017,
https://www.frontiersin.org/journals/neural-circuits/articles/10.3389/fncir.2017.00082/full

I hope that the above comments clarify that ART does not experience problems 
that can occur in neural networks like Deep Learning and its variants.

Please do not hesitate to make comments or ask questions about the above 
results.

Best,

Steve

From: James Bowery <[email protected]>
Date: Sunday, October 27, 2024 at 10:24 AM
To: AGI <[email protected]>
Cc: Dorian Aur <[email protected]>, André Fabio Kohn via Comp-neuro 
<[email protected]>, Stephen Grossberg <[email protected]>
Subject: Re: [agi] Re: [Comp-neuro] Re: Some scientific history that I 
experienced relevant to the recent Nobel Prizes to Hopfield and Hinton
I became aware of ART just prior to the second IJCNN in San Diego. It struck 
me intuitively as the right way to go, but at that time I couldn't figure out 
how to implement it in the DataCube video convolution hardware. As it turns 
out, that hardware could have advanced the field by at least 15 years, given 
the eventual breakthrough in convolution hardware by GPUs, but, alas, not 
toward ART.

However, it may be that the hardware approach by Mead and Faggin is being 
obviated by thermodynamic computing. That's something I might be willing to 
spend some time on.

One question:

How does ART succeed in avoiding overfitting in the sense of Kolmogorov 
Complexity?  In other words, in memorizing a corpus, how does ART reduce the 
complexity of the description of the connection matrix?



On Sat, Oct 26, 2024 at 4:55 PM Grossberg, Stephen via AGI 
<[email protected]> wrote:
Dear Dorian,

Thanks very much for your kind words about my work over the years on biological 
and artificial neural networks.

I agree that some algorithms, like ChatGPT, require huge amounts of energy to 
create.

It is also the case that learning models like Deep Learning use a lot of time 
and energy, because they use slow learning to be trained, often requiring 
hundreds or even thousands of repetitions of objects before they can be 
recognized.

Our biological models require much less energy to run.

For example, Adaptive Resonance Theory, or ART, neural networks have also been 
defined as algorithms that can learn a large database in a single fast 
learning trial.

For some ART algorithms, like ART 1, Gail Carpenter and I have been able to 
prove complete mathematical theorems with which to characterize the learning 
and memory process, including a proof that ART does not experience catastrophic 
forgetting.

Computer simulations for ART 1 are not needed, because the theorems fully 
characterize the learning and memory process. See:

Carpenter, G.A., and Grossberg, S. (1987). A massively parallel architecture 
for a self-organizing neural pattern recognition machine. Computer Vision, 
Graphics, and Image Processing, 37, 54-115.
https://sites.bu.edu/steveg/files/2016/06/CarGro1987CVGIP.pdf

There is another approach to reducing energy demands, and that is by emulating 
the brain circuits that support biological intelligence in the LAMINAR circuits 
of the CEREBRAL CORTEX.

My colleagues and I have developed the paradigm of LAMINAR COMPUTING by 
defining laminar neocortical models, including a laminar realization of ART. We 
have shown, through explanations and quantitative computer simulations of data 
from many psychological and neurobiological experiments, how variations of the 
SAME canonical laminar cortical circuit can carry out vision and auditory 
perception, attention, cognition, emotion, planning, and action.

For example:

Grossberg, S. and Versace, M. (2008). Spikes, synchrony, and attentive learning 
by laminar thalamocortical circuits. Brain Research, 1218, 278-312.
https://sites.bu.edu/steveg/files/2016/06/GroVer2008BR.pdf

Grossberg, S. and Pearson, L. (2008). Laminar cortical dynamics of cognitive 
and motor working memory, sequence learning and performance: Toward a unified 
theory of how the cerebral cortex works. Psychological Review, 115, 677-732.
https://sites.bu.edu/steveg/files/2016/06/GroPea2008.pdf

Cao, Y., and Grossberg, S. (2012). Stereopsis and 3D surface perception by 
spiking neurons in laminar cortical circuits: A method of converting neural 
rate models into spiking models. Neural Networks, 26, 75-98.
https://sites.bu.edu/steveg/files/2016/06/CaoGroTR2011.pdf

Search my web page sites.bu.edu/steveg using the word “laminar” to find all 
the articles in which we develop laminar cortical models of biological 
intelligence.

Because ALL of these processes use variations of a single canonical cortical 
circuit, they can be efficiently embodied in VLSI chips.

The chips that realize different processes, because they have the same basic 
architecture, can be connected into a more comprehensive neural architecture 
that can begin to achieve Autonomous Adaptive Intelligence, or AGI, whichever 
phrase is more pleasing to the reader.

Efforts by people like Carver Mead and Federico Faggin began many years ago to 
embody several of our models in low-energy VLSI chips. However, the short 
duration of funding prevented this venture from being completed.

What is needed is a sustained and highly-funded program that today can most 
likely be financed by Google, Meta, or other deep-pocketed social media.

When the project is completed, it may make the group that does it first very 
rich.

Best,

Steve

From: Dorian Aur <[email protected]>
Date: Thursday, October 24, 2024 at 1:06 PM
To: Grossberg, Stephen <[email protected]>, AGI <[email protected]>
Cc: André Fabio Kohn via Comp-neuro <[email protected]>, Stephen 
Grossberg <[email protected]>
Subject: Re: [Comp-neuro] Re: Some scientific history that I experienced 
relevant to the recent Nobel Prizes to Hopfield and Hinton

Dear Stephen,

I deeply admire your remarkable work in developing neural network models and 
maintaining a steady focus over an extended period in academic research, 
publishing numerous important papers and books along the way.

There is a fundamental misunderstanding about the Nobel Prize and the 
computations that take place in the brain, as well as the related physics. The 
Nobel committee missed the opportunity to recognize Tesla’s work; this time, 
however, the recent advancements in AI, particularly with ChatGPT, have 
prompted the committee to consider an award, recognizing that AI's impact will 
be as transformative as the beginning of electrification.

It is unlikely that the Nobel committee was well-versed in the complete 
history of neural networks and neural computation, which have developed 
gradually over several decades. This evolution has seen neural learning 
transform into machine learning, with contributions from many esteemed 
scientists along the way. This history includes both significant advancements 
and too many setbacks.

However, what remains misunderstood is that the algorithms executed on digital 
computers are quite different from the physical computations carried out by the 
brain. The human brain consumes around 20 watts of power on average (efficient 
computation), while a typical supercluster uses around 1.3 MW. Future 
technology will have to replicate the computational efficiency of the brain.

While we can't change previous events or the decisions made by the Nobel 
committee, we have the power to shape and transform technology moving forward. 
It took nearly 60 years to reach this point. How long will it take to replicate 
the brain's computational efficiency at just 20 watts?
The positive aspect is that another Nobel Prize will be awarded for efforts to 
replicate the computational power of the brain using principles of physics 🙂

Dorian Aur




On Mon, Oct 21, 2024 at 5:50 AM Grossberg, Stephen via Comp-neuro 
<[email protected]> wrote:
Dear Comp-neuro colleagues,

Here are some short summaries of the history of neural network discoveries, as 
I experienced it, that are relevant to the recent Nobel Prizes to Hopfield and 
Hinton:
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
THE NOBEL PRIZES IN PHYSICS TO HOPFIELD AND HINTON
FOR MODELS THEY DID NOT DISCOVER: THE CASE OF HOPFIELD

Here I summarize my concerns about the Hopfield award.

I published articles in 1967 – 1972 in the Proceedings of the National Academy 
of Sciences that introduced the Additive Model that Hopfield used in 1984. My 
articles proved global theorems about the limits and oscillations of my 
Generalized Additive Models. See sites.bu.edu/steveg for these articles.

For example:

Grossberg, S. (1971). Pavlovian pattern learning by nonlinear neural networks. 
Proceedings of the National Academy of Sciences, 68, 828-831.
https://lnkd.in/emzwx4Tw

This article illustrates that my mathematical results were part of a research 
program to develop biological neural networks that provide principled 
mechanistic explanations of psychological and neurobiological data.

Later, Michael Cohen and I published a Liapunov function that included the 
Additive Model and generalizations thereof in 1982 and 1983 before Hopfield 
(1984) appeared.

For example,

Cohen, M.A. and Grossberg, S. (1983). Absolute stability of global pattern 
formation and parallel memory storage by competitive neural networks. IEEE 
Transactions on Systems, Man, and Cybernetics, SMC-13, 815-826.
https://lnkd.in/eAFAdvbu

I was told that Hopfield knew about my work before he published his 1984 
article, without citing it.

Recall that I started my neural networks research in 1957 as a Freshman at 
Dartmouth College.

That year, I introduced the biological neural network paradigm, as well as the 
short-term memory (STM), medium-term memory (MTM), and long-term memory (LTM) 
laws that are used to this day, including in the Additive Model, to explain 
data about how brains make minds.

See the review in https://lnkd.in/gJZJtP_W .

When I started in 1957, I knew no one else who was doing neural networks. That 
is why my colleagues call me the Father of AI.

I then worked hard to create a neural networks community, notably a research 
center, academic department, the International Neural Network Society, the 
journal Neural Networks, multiple international conferences on neural networks, 
and Boston-area research centers, while training over 100 gifted PhD students, 
postdocs, and faculty to do neural network research. See the Wikipedia page.

That is why I did not have time or strength to fight for priority of my models.

Recently, I was able to provide a self-contained and non-technical overview and 
synthesis of some of my scientific discoveries since 1957, as well as 
explanations of the work of many other scientists, in my 2021 Magnum Opus

Conscious Mind, Resonant Brain: How Each Brain Makes a Mind

https://lnkd.in/eiJh4Ti
++++++++++++++++++++++++++++++++++++++++++++++++++++
THE NOBEL PRIZES IN PHYSICS TO HOPFIELD AND HINTON
FOR MODELS THEY DID NOT DISCOVER: THE CASE OF HINTON

Here I summarize my concerns about the Hinton award.

Many authors developed Back Propagation (BP) before Hinton; e.g., Amari (1967), 
Werbos (1974), Parker (1982), all before Rumelhart, Hinton, & Williams (1986).

BP has serious computational weaknesses:

It is UNTRUSTWORTHY (because it is UNEXPLAINABLE).

It is UNRELIABLE (because it can experience CATASTROPHIC FORGETTING).

It should thus never be used in financial or medical applications.

BP learning is also SLOW and uses non-biological NONLOCAL WEIGHT TRANSPORT.

See Figure, right column, top.

In 1988, I published 17 computational problems of BP:
https://lnkd.in/erKJvXFA

BP gradually fell out of favor because other models were better.

Later, huge online databases and supercomputers enabled Deep Learning to use BP 
to learn.

My 1988 article contrasted BP with Adaptive Resonance Theory (ART), which I 
first published in 1976:
https://lnkd.in/evkfq22G

See Figure, right column, bottom.

ART never had BP’s problems.

ART is now the most advanced cognitive and neural theory that explains HOW 
HUMANS LEARN TO ATTEND, RECOGNIZE, and PREDICT events in a changing world.

ART also explains and simulates data from hundreds of psychological and 
neurobiological experiments.

In 1980, I derived ART from a THOUGHT EXPERIMENT about how ANY system can 
AUTONOMOUSLY learn to correct predictive errors in a changing world:
https://lnkd.in/eGWE8kJg

The thought experiment derives ART from a few facts of life that do not mention 
mind or brain.

ART is thus a UNIVERSAL solution of the problem of autonomous error correction 
in a changing world.

That is why ART models can be used in designs for AUTONOMOUS ADAPTIVE 
INTELLIGENCE in engineering, technology, and AI.

ART also proposes a solution of the classical MIND-BODY PROBLEM:

HOW, WHERE in our brains, and WHY from a deep computational perspective, we 
CONSCIOUSLY SEE, HEAR, FEEL, and KNOW about the world, and use our conscious 
states to PLAN and ACT to realize VALUED GOALS.

For details, see

Conscious Mind, Resonant Brain: How Each Brain Makes a Mind

https://lnkd.in/eiJh4Ti
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

_______________________________________________
Comp-neuro mailing list -- [email protected]
Mailing list webpage (to subscribe or view archives): 
https://www.cnsorg.org/comp-neuro-mailing-list

To contact admin/moderators, send an email to: [email protected]
To unsubscribe, send an email to [email protected]

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Tbd69a4c5580eb654-Mfaae6ee3c832373d0eb0e077
Delivery options: https://agi.topicbox.com/groups/agi/subscription
