Dear James,

Thanks for your question.

Adaptive Resonance Theory, or ART, does not face a problem of overfitting. Please keep in mind that it can autonomously and incrementally learn large nonstationary databases on the fly.

The bottom-up connections in ART are adaptive. They learn recognition categories. They use a combination of a bottom-up adaptive filter followed, in the classifying layer, by recurrent competition that chooses the best cell population with which to characterize the incoming bottom-up input patterns in real time. This combination of processes tends to sparsify the data.

I proved a theorem in 1976 which showed that, in response to a series of input patterns that are not too dense in pattern space, the competitive learning combination of bottom-up adaptive filtering and competition can stably learn recognition categories. I also showed that the adaptive weights oscillate at most once before they converge to their final values. See

Grossberg, S. (1976). Adaptive pattern classification and universal recoding, I: Parallel development and coding of neural feature detectors. Biological Cybernetics, 23, 121-134.
https://sites.bu.edu/steveg/files/2016/06/Gro1976BiolCyb_I.pdf

Also in 1976, I introduced ART to show how this neural network could stably learn to classify and remember arbitrarily long finite sequences of input patterns that may overlap and be dense in input space:

Grossberg, S. (1976). Adaptive pattern classification and universal recoding, II: Feedback, expectation, olfaction, and illusions. Biological Cybernetics, 23, 187-202.
https://sites.bu.edu/steveg/files/2016/06/Gro1976BiolCyb_II.pdf

For this to be possible, I modeled how activated recognition categories read out learned top-down expectations that are matched against bottom-up input patterns. A good enough match leads to resonance between the feature and category levels via both bottom-up and top-down excitatory signaling.
This resonance state persists long enough to drive learning in both bottom-up and top-down adaptive weights. Hence the name ADAPTIVE Resonance Theory. If a poor match occurs, it drives hypothesis testing and memory search for a better-matching category, or selection of an uncommitted cell population with which to categorize novel information.

I also proved that this learning process is dynamically stabilized by the top-down attentive matching process. It does not experience catastrophic forgetting. Overfitting does not occur in the dense input space because the same competitive learning combination of bottom-up adaptive filtering and competition chooses different cell populations with which to classify input patterns.

In addition, with Gail Carpenter and John Reynolds, in two different articles, we introduced the concept of vigilance control that determines how concrete or abstract the separated categories will become due to learning. E.g.,

Carpenter, G.A., and Grossberg, S. (1987). A massively parallel architecture for a self-organizing neural pattern recognition machine. Computer Vision, Graphics, and Image Processing, 37, 54-115.
https://sites.bu.edu/steveg/files/2016/06/CarGro1987CVGIP.pdf

As noted in the Abstract: “Attentional vigilance determines how fine the learned categories will be. If vigilance increases due to an environmental disconfirmation, then the system automatically searches for and learns finer recognition categories.” High vigilance leads to learning of concrete and specific categories, like a frontal view of your mother’s face. Low vigilance leads to learning of abstract and general categories, like the fact that everyone has a face.

The 1987 ART model uses unsupervised learning that classifies input patterns together based upon their similarity. Incorporating supervision that can regulate learning by predictive success or failure required supervised ARTMAP networks.
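
The effect of vigilance on category granularity can be illustrated with a minimal sketch. It assumes a simplified Fuzzy ART-style procedure (complement coding, a choice function, a vigilance test, and fast learning) rather than the ART 1 equations analyzed in the 1987 article. The function names, the parameters `rho`, `alpha`, and `beta`, and the toy data are illustrative assumptions, not code from the cited papers.

```python
from typing import List, Tuple

Vector = List[float]


def complement_code(a: Vector) -> Vector:
    """Append 1 - a_i for each feature so every coded input has the same L1 norm."""
    return a + [1.0 - x for x in a]


def fuzzy_and(x: Vector, y: Vector) -> Vector:
    """Component-wise minimum (the fuzzy AND operator)."""
    return [min(p, q) for p, q in zip(x, y)]


def l1(x: Vector) -> float:
    """L1 norm of a non-negative vector."""
    return sum(x)


def train_fuzzy_art(
    inputs: List[Vector], rho: float, alpha: float = 0.001, beta: float = 1.0
) -> Tuple[List[Vector], List[int]]:
    """One pass of simplified Fuzzy ART-style learning (assumed variant).

    rho   : vigilance in [0, 1]; higher values demand a closer match.
    alpha : small choice parameter (assumed default).
    beta  : learning rate; 1.0 corresponds to fast learning.
    Returns the learned category weights and the category index per input.
    """
    weights: List[Vector] = []
    labels: List[int] = []
    for a in inputs:
        i_vec = complement_code(a)

        def choice(j: int) -> float:
            # Bottom-up choice function T_j = |I ^ w_j| / (alpha + |w_j|)
            return l1(fuzzy_and(i_vec, weights[j])) / (alpha + l1(weights[j]))

        chosen = None
        # Memory search: test categories in order of decreasing choice value.
        for j in sorted(range(len(weights)), key=choice, reverse=True):
            match = l1(fuzzy_and(i_vec, weights[j])) / l1(i_vec)
            if match >= rho:  # vigilance test passed: resonance
                chosen = j
                break
        if chosen is None:
            # No committed category matches well enough: commit a new one.
            weights.append(list(i_vec))
            chosen = len(weights) - 1
        else:
            w = weights[chosen]
            m = fuzzy_and(i_vec, w)
            weights[chosen] = [beta * mi + (1.0 - beta) * wi for mi, wi in zip(m, w)]
        labels.append(chosen)
    return weights, labels


if __name__ == "__main__":
    # Toy data (hypothetical): two loose clusters in the unit square.
    data = [[0.1, 0.1], [0.15, 0.12], [0.9, 0.9], [0.85, 0.88]]
    for rho in (0.1, 0.9, 0.99):
        learned, assigned = train_fuzzy_art(data, rho)
        print(f"rho={rho}: {len(learned)} categories, labels={assigned}")
```

On this toy data, raising `rho` from 0.1 to 0.9 to 0.99 increases the number of learned categories from one to two to four, which mirrors the move from abstract to concrete categories described above.
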
These ARTMAP models could be trained by an arbitrary combination of unsupervised and supervised learning. E.g.,

Carpenter, G.A., Grossberg, S., Markuzon, N., Reynolds, J.H., and Rosen, D.B. (1992). Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps. IEEE Transactions on Neural Networks, 3, 698-713.
https://sites.bu.edu/steveg/files/2016/06/CarGroMarRey1992IEEETransNN.pdf

Supervision takes the form of a predictive match or mismatch with the environment. A good enough match again resonates and learns. A bad enough mismatch triggers hypothesis testing that can discover and learn a recognition category that predicts the desired outcome well enough to satisfy vigilance.

In a supervised ARTMAP, hypothesis testing drives a memory search that conjointly maximizes category generality while minimizing predictive errors. This minimax learning rule is an emergent property of network dynamics. It is not built into the model. Carpenter and I call it match tracking because it causes vigilance to increase just above the criterion that will drive a memory search. Match tracking sacrifices the minimum amount of generalization that is needed to correct the predictive error.

Later articles showed that vigilance control exists in our brains and modeled its anatomy, neurophysiology, biophysics, and biochemistry. E.g.,

Grossberg, S. and Versace, M. (2008). Spikes, synchrony, and attentive learning by laminar thalamocortical circuits. Brain Research, 1218, 278-312.
https://sites.bu.edu/steveg/files/2016/06/GroVer2008BR.pdf

See also

Grossberg, S. (2017). Acetylcholine neuromodulation in normal and abnormal learning and memory: Vigilance control in waking, sleep, autism, amnesia, and Alzheimer’s disease.
Frontiers in Neural Circuits, November 2, 2017,
https://www.frontiersin.org/journals/neural-circuits/articles/10.3389/fncir.2017.00082/full

I hope that the above comments clarify that ART does not experience problems that can occur in neural networks like Deep Learning and its variants. Please do not hesitate to make comments or ask questions about the above results.

Best,

Steve

From: James Bowery <[email protected]>
Date: Sunday, October 27, 2024 at 10:24 AM
To: AGI <[email protected]>
Cc: Dorian Aur <[email protected]>, André Fabio Kohn via Comp-neuro <[email protected]>, Stephen Grossberg <[email protected]>
Subject: Re: [agi] Re: [Comp-neuro] Re: Some scientific history that I experienced relevant to the recent Nobel Prizes to Hopfield and Hinton

I became aware of ART just prior to the second IJCNN in San Diego. It struck me intuitively as the right way to go, but I couldn't at that time figure out how to implement it in the DataCube video convolution hardware that, as it turns out, could have advanced the field by at least 15 years given the eventual breakthrough in convolution hardware by GPUs, but, alas, not toward ART. However, it may be that the hardware approach by Mead and Faggin is being obviated by thermodynamic computing. That's something I might be willing to spend some time on.

One question: How does ART succeed in avoiding overfitting in the sense of Kolmogorov Complexity? In other words, in memorizing a corpus, how does ART reduce the complexity of the description of the connection matrix?

On Sat, Oct 26, 2024 at 4:55 PM Grossberg, Stephen via AGI <[email protected]> wrote:

Dear Dorian,

Thanks very much for your kind words about my work over the years on biological and artificial neural networks. I agree that some algorithms, like ChatGPT, require huge amounts of energy to create.
It is also the case that learning models like Deep Learning use a lot of time and energy, because they use slow learning to be trained, often requiring hundreds or even thousands of repetitions of objects before they can be recognized.

Our biological models require much less energy to run. For example, Adaptive Resonance Theory, or ART, neural networks have also been defined as algorithms that can learn a large database in a single fast learning trial. For some ART algorithms, like ART 1, Gail Carpenter and I have been able to prove complete mathematical theorems with which to characterize the learning and memory process, including a proof that ART does not experience catastrophic forgetting. Computer simulations for ART 1 are not needed, because the theorems fully characterize the learning and memory process. See:

Carpenter, G.A., and Grossberg, S. (1987). A massively parallel architecture for a self-organizing neural pattern recognition machine. Computer Vision, Graphics, and Image Processing, 37, 54-115.
https://sites.bu.edu/steveg/files/2016/06/CarGro1987CVGIP.pdf

There is another approach to reducing energy demands, and that is by emulating the brain circuits that support biological intelligence in the LAMINAR circuits of the CEREBRAL CORTEX. My colleagues and I have developed the paradigm of LAMINAR COMPUTING by defining laminar neocortical models, including a laminar realization of ART. We have shown, through explanations and quantitative computer simulations of data from many psychological and neurobiological experiments, how variations of the SAME canonical laminar cortical circuit can carry out vision and auditory perception, attention, cognition, emotion, planning, and action. For example:

Grossberg, S. and Versace, M. (2008). Spikes, synchrony, and attentive learning by laminar thalamocortical circuits. Brain Research, 1218, 278-312.
https://sites.bu.edu/steveg/files/2016/06/GroVer2008BR.pdf

Grossberg, S. and Pearson, L. (2008).
Laminar cortical dynamics of cognitive and motor working memory, sequence learning and performance: Toward a unified theory of how the cerebral cortex works. Psychological Review, 115, 677-732.
https://sites.bu.edu/steveg/files/2016/06/GroPea2008.pdf

Cao, Y., and Grossberg, S. (2012). Stereopsis and 3D surface perception by spiking neurons in laminar cortical circuits: A method of converting neural rate models into spiking models. Neural Networks, 26, 75-98.
https://sites.bu.edu/steveg/files/2016/06/CaoGroTR2011.pdf

Search my web page sites.bu.edu/steveg using the word “laminar” to find all the articles in which we develop laminar cortical models of biological intelligence.

Because ALL of these processes use variations of a single canonical cortical circuit, they can be efficiently embodied in VLSI chips. The chips that realize different processes, because they have the same basic architecture, can be connected into a more comprehensive neural architecture that can begin to achieve Autonomous Adaptive Intelligence, or AGI, whichever phrase is more pleasing to the reader.

Efforts by people like Carver Mead and Federico Faggin began many years ago to embody various of our models in low energy VLSI chips. However, the short duration of funding prevented this venture from being completed. What is needed is a sustained and highly funded program that today can most likely be financed by Google, Meta, or other deep-pocketed social media. When the project is completed, it may make the group that does it first very rich.
Best,

Steve

From: Dorian Aur <[email protected]>
Date: Thursday, October 24, 2024 at 1:06 PM
To: Grossberg, Stephen <[email protected]>, AGI <[email protected]>
Cc: André Fabio Kohn via Comp-neuro <[email protected]>, Stephen Grossberg <[email protected]>
Subject: Re: [Comp-neuro] Re: Some scientific history that I experienced relevant to the recent Nobel Prizes to Hopfield and Hinton

Dear Stephen,

I deeply admire your remarkable work in developing neural network models and maintaining a steady focus over an extended period in academic research, publishing numerous important papers and books along the way.

There is a fundamental misunderstanding about the Nobel Prize and the computations that take place in the brain, as well as the related physics. The Nobel committee missed the opportunity to recognize Tesla’s work; this time, however, the recent advancements in AI, particularly with ChatGPT, have prompted the committee to consider an award, recognizing that AI's impact will be as transformative as the beginning of electrification.

It is unlikely that the Nobel committee was well-versed in the complete history of neural networks and neural computation, which have developed gradually over several decades. This evolution has seen neural learning transform into machine learning, with contributions from many esteemed scientists along the way. This history includes both significant advancements and too many setbacks.

However, what remains misunderstood is that the algorithms executed on digital computers are quite different from the physical computations carried out by the brain.
The human brain consumes around 20 watts of power on average (efficient computation), while a typical supercluster uses around 1.3 MW. The future technology will have to replicate the computational efficiency of the brain.

While we can't change previous events or the decisions made by the Nobel committee, we have the power to shape and transform technology moving forward. It took nearly 60 years to reach this point. How long will it take to replicate the brain's computational efficiency at just 20 watts?

The positive aspect is that another Nobel Prize will be awarded for efforts to replicate the computational power of the brain using principles of physics 🙂

Dorian Aur

On Mon, Oct 21, 2024 at 5:50 AM Grossberg, Stephen via Comp-neuro <[email protected]> wrote:

Dear Comp-neuro colleagues,

Here are some short summaries of the history of neural network discoveries, as I experienced it, that are relevant to the recent Nobel Prizes to Hopfield and Hinton:

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

THE NOBEL PRIZES IN PHYSICS TO HOPFIELD AND HINTON FOR MODELS THEY DID NOT DISCOVER: THE CASE OF HOPFIELD

Here I summarize my concerns about the Hopfield award.

I published articles in 1967-1972 in the Proceedings of the National Academy of Sciences that introduced the Additive Model that Hopfield used in 1984. My articles proved global theorems about the limits and oscillations of my Generalized Additive Models. See sites.bu.edu/steveg for these articles. For example:

Grossberg, S. (1971). Pavlovian pattern learning by nonlinear neural networks. Proceedings of the National Academy of Sciences, 68, 828-831.
https://lnkd.in/emzwx4Tw

This article illustrates that my mathematical results were part of a research program to develop biological neural networks that provide principled mechanistic explanations of psychological and neurobiological data.
Later, Michael Cohen and I published a Liapunov function that included the Additive Model and generalizations thereof in 1982 and 1983 before Hopfield (1984) appeared. For example,

Cohen, M.A. and Grossberg, S. (1983). Absolute stability of global pattern formation and parallel memory storage by competitive neural networks. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13, 815-826.
https://lnkd.in/eAFAdvbu

I was told that Hopfield knew about my work before he published his 1984 article, which did not cite it.

Recall that I started my neural networks research in 1957 as a Freshman at Dartmouth College. That year, I introduced the biological neural network paradigm, as well as the short-term memory (STM), medium-term memory (MTM), and long-term memory (LTM) laws that are used to this day, including in the Additive Model, to explain data about how brains make minds. See the review in https://lnkd.in/gJZJtP_W

When I started in 1957, I knew no one else who was doing neural networks. That is why my colleagues call me the Father of AI.

I then worked hard to create a neural networks community, notably a research center, academic department, the International Neural Network Society, the journal Neural Networks, multiple international conferences on neural networks, and Boston-area research centers, while training over 100 gifted PhD students, postdocs, and faculty to do neural network research. See the Wikipedia page.

That is why I did not have time or strength to fight for priority of my models.
Recently, I was able to provide a self-contained and non-technical overview and synthesis of some of my scientific discoveries since 1957, as well as explanations of the work of many other scientists, in my 2021 Magnum Opus

Conscious Mind, Resonant Brain: How Each Brain Makes a Mind
https://lnkd.in/eiJh4Ti

++++++++++++++++++++++++++++++++++++++++++++++++++++

THE NOBEL PRIZES IN PHYSICS TO HOPFIELD AND HINTON FOR MODELS THEY DID NOT DISCOVER: THE CASE OF HINTON

Here I summarize my concerns about the Hinton award.

Many authors developed Back Propagation (BP) before Hinton; e.g., Amari (1967), Werbos (1974), Parker (1982), all before Rumelhart, Hinton, & Williams (1986).

BP has serious computational weaknesses:

It is UNTRUSTWORTHY (because it is UNEXPLAINABLE).

It is UNRELIABLE (because it can experience CATASTROPHIC FORGETTING).

It should thus never be used in financial or medical applications.

BP learning is also SLOW and uses non-biological NONLOCAL WEIGHT TRANSPORT. See Figure, right column, top.

In 1988, I published 17 computational problems of BP:
https://lnkd.in/erKJvXFA

BP gradually grew out of favor because other models were better. Later, huge online databases and supercomputers enabled Deep Learning to use BP to learn.

My 1988 article contrasted BP with Adaptive Resonance Theory (ART) which I first published in 1976:
https://lnkd.in/evkfq22G

See Figure, right column, bottom.

ART never had BP’s problems. ART is now the most advanced cognitive and neural theory that explains HOW HUMANS LEARN TO ATTEND, RECOGNIZE, and PREDICT events in a changing world. ART also explains and simulates data from hundreds of psychological and neurobiological experiments.

In 1980, I derived ART from a THOUGHT EXPERIMENT about how ANY system can AUTONOMOUSLY learn to correct predictive errors in a changing world:
https://lnkd.in/eGWE8kJg

The thought experiment derives ART from a few facts of life that do not mention mind or brain.
ART is thus a UNIVERSAL solution of the problem of autonomous error correction in a changing world. That is why ART models can be used in designs for AUTONOMOUS ADAPTIVE INTELLIGENCE in engineering, technology, and AI.

ART also proposes a solution of the classical MIND-BODY PROBLEM: HOW, WHERE in our brains, and WHY from a deep computational perspective, we CONSCIOUSLY SEE, HEAR, FEEL, and KNOW about the world, and use our conscious states to PLAN and ACT to realize VALUED GOALS.

For details, see

Conscious Mind, Resonant Brain: How Each Brain Makes a Mind
https://lnkd.in/eiJh4Ti

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

_______________________________________________
Comp-neuro mailing list -- [email protected]
Mailing list webpage (to subscribe or view archives): https://www.cnsorg.org/comp-neuro-mailing-list
To contact admin/moderators, send an email to: [email protected]
To unsubscribe, send an email to [email protected]

Artificial General Intelligence List<https://agi.topicbox.com/latest> / AGI / see discussions<https://agi.topicbox.com/groups/agi> + participants<https://agi.topicbox.com/groups/agi/members> + delivery options<https://agi.topicbox.com/groups/agi/subscription> Permalink<https://agi.topicbox.com/groups/agi/Tbd69a4c5580eb654-Ma9b63a5a5070f57a137f67aa>

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/Tbd69a4c5580eb654-Mfaae6ee3c832373d0eb0e077
Delivery options: https://agi.topicbox.com/groups/agi/subscription
