Re: [agi] Re: [Comp-neuro] Re: Some scientific history that I experienced relevant to the recent Nobel Prizes to Hopfield and Hinton

James Bowery Thu, 31 Oct 2024 09:18:28 -0700

Steve,

Thanks for the response.


TLDR:  I'm interested in bringing together Richard Granger's work on
language acquisition with indexed grammars
<https://github.com/DartmouthGrangerLab/hnet>, ART implementations in CPU
<https://github.com/NiklasMelton/AdaptiveResonanceLib> and the CPU-based
optimizations of Numenta
<https://www.numenta.com/blog/2024/05/13/unlocking-the-power-of-cpus-for-llm-deployment/>
toward a proof-of-principle entry to The Hutter Prize for Lossless
Compression of Human Knowledge <http://prize.hutter1.net/>.

Obviously I've been absent from your work ever since I focused on
convolution hardware acceleration back in 1990 for mere multisource image
segmentation -- but it has always stuck in the back of my mind as evidenced
by this question I posed to the HTM forum regarding my speculation of
LAMINART's relevance to possible convergent evolution in the great ape LGN
<https://discourse.numenta.org/t/is-it-a-mere-coincidence-that-great-ape-lgns-have-the-same-number-of-layers-as-the-neocortex/7511/3>.
I've a particular interest in the evolution of olfaction's possible
contribution to the Cambrian Explosion -- but that's getting pretty deep
into the weeds of my peculiar interests and inadequate understanding of
LAMINART.

In terms of my own competence, hence investment of time, I'm interested in
tunneling through the trillion dollar barrier to "large" language
model training.   I scare-quote "large" because the most principled information
criterion <https://en.wikipedia.org/wiki/Model_selection#Criteria> for
model selection is Algorithmic Information as a measure of model quality:
minimizing the size, in algorithmic bits, of the algorithm that outputs
the training corpus.  I like to call this the Algorithmic Information
Criterion for model selection even though "AIC" is an acronym occupied by
the less-principled Akaike Information Criterion.  The algorithmic criterion is
as intuitively correct as Occam's Razor, but the industry has gotten on the "large"
bandwagon and can't get off. This has resulted in what Sara Hooker has
termed "The Hardware Lottery <https://arxiv.org/abs/2009.06489>" barrier to
scientific advancement.  One will note with some irony that what I call
"the most principled information criterion" isn't even present in the
laundry list of information criteria at Wikipedia.
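
To make the criterion concrete, here is a minimal sketch, not a Hutter Prize
entry.  Algorithmic information itself is uncomputable, so any real score is
only an upper bound: the size of a decoder plus the size of the payload that
the decoder expands back into the corpus.  The sketch uses Python's standard
lzma module as a stand-in "model"; the toy corpus, the one-line decoder and
the function name are assumptions made up for illustration.

```python
import lzma


def description_length_bits(corpus: bytes, decoder_source: bytes) -> int:
    """Upper bound on algorithmic information: decoder size + payload size."""
    payload = lzma.compress(corpus, preset=9)
    # Lossless check: the payload must regenerate the corpus exactly.
    assert lzma.decompress(payload) == corpus
    return 8 * (len(decoder_source) + len(payload))


# Hypothetical stand-ins for a real corpus and a real decompressor.
corpus = b"the cat sat on the mat. " * 200
decoder = (
    b"import lzma,sys;"
    b"sys.stdout.buffer.write(lzma.decompress(sys.stdin.buffer.read()))"
)

raw_bits = 8 * len(corpus)                      # cost of storing the text verbatim
model_bits = description_length_bits(corpus, decoder)
print("raw bits:", raw_bits, "decoder + payload bits:", model_bits)
```

Under this scoring a larger model wins only if the bits it adds to the
decoder are repaid by a smaller payload.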

As a result there is a gap in not only AI research, but natural science
itself <https://www.maths.ed.ac.uk/~v1ranick/papers/wigner.pdf>, that
ignores models that don't fit the current winner of the hardware lottery.
To call the Grand Canyon a "gap" is something of an understatement.

So despite my priority in GPU-like hardware I've been trying to get people
interested in The Hutter Prize for Lossless Compression of Human Knowledge
<http://prize.hutter1.net/> ever since its announcement in 2006: a prize
that restricts contestants to using only a single *CPU* core as a way of
giving nascent approaches to language model training, such as ART, a
fighting chance to win.

Now to ART in the present context:

In order for ART to be the basis of a winner of the Hutter Prize, it needs
to be specialized for language acquisition in a CPU friendly manner.
Richard Granger's work on indexed grammars, being based in neuroscience,
seems to provide a language-theory target for ART, and Numenta's recent
breakthroughs in neuroscience-based optimizations for CPU seem to be the final
step.
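
For readers who have not seen ART as code, here is a minimal CPU-only sketch
of the search cycle Steve describes in the quoted message below: rank
categories by a bottom-up choice signal, test the top-down match against
vigilance, learn on resonance, and otherwise recruit an uncommitted category.
It follows the usual Fuzzy ART equations with fast learning and is a
simplification for illustration only; it is not the API of
AdaptiveResonanceLib, and the class, method and parameter names are
assumptions.

```python
def complement_code(x):
    """Append 1 - x so every coded input has the same L1 norm."""
    return list(x) + [1.0 - v for v in x]


def fuzzy_and(a, b):
    """Component-wise minimum (the fuzzy AND used by Fuzzy ART)."""
    return [min(p, q) for p, q in zip(a, b)]


class FuzzyART:
    def __init__(self, vigilance=0.75, alpha=0.001):
        self.rho = vigilance      # match threshold in [0, 1]
        self.alpha = alpha        # small choice parameter
        self.weights = []         # one weight vector per committed category

    def train_one(self, x):
        """Present one input in [0, 1]^n; return the index of its category."""
        i = complement_code(x)
        norm_i = sum(i)

        def choice(j):
            w = self.weights[j]
            return sum(fuzzy_and(i, w)) / (self.alpha + sum(w))

        # Competition: try categories from strongest to weakest choice signal.
        for j in sorted(range(len(self.weights)), key=choice, reverse=True):
            match = sum(fuzzy_and(i, self.weights[j])) / norm_i
            if match >= self.rho:
                # Resonance: fast learning shrinks the weights toward the input.
                self.weights[j] = fuzzy_and(i, self.weights[j])
                return j
            # Mismatch: reset this category and keep searching.
        # No category satisfied vigilance: commit a new one.
        self.weights.append(i)
        return len(self.weights) - 1


points = [[0.10, 0.10], [0.12, 0.10], [0.90, 0.90], [0.88, 0.92]]
coarse = FuzzyART(vigilance=0.5)
fine = FuzzyART(vigilance=0.995)
coarse_labels = [coarse.train_one(p) for p in points]
fine_labels = [fine.train_one(p) for p in points]
print(coarse_labels, fine_labels)
```

Raising vigilance splits the same four points into finer categories, which
is the behaviour the quoted 1987 Abstract describes.  Each presentation costs
one pass over the committed categories, with no gradient computation.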

I realize this is all a bit much to take in but I needed to let you know
where I'm coming from.

-- Jim

On Mon, Oct 28, 2024 at 10:16 AM Grossberg, Stephen <[email protected]> wrote:

> Dear James,
>
>
>
> Thanks for your question.
>
>
>
> Adaptive Resonance Theory, or ART, does not face a problem of overfitting.
> Please keep in mind that it can autonomously and incrementally learn large
> nonstationary databases on the fly.
>
>
>
> The bottom-up connections in ART are adaptive. They learn recognition
> categories. They use a combination of a bottom-up adaptive filter followed,
> in the classifying layer, by recurrent competition that chooses the best
> cell population with which to characterize the incoming bottom-up input
> patterns in real time.
>
>
>
> This combination of processes tends to *sparsify* the data.
>
>
>
> I proved a theorem in 1976 which showed that, in response to a series of
> input patterns that are not too dense in pattern space, the *competitive
> learning* combination of bottom-up adaptive filtering and competition can
> stably learn recognition categories. I also showed that the adaptive
> weights oscillate at most once before they converge to their final values.
>
>
>
> See
>
>
>
> Grossberg, S. (1976). Adaptive pattern classification and universal
> recoding, I: Parallel development and coding of neural feature detectors.
> *Biological Cybernetics*, *23*, 121-134.
>
> https://sites.bu.edu/steveg/files/2016/06/Gro1976BiolCyb_I.pdf
>
>
>
> Also in 1976, I introduced ART to show how this neural network could
> stably learn to classify and remember arbitrarily long finite sequences of
> input patterns that may overlap and be dense in input space:
>
>
>
> Grossberg, S. (1976). Adaptive pattern classification and universal
> recoding, II: Feedback, expectation, olfaction, and illusions. *Biological
> Cybernetics*, *23*, 187-202.
>
> https://sites.bu.edu/steveg/files/2016/06/Gro1976BiolCyb_II.pdf
>
>
>
> For this to be possible, I modeled how activated recognition categories
> read out learned top-down expectations that are matched against bottom-up
> input patterns. A *good enough match leads to resonance between the
> feature and category levels via both bottom-up and top-down excitatory
> signaling*. This resonance state persists long enough to drive learning
> in both bottom-up and top-down adaptive weights. Hence the name ADAPTIVE
> Resonance Theory.
>
>
>
> If a poor match occurs, it drives hypothesis testing and memory search for
> a better-matching category, or selection of an uncommitted cell population
> with which to categorize novel information.
>
>
>
> I also proved that this learning process is dynamically stabilized by the
> top-down attentive matching process. It does not experience *catastrophic
> forgetting*.
>
>
>
> Overfitting does not occur in the dense input space because the same
> competitive learning combination of bottom-up adaptive filtering and
> competition chooses different cell populations with which to classify input
> patterns.
>
>
>
> In addition, with Gail Carpenter and John Reynolds, in two different
> articles, we introduced the concept of *vigilance control* that
> determines how concrete or abstract the separated categories will become
> due to learning. E.g.,
>
>
>
> Carpenter, G.A., and Grossberg, S. (1987). A massively parallel
> architecture for a self-organizing neural pattern recognition machine.
> *Computer Vision, Graphics, and Image Processing*, *37*, 54-115.
>
> https://sites.bu.edu/steveg/files/2016/06/CarGro1987CVGIP.pdf
>
>
>
> As noted in the Abstract:
>
>
>
> “Attentional vigilance determines how fine the learned categories will be.
> If vigilance increases due to an environmental disconfirmation, then the
> system automatically searches for and learns finer recognition categories.”
>
>
>
> *High vigilance* leads to learning of concrete and specific categories,
> like a frontal view of your mother’s face.
>
>
>
> *Low vigilance* leads to learning of abstract and general categories,
> like the fact that everyone has a face.
>
>
>
> The 1987 ART model uses unsupervised learning that classifies input
> patterns together based upon their similarity.
>
>
>
> Incorporating supervision that can regulate learning by predictive success
> or failure required supervised ARTMAP networks. These ARTMAP models could
> be trained by an arbitrary combination of unsupervised and supervised
> learning. E.g.,
>
>
>
> Carpenter, G.A., Grossberg, S., Markuzon, N., Reynolds, J.H., and Rosen,
> D.B. (1992). Fuzzy ARTMAP: A neural network architecture for incremental
> supervised learning of analog multidimensional maps. *IEEE Transactions
> on Neural Networks*, *3*, 698-713.
>
> https://sites.bu.edu/steveg/files/2016/06/CarGroMarRey1992IEEETransNN.pdf
>
>
>
> Supervision takes the form of a predictive match or mismatch with the
> environment.
>
>
>
> A good enough match again resonates and learns.
>
>
>
> A bad enough mismatch triggers hypothesis testing that can *discover and
> learn a recognition category that predicts the desired outcome well enough
> to satisfy vigilance*.
>
>
>
> In a supervised ARTMAP, hypothesis testing drives a memory search that
> *conjointly maximizes category generality while minimizing predictive
> errors*.
>
>
>
> This *minimax learning rule* is an emergent property of network dynamics.
> It is not built into the model.
>
>
>
> Carpenter and I call it *match tracking* because it causes vigilance to
> increase just above the criterion that will drive a memory search.
>
>
>
> Match tracking sacrifices the minimum amount of generalization that is
> needed to correct the predictive error.
>
>
>
> Later articles showed that vigilance control exists in our brains and
> modeled its anatomy, neurophysiology, biophysics, and biochemistry. E.g.,
>
>
>
> Grossberg, S. and Versace, M. (2008). Spikes, synchrony, and attentive
> learning by laminar thalamocortical circuits. *Brain Research*, *1218*,
> 278-312.
>
> https://sites.bu.edu/steveg/files/2016/06/GroVer2008BR.pdf
>
>
>
> See also
>
>
>
> Grossberg, S. (2017). Acetylcholine neuromodulation in normal and abnormal
> learning and memory: Vigilance control in waking, sleep, autism, amnesia,
> and Alzheimer’s disease. *Frontiers in Neural Circuits*, November 2,
> 2017,
>
>
> https://www.frontiersin.org/journals/neural-circuits/articles/10.3389/fncir.2017.00082/full
>
>
>
> I hope that the above comments clarify that ART does not experience
> problems that can occur in neural networks like Deep Learning and its
> variants.
>
>
>
> Please do not hesitate to make comments or ask questions about the above
> results.
>
>
>
> Best,
>
>
>
> Steve
>
>
>
> *From: *James Bowery <[email protected]>
> *Date: *Sunday, October 27, 2024 at 10:24 AM
> *To: *AGI <[email protected]>
> *Cc: *Dorian Aur <[email protected]>, André Fabio Kohn via Comp-neuro <
> [email protected]>, Stephen Grossberg <[email protected]>
> *Subject: *Re: [agi] Re: [Comp-neuro] Re: Some scientific history that I
> experienced relevant to the recent Nobel Prizes to Hopfield and Hinton
>
> I became aware of ART just prior to the second IJCNN in San Diego.  It
> struck me intuitively as the right way to go but I couldn't at that time
> figure out how to implement it in the DataCube video convolution hardware
> that, as it turns out, could have advanced the field by at least 15 years
> given the eventual breakthrough in convolution hardware by GPUs, but, alas,
> not toward ART.
>
> However, it may be that the hardware approach by Mead and Faggin is being
> obviated by thermodynamic computing.  That's something I might be willing
> to spend some time on.
>
> One question:
>
> How does ART succeed in avoiding overfitting in the sense of Kolmogorov
> Complexity?  In other words, in memorizing a corpus, how does ART reduce
> the complexity of the description of the connection matrix?
>
>
>
>
>
>
>
> On Sat, Oct 26, 2024 at 4:55 PM Grossberg, Stephen via AGI <
> [email protected]> wrote:
>
> Dear Dorian,
>
>
>
> Thanks very much for your kind words about my work over the years on
> biological and artificial neural networks.
>
>
>
> I agree that some algorithms, like ChatGPT, require huge amounts of energy
> to create.
>
>
>
> It is also the case that learning models like Deep Learning use a lot of
> time and energy, because they use slow learning to be trained, often
> requiring hundreds or even thousands of repetitions of objects before they
> can be recognized.
>
>
>
> Our biological models require much less energy to run.
>
>
>
> For example, Adaptive Resonance Theory, or ART, neural networks have also
> been defined as algorithms that can learn a large database in a single
> fast learning trial.
>
>
>
> For some ART algorithms, like ART 1, Gail Carpenter and I have been able
> to prove complete mathematical theorems with which to characterize the
> learning and memory process, including a proof that ART does not experience
> catastrophic forgetting.
>
>
>
> Computer simulations for ART 1 are not needed, because the theorems fully
> characterize the learning and memory process. See:
>
>
>
> Carpenter, G.A., and Grossberg, S. (1987). A massively parallel
> architecture for a self-organizing neural pattern recognition machine.
> *Computer Vision, Graphics, and Image Processing*, *37*, 54-115.
>
> https://sites.bu.edu/steveg/files/2016/06/CarGro1987CVGIP.pdf
>
>
>
> There is another approach to reducing energy demands, and that is by
> emulating the brain circuits that support biological intelligence in the
> LAMINAR circuits of the CEREBRAL CORTEX.
>
>
>
> My colleagues and I have developed the paradigm of *LAMINAR COMPUTING* by
> defining *laminar neocortical models*, including a laminar realization of
> ART. We have shown, through explanations and quantitative computer
> simulations of data from many psychological and neurobiological
> experiments, how variations of the SAME canonical laminar cortical circuit
> can carry out vision and auditory perception, attention, cognition,
> emotion, planning, and action.
>
>
>
> For example:
>
>
>
> Grossberg, S. and Versace, M. (2008). Spikes, synchrony, and attentive
> learning by laminar thalamocortical circuits. *Brain Research*, *1218*,
> 278-312.
>
> https://sites.bu.edu/steveg/files/2016/06/GroVer2008BR.pdf
>
>
>
> Grossberg, S. and Pearson, L. (2008). Laminar cortical dynamics of
> cognitive and motor working memory, sequence learning and performance:
> Toward a unified theory of how the cerebral cortex works. *Psychological
> Review*, *115*, 677-732.
>
> https://sites.bu.edu/steveg/files/2016/06/GroPea2008.pdf
>
>
>
> Cao, Y., and Grossberg, S. (2012). Stereopsis and 3D surface perception by
> spiking neurons in laminar cortical circuits: A method of converting neural
> rate models into spiking models. *Neural Networks*, *26*, 75-98.
>
> https://sites.bu.edu/steveg/files/2016/06/CaoGroTR2011.pdf
>
>
>
> Search my web page sites.bu.edu/steveg using the word “laminar” to find
> all the articles in which we develop laminar cortical models of biological
> intelligence.
>
>
>
> Because ALL of these processes use variations of a single canonical
> cortical circuit, they can be efficiently embodied in VLSI chips.
>
>
>
> The chips that realize different processes, because they have the same
> basic architecture, can be connected into a more comprehensive neural
> architecture that can begin to achieve Autonomous Adaptive Intelligence, or
> AGI, whichever phrase is more pleasing to the reader.
>
>
>
> Efforts by people like Carver Mead and Federico Faggin began many years
> ago to embody various of our models in low energy VLSI chips. However, the
> short duration of funding prevented this venture from being completed.
>
>
>
> What is needed is a sustained and highly-funded program that today can
> most likely be financed by Google, Meta, or other deep-pocketed social
> media.
>
>
>
> When the project is completed, it may make the group that does it first
> very rich.
>
>
>
> Best,
>
>
>
> Steve
>
>
>
> *From: *Dorian Aur <[email protected]>
> *Date: *Thursday, October 24, 2024 at 1:06 PM
> *To: *Grossberg, Stephen <[email protected]>, AGI <[email protected]>
> *Cc: *André Fabio Kohn via Comp-neuro <[email protected]>,
> Stephen Grossberg <[email protected]>
> *Subject: *Re: [Comp-neuro] Re: Some scientific history that I
> experienced relevant to the recent Nobel Prizes to Hopfield and Hinton
>
>
>
> Dear Stephen,
>
>
>
> I deeply admire your remarkable work in developing neural network models
> and maintaining a steady focus over an extended period in academic
> research, publishing numerous important papers and books along the way.
>
>
>
> There is a fundamental misunderstanding about the Nobel Prize and the
> computations that take place in the brain, as well as the related physics.
> The Nobel committee missed the opportunity to recognize Tesla’s work;
> however, this time, the recent advancements in AI, particularly with
> ChatGPT, have prompted the committee to consider an award, recognizing that
> AI's impact will be as transformative as the beginning of electrification.
>
>
>
>  It is unlikely that the Nobel committee was well-versed in the complete
> history of neural networks and neural computation, which have developed
> gradually over several decades. This evolution has seen neural learning
> transform into machine learning, with contributions from many esteemed
> scientists  along the way.  This history includes both significant
> advancements and too many setbacks.
>
>
>
> However, what remains misunderstood is that *the algorithms executed on
> digital computers are quite different from the physical computations
> carried out by the brain.* The human brain consumes around 20 watts of
> power on average (efficient computation), while a typical
> supercluster uses around 1.3 MW. Future technology will have to
> *replicate the computational efficiency of the brain*.
>
>
>
> While we can't change previous events or the decisions made by the Nobel
> committee, we have the power to shape and transform technology moving
> forward.  It took nearly 60 years to reach this point—how long will it take
> to replicate the brain's computational efficiency at just 20 watts?
>
>  The positive aspect is that another Nobel Prize will be awarded for
> efforts to replicate the computational power of the brain  using principles
> of physics🙂
>
>
>
> Dorian Aur
>
>
>
>
>
>
>
>
> On Mon, Oct 21, 2024 at 5:50 AM Grossberg, Stephen via Comp-neuro <
> [email protected]> wrote:
>
> Dear Comp-neuro colleagues,
>
>
>
> Here are some short summaries of the history of neural network
> discoveries, as I experienced it, that are relevant to the recent Nobel
> Prizes to Hopfield and Hinton:
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> THE NOBEL PRIZES IN PHYSICS TO HOPFIELD AND HINTON
> FOR MODELS THEY DID NOT DISCOVER: THE CASE OF HOPFIELD
>
> Here I summarize my concerns about the Hopfield award.
>
> I published articles in 1967 – 1972 in the Proceedings of the National
> Academy of Sciences that introduced the Additive Model that Hopfield used
> in 1984. My articles proved global theorems about the limits and
> oscillations of my Generalized Additive Models. See sites.bu.edu/steveg for
> these articles.
>
> For example:
>
> Grossberg, S. (1971). Pavlovian pattern learning by nonlinear neural
> networks. Proceedings of the National Academy of Sciences, 68, 828-831.
> https://lnkd.in/emzwx4Tw
>
> This article illustrates that my mathematical results were part of a
> research program to develop biological neural networks that provide
> principled mechanistic explanations of psychological and neurobiological
> data.
>
> Later, Michael Cohen and I published a Liapunov function that included the
> Additive Model and generalizations thereof in 1982 and 1983 before Hopfield
> (1984) appeared.
>
> For example,
>
> Cohen, M.A. and Grossberg, S. (1983). Absolute stability of global pattern
> formation and parallel memory storage by competitive neural networks. IEEE
> Transactions on Systems, Man, and Cybernetics, SMC-13, 815-826.
> https://lnkd.in/eAFAdvbu
>
> I was told that Hopfield knew about my work before he published his 1984
> article, without citation.
>
> Recall that I started my neural networks research in 1957 as a Freshman at
> Dartmouth College.
>
> That year, I introduced the biological neural network paradigm, as well as
> the short-term memory (STM), medium-term memory (MTM), and long-term memory
> (LTM) laws that are used to this day, including in the Additive Model, to
> explain data about how brains make minds.
>
> See the review in https://lnkd.in/gJZJtP_W .
>
> When I started in 1957, I knew no one else who was doing neural networks.
> That is why my colleagues call me the Father of AI.
>
> I then worked hard to create a neural networks community, notably a
> research center, academic department, the International Neural Network
> Society, the journal Neural Networks, multiple international conferences on
> neural networks, and Boston-area research centers, while training over 100
> gifted PhD students, postdocs, and faculty to do neural network research.
> See the Wikipedia page.
>
> That is why I did not have time or strength to fight for priority of my
> models.
>
> Recently, I was able to provide a self-contained and non-technical
> overview and synthesis of some of my scientific discoveries since 1957, as
> well as explanations of the work of many other scientists, in my 2021
> Magnum Opus
>
> Conscious Mind, Resonant Brain: How Each Brain Makes a Mind
>
> https://lnkd.in/eiJh4Ti
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> THE NOBEL PRIZES IN PHYSICS TO HOPFIELD AND HINTON
> FOR MODELS THEY DID NOT DISCOVER: THE CASE OF HINTON
>
> Here I summarize my concerns about the Hinton award.
>
> Many authors developed Back Propagation (BP) before Hinton; e.g., Amari
> (1967), Werbos (1974), Parker (1982), all before Rumelhart, Hinton, &
> Williams (1986).
>
> BP has serious computational weaknesses:
>
> It is UNTRUSTWORTHY (because it is UNEXPLAINABLE).
>
> It is UNRELIABLE (because it can experience CATASTROPHIC FORGETTING).
>
> It should thus never be used in financial or medical applications.
>
> BP learning is also SLOW and uses non-biological NONLOCAL WEIGHT TRANSPORT.
>
> See Figure, right column, top.
>
> In 1988, I published 17 computational problems of BP:
> https://lnkd.in/erKJvXFA
>
> BP gradually grew out of favor because other models were better.
>
> Later, huge online databases and supercomputers enabled Deep Learning to
> use BP to learn.
>
> My 1988 article contrasted BP with Adaptive Resonance Theory (ART) which I
> first published in 1976:
> https://lnkd.in/evkfq22G
>
> See Figure, right column, bottom.
>
> ART never had BP’s problems.
>
> ART is now the most advanced cognitive and neural theory that explains HOW
> HUMANS LEARN TO ATTEND, RECOGNIZE, and PREDICT events in a changing world.
>
> ART also explains and simulates data from hundreds of psychological and
> neurobiological experiments.
>
> In 1980, I derived ART from a THOUGHT EXPERIMENT about how ANY system can
> AUTONOMOUSLY learn to correct predictive errors in a changing world:
> https://lnkd.in/eGWE8kJg
>
> The thought experiment derives ART from a few facts of life that do not
> mention mind or brain.
>
> ART is thus a UNIVERSAL solution of the problem of autonomous error
> correction in a changing world.
>
> That is why ART models can be used in designs for AUTONOMOUS ADAPTIVE
> INTELLIGENCE in engineering, technology, and AI.
>
> ART also proposes a solution of the classical MIND-BODY PROBLEM:
>
> HOW, WHERE in our brains, and WHY from a deep computational perspective,
> we CONSCIOUSLY SEE, HEAR, FEEL, and KNOW about the world, and use our
> conscious states to PLAN and ACT to realize VALUED GOALS.
>
> For details, see
>
> Conscious Mind, Resonant Brain: How Each Brain Makes a Mind
>
> https://lnkd.in/eiJh4Ti
>
>
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
>
> _______________________________________________
> Comp-neuro mailing list -- [email protected]
> Mailing list webpage (to subscribe or view archives):
> https://www.cnsorg.org/comp-neuro-mailing-list
>
> To contact admin/moderators, send an email to:
> [email protected]
> To unsubscribe, send an email to [email protected]
>
> *Artificial General Intelligence List <https://agi.topicbox.com/latest>*
> / AGI / see discussions <https://agi.topicbox.com/groups/agi> +
> participants <https://agi.topicbox.com/groups/agi/members> +
> delivery options <https://agi.topicbox.com/groups/agi/subscription>
> Permalink
> <https://agi.topicbox.com/groups/agi/Tbd69a4c5580eb654-Ma9b63a5a5070f57a137f67aa>
>
>

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Tbd69a4c5580eb654-M2d2b0f5d7c947d6dce0da9b6
Delivery options: https://agi.topicbox.com/groups/agi/subscription
