James,

I think you're saying:

1) Grammatical abstractions may not be real, but they can still be
useful for parameterizing "learning".

2) Even if, beyond that, there are "rules of thumb" which actually
govern everything.

Well, you might ask: why not just learn the "rules of thumb"?

But the best counter against the usefulness of the Chomsky hierarchy
for parameterizing machine learning might be that Chomsky himself
dismissed the idea that it could be learned. And his most damaging
argument? That learned categories contradict: "objects" behave
differently in one context than they do in another.

I see it a bit like our friend the Road Runner. You can figure out a
physics for him. But sometimes that just goes haywire and contradicts
itself - bodies make holes in rocks, fly high in the sky, or stretch
wide.

All the juice is in these weird "rules of thumb".

Chomsky too failed to find consistent objects. He was supposed to push
past the highly successful learning of phoneme "objects" and find
"objects" for syntax. And he failed. And the most important reason
I've found is that even for phonemes, the learned categories contradicted.
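
(If that claim sounds abstract, here is a tiny sketch of the kind of
thing I mean. It is entirely my own toy, not any procedure from the
1950s literature or from the DeepMind paper: a purely distributional
"category" learner over a made-up symbolic corpus, in which the same
symbol patterns with one group in one context and with a different
group in another, so any single hard category contradicts part of the
data.)

# Toy sketch (my own illustration): distributional "category" learning
# over a tiny made-up symbolic corpus. The symbol 'T' patterns with 'D'
# in one context and with 'K' in another, so a single hard category for
# 'T' must contradict part of the data.
from collections import Counter, defaultdict

# Hypothetical mini-corpus: each "word" is a sequence of symbols.
corpus = ["aTb", "aDb", "aDb",   # context a_b: T behaves like D
          "xTy", "xKy", "xKy"]   # context x_y: T behaves like K

# Count the (left, right) contexts each symbol appears in.
contexts = defaultdict(Counter)
for word in corpus:
    for i, sym in enumerate(word):
        left = word[i - 1] if i > 0 else "#"
        right = word[i + 1] if i < len(word) - 1 else "#"
        contexts[sym][(left, right)] += 1

print(contexts["T"])  # Counter({('a','b'): 1, ('x','y'): 1})
print(contexts["D"])  # Counter({('a','b'): 2})
print(contexts["K"])  # Counter({('x','y'): 2})
# Any single hard category for T ignores one of these two behaviours --
# the "contradiction" described above, in miniature.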

That hierarchy stuff wasn't supposed to appear in the data. That
could only be in our heads. Innate. Why? Well, for one thing, because
the data contradicted. The "learning procedures" of the time generated
contradictory objects. This is a forgotten result. Machine learning is
still ignoring this old result from the '50s. (Fair to say the
DeepMind paper ignores it?) Chomsky insisted these contradictions
meant the "objects" must be innate. The idea that cognitive objects might
be new all the time (and particularly the idea that they might contradict!)
is completely orthogonal to his hierarchy (well, it might be
compatible with context sensitivity, if you accept that the real juice
is in the mechanism that implements the context sensitivity?).

If categories contradict, how is that represented on the Chomsky
hierarchy? I don't know. As a form of context sensitivity?

Actually, I think, probably, using entangled objects as in quantum
mechanics, or relation- and variance-based objects as in category theory.

I believe Coecke's team has been working on "learning" exactly this:

From Conceptual Spaces to Quantum Concepts: Formalising and Learning
Structured Conceptual Models
Sean Tull, Razin A. Shaikh, Sara Sabrina Zemljič and Stephen Clark
Quantinuum
https://browse.arxiv.org/pdf/2401.08585

I'm not sure. I think the symbolica.ai people may be working on
something similar: finding some level of abstraction which applies even
across varying objects (contradictions?).

For myself, in contrast to Bob Coecke and the category theory folks,
I think it's pointless, and maybe unduly limiting, to learn this
indeterminate object formalism from data and then collapse it into
one or another contradictory observable form each time you observe it.
(Or to seek some way to reason with it even in its indeterminate object
formulation, as the category theory folks do?) I think you might as
well collapse observable objects directly from the data.
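
(To make that "collapse" talk concrete, here is a minimal numpy toy of
my own. It is not the Tull et al. formalism, just the bare idea of a
state that only becomes a definite category once you pick a basis to
"observe" it in, and where different bases give incompatible answers.)

# Minimal numpy sketch (my own toy, not the formalism in the Tull et al.
# paper): a "concept" held as a quantum-style state, which only yields a
# definite category once a measurement basis is chosen -- and different
# bases yield different, mutually incompatible categorical answers.
import numpy as np

# A single-qubit "concept": the equal superposition |+> = (|0> + |1>)/sqrt(2).
psi = np.array([1.0, 1.0]) / np.sqrt(2)
rho = np.outer(psi, psi.conj())          # density matrix for the concept

def measure_probs(rho, basis):
    """Probabilities of the categorical outcomes in a given basis."""
    return [float(np.real(v.conj() @ rho @ v)) for v in basis]

z_basis = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]       # "category A vs B"
x_basis = [np.array([1.0, 1.0]) / np.sqrt(2),
           np.array([1.0, -1.0]) / np.sqrt(2)]                # "category C vs D"

print(measure_probs(rho, z_basis))  # ~[0.5, 0.5] -- maximally ambiguous here
print(measure_probs(rho, x_basis))  # ~[1.0, 0.0] -- perfectly definite here
# Collapsing in the z basis gives a random A/B answer that the x basis view
# "contradicts"; my claim above is that you might as well collapse to
# observable objects straight from the data.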

I believe this collapse "rule of thumb" is the whole game, one shot,
no real "learning" involved.

All the Chomsky hierarchy limitations identified in the DeepMind paper
would disappear too. They are all limitations of not identifying
objects. Context-coding hacks like LSTMs, or "attention", were introduced
in lieu of actual objects, and grammars over those objects, stemming from
the fact that grammars of contradictory objects are not "learnable."
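
(For anyone who hasn't internalised the grammar/automaton table in your
earlier message below, the standard textbook toy, nothing from the
DeepMind paper itself, is the language a^n b^n: a finite-state
recognizer can only over-approximate it, while a pushdown-style one can
check it exactly.)

# Standard textbook illustration (not code from the DeepMind paper): the
# regular / context-free split in the table quoted below, on the language
# a^n b^n.
def fsa_accepts(s):
    """Finite-state check for the regular over-approximation a*b+: it cannot
    verify that the counts match, because a finite-state machine has no
    unbounded memory."""
    state = "A"
    for ch in s:
        if state == "A" and ch == "a":
            state = "A"
        elif state in ("A", "B") and ch == "b":
            state = "B"
        else:
            return False
    return state == "B"

def pda_accepts(s):
    """Pushdown-style check: push for each 'a', pop for each 'b'. The stack
    (here just a counter) is what lets it enforce n == n."""
    stack = 0
    seen_b = False
    for ch in s:
        if ch == "a":
            if seen_b:
                return False
            stack += 1
        elif ch == "b":
            seen_b = True
            stack -= 1
            if stack < 0:
                return False
        else:
            return False
    return seen_b and stack == 0

print(fsa_accepts("aaabb"), pda_accepts("aaabb"))  # True False -- the FSA over-accepts
print(fsa_accepts("aabb"), pda_accepts("aabb"))    # True True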

On Sun, May 26, 2024 at 11:24 PM James Bowery <[email protected]> wrote:
>
> It's also worth reiterating a point I made before about the confusion between 
> abstract grammar as a prior (heuristic) for grammar induction and the 
> incorporation of so-induced grammars as priors, such as in "physics informed 
> machine learning".
>
> In the case of physics informed machine learning, the language of physics is 
> incorporated into the learning algorithm.  This helps the machine learning 
> algorithm learn things about the physical world without having to re-derive 
> the body of physics knowledge.
>
> Don't confuse the two levels here:
>
> 1) My suspicion that natural language learning may benefit from prioritizing 
> HOPDA as an abstract grammar to learn something about natural languages -- 
> such as their grammars.
>
> 2) My suspicion (supported by "X informed machine learning" exemplified by 
> the aforelinked work) that there may be prior knowledge about natural 
> language more specific than the level of abstract grammar -- such as specific 
> rules of thumb for, say, the English language that may greatly speed training 
> time on English corpora.
>
> On Sun, May 26, 2024 at 9:40 AM James Bowery <[email protected]> wrote:
>>
>> See the recent DeepMind paper "Neural Networks and the Chomsky Hierarchy" 
>> for the sense of "grammar" I'm using when talking about the HNet paper's 
>> connection to Granger's prior papers about "grammar", the most recent being 
>> "Toward the quantification of cognition".  Although the DeepMind paper 
>> doesn't refer to Granger's work on HOPDAs, it does at least illustrate a 
>> fact, long-recognized in the theory of computation:
>>
>> Grammar, Computation
>> Regular, Finite-state automaton
>> Context-free, Non-deterministic pushdown automaton
>> Context sensitive, Linear-bounded non-deterministic Turing machine
>> Recursively enumerable, Turing machine
>>
>> Moreover, the DeepMind paper's empirical results support the corresponding 
>> hierarchy of computational power.
>>
>> Having said that, it is critical to recognize that everything in a finite 
>> universe reduces to finite-state automata in hardware -- it is only in our 
>> descriptive languages that the hierarchy exists.  We don't describe all 
>> computer programs in terms of finite-state automata aka regular grammar 
>> languages.  We don't describe all computer programs even in terms of Turing 
>> complete automata aka recursively enumerable grammar languages.
>>
>> And I have stated before (which I first linked to the HNet paper) HOPDAs are 
>> interesting as a heuristic because they may point the way to a 
>> prioritization if not restriction on the program search space that evolution 
>> has found useful in creating world models during an individual organism's 
>> lifetime.
>>
>> The choice of language, hence the level of grammar, depends on its utility 
>> in terms of the Algorithmic Information Criterion for model selection.
>>
>> I suppose one could assert that none of that matters so long as there is any 
>> portion of the "instruction set" that requires the Turing complete fiction, 
>> but that's a rather ham-handed critique of my nuanced point.
