To the extent that grammar entails meaning, it can be considered a way of defining equivalence classes of sentence meanings. In that sense, the choice of which sentence from its equivalence class is to convey the intended meaning is a "special rule" for that particular sentence. Is that what you're getting at?
On Sat, Jun 25, 2022 at 5:59 AM Rob Freeman <[email protected]> wrote:

> I've been taking a closer look at transformers. The big advance over LSTM
> was that they relate prediction to long-distance dependencies directly,
> rather than passing long-distance dependencies down a long recurrence
> chain. That's the whole "attention" shtick. I knew that. Nice.
>
> But something I was less aware of was that breaking long-distance
> dependencies away from the recurrence mechanism seems to have liberated
> them to go wild with directly representing dependencies. And with multiple
> layers they seem to be building hierarchies over what they are "attending"
> to. So they are basically building grammars.
>
> This paper makes that clear:
>
> Piotr Nawrot et al., "Hierarchical Transformers Are More Efficient
> Language Models": https://youtu.be/soqWNyrdjkw
>
> They show that middle layers of language transformers explicitly
> generalize to reduce dimensions. That's a grammar.
>
> The question is whether these grammars are different for each sentence in
> their data. If they are different, they might reduce the dimensions of
> representation each time, but not in any way which can be abstracted
> universally.
>
> If the grammars generated are different for each sentence, then the
> advantage of transformers over attempts to learn grammar, like OpenCog's,
> will be that ignoring the hierarchies created and focusing solely on the
> prediction task frees them from the expectation of universal primitives.
> They can generate a different hierarchy for each data sentence, and nobody
> notices. Ignorance is bliss.
>
> Set against that advantage, the disadvantage will be that ignoring the
> actual hierarchies created means we can't access those hierarchies for
> higher reasoning and constraint using world knowledge. Which is indeed the
> problem we face with transformers.
>
> Another disadvantage is the equally well-known one that generating
> billions of subjective hierarchies in advance is enormously costly. And
> there is a less-known one, which follows from the subjective-hierarchy
> insight: generating hierarchies in advance is enormously wasteful of
> effort, and limiting, because there will always be a limit to the number
> of subjective hierarchies you can generate in advance.
>
> If all this is true, the next stage in the advance of transformers will be
> to find a way to generate only relevant subjective hierarchies at run time.
>
> Transformers learn their hierarchies using back-prop to minimize
> predictive error over dot products. These dot products will converge on
> groupings of elements which share predictions. If there were a way to
> directly find these groupings of elements which share predictions, we
> might not have to rely on back-prop over dot products. And we might be
> able to find only relevant hierarchies at run time.
>
> So the key to improving over transformers would seem to be to leverage
> their (implicit) discovery that hierarchy is subjective to each sentence,
> and to minimize the burden of generating that infinity of subjective
> hierarchies in advance, by finding a method to directly group elements
> which share predictions without using back-prop over dot products, and
> then applying that method to generate hierarchies subjective to each
> sentence only at the time each sentence is presented to the system.
>
> If all the above is true, the key question should be: what method could
> directly group hierarchies of elements in language which share predictions?
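To make the "attention" point concrete: below is a minimal sketch of scaled dot-product self-attention in plain NumPy. The function name and toy shapes are mine, not from any particular implementation; the point is just that every position scores every other position with a single dot product, so a long-distance dependency is no further away than an adjacent one.

```python
# Minimal sketch of scaled dot-product attention (plain NumPy, names are mine).
# Each position attends to every other position directly, so a dependency
# between token 0 and token 50 costs the same one dot product as a dependency
# between neighbours -- no recurrence chain in between.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: (seq_len, d) arrays of query/key/value vectors."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                     # (seq_len, seq_len) pairwise dot products
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the whole sequence
    return weights @ V                                # each output mixes values from all positions

# Toy usage: 5 tokens, 8-dimensional vectors, attending to themselves.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)                                      # (5, 8)
```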
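And a rough sketch of the "hourglass" shape I take the Nawrot et al. work to be describing: the middle layers operate on a pooled, shorter sequence, which is exactly the hierarchy over tokens you're pointing at. The fixed average-pooling and shapes here are my simplification; the actual model learns how to shorten.

```python
# Sketch of middle layers reducing dimensions by shortening the sequence
# (my simplified reading: pool adjacent tokens, work at the coarser level,
# then upsample back to token resolution for prediction). Plain NumPy.
import numpy as np

def shorten(x, k=2):
    """Merge every k adjacent token vectors by averaging: (n, d) -> (n // k, d)."""
    n, d = x.shape
    return x[: n - n % k].reshape(-1, k, d).mean(axis=1)

def upsample(x, k=2):
    """Repeat each merged vector k times to restore token resolution."""
    return np.repeat(x, k, axis=0)

tokens = np.random.default_rng(1).normal(size=(8, 16))   # 8 tokens, 16-dim
middle = shorten(tokens)      # (4, 16): middle layers see half as many units
restored = upsample(middle)   # (8, 16): back to per-token outputs
print(middle.shape, restored.shape)
```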
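Finally, a toy of what "directly grouping elements which share predictions" might look like without back-prop over dot products: give each word a profile of what tends to follow it, then merge words whose profiles overlap. The corpus, threshold, and greedy merge rule are all invented for illustration; the shape of the computation is the point.

```python
# Toy sketch: group words by shared next-word predictions, no back-prop.
# Corpus, threshold, and the greedy single-pass merge are made up for illustration.
from collections import Counter, defaultdict
import math

corpus = "the cat sat on the mat the dog sat on the rug a cat ran a dog ran".split()

# Prediction profile: counts of the words that follow each word.
profiles = defaultdict(Counter)
for w, nxt in zip(corpus, corpus[1:]):
    profiles[w][nxt] += 1

def cosine(p, q):
    keys = set(p) | set(q)
    dot = sum(p[k] * q[k] for k in keys)
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

# Greedy grouping: a word joins the first class whose merged prediction
# profile is similar enough to its own, otherwise it starts a new class.
THRESHOLD = 0.5
classes = []   # list of (member_words, merged_profile)
for w, prof in profiles.items():
    for members, merged in classes:
        if cosine(prof, merged) >= THRESHOLD:
            members.append(w)
            merged.update(prof)
            break
    else:
        classes.append(([w], Counter(prof)))

for members, _ in classes:
    print(members)   # e.g. ['cat', 'dog'] land together: both predict {sat, ran}
```

Repeating that grouping over the classes themselves, per sentence, at the time the sentence is presented, would be the run-time subjective hierarchy you're asking about, as I read it.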
