Hi Rob,

On Tue, Feb 19, 2019 at 3:23 AM Rob Freeman <[email protected]>
wrote:

>
> An aside. You mention sheaf theory as a way to get around the linearity of
> vector spaces. Is this influenced in any way by what Coecke, Sadrzadeh, and
> Clark proposed for compositional distributional models in the '00s?
>

No. But... how can I explain it quickly? Bob Coecke and I both have PhDs in
theoretical physics. We both know category theory. The difference is that
he has published papers and edited books on the topic, whereas I have only
read books and papers.

In a nutshell: category theory arose when mathematicians observed
recurring patterns (of how symbols are arranged on a page) in different,
unrelated branches of math. For this discussion, there are two patterns
that are relevant. One is "tensoring" (or multiplying, or simply "writing
one symbol next to another symbol, in such a way that they are understood
to go together with each other"). The other is "contracting" (or applying,
or inserting into, or plugging in, or reducing; for example, plugging
"x" into "f(x)" to get "y", which can be written with an arrow: "x" next to
"f(x)" --> "y").

These two operations "go together" and "work with one another" in a very
large number of settings, ranging from linear algebra to Hilbert spaces
(quantum mechanics) to the lambda calculus to the theory of computation. And
also natural language.

The Wikipedia article about currying gives a flavor of the broadness of the
concept: https://en.wikipedia.org/wiki/Currying  It is well worth reading,
because it is both a simple concept, almost "trivial", and at the same time
"deep and insightful". In that article, the "times" symbol is the tensor
or multiplication, and the arrow is the applying/plugging-in.

Next, one thinks like so: "great, I've got two operations, 'tensor' and
'arrow'. What is the set of all possible legal ways in which these two can
be combined into an expression?" That is, "what are the legal expressions?"

So, whenever one asks this kind of question: "I have some symbols; what are
the legal ways of arranging them on a page?", the answer is "you have a
'language', and that 'language' has a 'syntax' (i.e. rules for legal
arrangements)." Well, it turns out that the 'language' of 'tensor' and
'arrow' is exactly the (simply-typed) lambda calculus. Wow. Because, of course,
everyone knows that the lambda calculus has something to do with computation -
something important, even.

When you get done studying and pondering everything I wrote above, you
eventually come to realize that the legal arrangements of 'tensor' and
'arrow' look like graphs with lines connecting things together. There are
some rules: you can only connect a plug into a socket of the correct shape.
You can plug one plug into only one socket, never many-to-one. In
general, plugging to the left is different from plugging to the right.
When you force left and right to be symmetric, you get tensor algebras,
Hilbert spaces, and quantum mechanics. When you don't force that symmetry,
you get natural language.
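
Here is a toy sketch, in Python, of those plug-and-socket rules (the types
and directions are made up for illustration, in the spirit of Link Grammar
connectors, not its actual API):

```python
# A toy model of the "plugs and sockets" rules: a connector has a shape
# (its type) and a direction; a link forms only when shapes match and one
# connector looks right while the other looks left.

from dataclasses import dataclass

@dataclass(frozen=True)
class Connector:
    ctype: str   # the "shape" of the plug/socket, e.g. "S" (subject), "O" (object)
    dir: str     # '+' looks to the right, '-' looks to the left

def can_link(left: Connector, right: Connector) -> bool:
    """One plug into one socket, matching shape; left and right are not symmetric."""
    return left.dir == '+' and right.dir == '-' and left.ctype == right.ctype

# A word offering a subject link rightward meets a word wanting one leftward:
assert can_link(Connector("S", '+'), Connector("S", '-'))
# Shapes must match -- an object socket won't accept a subject plug:
assert not can_link(Connector("S", '+'), Connector("O", '-'))
# Direction matters -- no symmetric plugging:
assert not can_link(Connector("S", '-'), Connector("S", '+'))
```

Forcing `can_link` to be symmetric in its arguments would be the move that
collapses this picture back into ordinary tensor algebra.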

In pictures, from Bob Coecke:
http://www.cs.ox.ac.uk/people/bob.coecke/NewScientist.pdf

Notice the jigsaw puzzle pieces. The "plugging in" of plugs into sockets is
like assembling jigsaw puzzle pieces.  There are more pictures of plugs
and sockets here:

http://math.ucr.edu/home/baez/rosetta.pdf

and here:

https://www.link.cs.cmu.edu/link/ftp-site/link-grammar/LG-tech-report.pdf

Hmm. Interesting. The Baez paper says, in brief: "computer programs,
logical theorems, lambda calculus, and tensor algebra are all like assembling
jigsaw puzzle pieces". The Coecke paper says "so is natural language". The
Sleator/Temperley paper says "yeah, we knew that two decades before you ever
figured it out".

So, the above presents an extremely broad foundation for assembling and
organizing structural knowledge. Roughly: "it's all jigsaw puzzle pieces".
Lots and lots of tensor-hom adjunction, everywhere you look. Graphs that
fit together; connectors that have types. Type-theoretic types.

What about neural nets? Well, the prototype is the Bengio "N-gram",
word2vec, etc. knockoffs. What are they doing? Well, shiver me timbers, it's
just more tensor-hom adjunction. Once again. Was that really a surprise? By
now, it shouldn't be. But if you look at the shape of their tensors (their
jigsaw-puzzle pieces), they are stupid, idiotic even: they are N-grams.
(Each distinct N-gram is a jigsaw-puzzle piece; you can only connect them
when the words "fit together".) All alike, all very uniform. A casual
disregard for everything that linguists have ever learned: it's not all
N-grams. Natural language has structure.  The word2vec people wildly
oversimplify (i.e. ignore) that structure. They do, however, sort the
N-grams (jigsaw pieces) into buckets (vectors) and notice that, hey, it
works pretty darned well for semantics! Golly!
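
For flavor, here is a crude Python sketch of that bucket-sorting move: count
the contexts (neighboring words) each word occurs in, treat the counts as a
vector, and compare words by cosine similarity. A toy corpus and my own
function names, not real word2vec:

```python
# Distributional semantics in miniature: words that occur in similar
# contexts get similar context-count vectors, hence high cosine similarity.

from collections import Counter
from math import sqrt

corpus = "the dog ran home the cat ran home the dog sat down".split()

def context_vector(word, window=1):
    """Count the words appearing within `window` positions of `word`."""
    ctx = Counter()
    for i, w in enumerate(corpus):
        if w == word:
            for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
                if j != i:
                    ctx[corpus[j]] += 1
    return ctx

def cosine(u, v):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u)
    return dot / (sqrt(sum(x * x for x in u.values()))
                  * sqrt(sum(x * x for x in v.values())))

# "dog" and "cat" share contexts ("the ... ran"), so they land near each
# other in the vector space, while "dog" and "down" do not:
print(cosine(context_vector("dog"), context_vector("cat")))
print(cosine(context_vector("dog"), context_vector("down")))
```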

(It's not called "tensorflow" merely because they thought the word "tensor"
sounded really cool and they needed something to name with it; the tensors
really are there.)

I'm saying: "Great! Now that we know what it is that we are doing, let's just
put the structure back in. Replace the N-grams by something more clever:
replace the N-grams by the actual jigsaw pieces. And go from there."

What I wrote above is on the verge of oversimplification. Understanding it
clearly may take you years, or more, if this is new territory to you. But
once you see it, once you can clearly articulate it, then the path forward
becomes clear.

The only thing to say about sheaves is that I realized that the rules for
assembling the puzzle pieces just happen to be identical to the axioms of
sheaf theory. I noticed this purely by chance, because I happened to be
reading a book on algebraic topology, and thought "wow, this is exactly the
same stuff, the same rules".
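
For the curious, the gluing axiom in question can be illustrated in a few
lines of Python, with partial dicts standing in for local sections. This is
a cartoon of the idea, not sheaf theory proper:

```python
# The sheaf gluing axiom, cartoon version: local "sections" that agree on
# their overlaps can be glued into one global section; if they disagree
# anywhere on an overlap, no global section exists.

def agree_on_overlap(s1: dict, s2: dict) -> bool:
    """Check that two partial sections agree wherever both are defined."""
    return all(s1[k] == s2[k] for k in s1.keys() & s2.keys())

def glue(sections):
    """Glue compatible sections into one; return None if they conflict."""
    glued = {}
    for s in sections:
        if not agree_on_overlap(glued, s):
            return None   # incompatible on an overlap: no global section
        glued.update(s)
    return glued

# Two partial analyses that agree where they overlap glue together:
a = {"the": "det", "dog": "noun"}
b = {"dog": "noun", "ran": "verb"}
assert glue([a, b]) == {"the": "det", "dog": "noun", "ran": "verb"}
# Disagreement on the overlap blocks gluing:
c = {"dog": "verb"}
assert glue([a, c]) is None
```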

>
> Anyway, perhaps Ben is right, you may be doing the first two steps of my
> suggested solution: 1) coding only a sequence net of observed sequences,
> and 2) projecting out latent "invariants" by clustering according to shared
> contexts.
>

The problem is always that it's easy to get a general idea about something,
and it's hard to convert it into code - a machine that actually works as
intended.

>
> But then if you are doing all this, why are you using BERT type training
>

Never heard of BERT before ...


> "to guide the numerical weightings of symbolic language-patterns"? That
> will still trap you in the limitations of learned representations.
>

So learn more, learn better? What's the problem here?


> The whole point of a network is that, like a distributed representation,
> it can handle multiplicity of interpretation. Once you fix it by "learning"
> you have lost this.
>

I don't know what you mean. What is being "fixed"? What is being "lost"?
What are you "learning"?


> The solution I came is to forget all thought of training or "learning"
> representations. Not least because you get contradictions.
>

What do you mean by "training"? What do you mean by "representation"? What
do you mean by "contradiction"?

I know all of these words informally, I don't understand what you are
trying to say with them.

>
> And I believe the best way to do that will be to set the network
> oscillating and varying inhibition, to get the resolution of groupings we
> want dynamically.
>

I don't know what to make of this, either. Things that oscillate are called
"dynamical systems", and they have a deep and broad theory as well, the
study of which is loosely termed "physics".  The word "inhibition" comes
from the neural-net world, as a certain kind of non-linear effect. More
broadly, "inhibition" means the negation, inversion, or opposition of
something, and certainly, tensor algebras have various concepts of
negation and inversion in them. I've not really thought about that a lot,
at least not with regard to natural language.

--linas

>
> -Rob
>
> On Tue, Feb 19, 2019 at 6:45 PM Linas Vepstas <[email protected]>
> wrote:
>
>> Hi Rob,
>>
>> On Mon, Feb 18, 2019 at 4:40 PM Rob Freeman <[email protected]>
>> wrote:
>>
>>> Ben,
>>>
>>> That's what I thought. You're still working with Link Grammar.
>>>
>>> But since last year working on informing your links with stats from
>>> deep-NN type, learned, embedding vector based predictive models? You're
>>> trying to span the weakness of each formalism with the strengths of the
>>> other??
>>>
>>
>> Yes but no. I've been trying to explain what exactly is good, and what,
>> exactly is bad with NN vector-space models. There is a long tract written
>> on this here.
>> https://github.com/opencog/opencog/raw/master/opencog/nlp/learn/learn-lang-diary/skippy.pdf
>>
>>
>>
>>>
>>> There's a lot to say about all of that.
>>>
>>> Your grammar will be learned, with only the resolution you bake in from
>>> the beginning.
>>>
>> No.
>>
>>
>>> Your embedding vectors will be learned,
>>>
>>
>> The point of the long PDF is to explain why NN-vectors are bad. It
>> attempts to first explain *why* neural nets work for language, and why
>> vectors are *almost* the right thing, and then it tries to explain why NN
>> vectors don't actually do everything you actually want.  I've noticed that,
>> in the middle of all these explanations, I lose my audience; haven't
>> figured out how to keep them, yet.
>>
>>
>>> and the dependency decisions they can inform on learned, and thus
>>> finite, too. Plus you need to keep two formalisms and marry them
>>> together... Large teams for all of that...
>>>
>>
>> No. I've already got 75% of it coded up. It actually works, I've got long
>> diary entries and notes with detailed stats on it all.  Unfortunately, I
>> have not been able to carve out the time to finish the work, its been
>> stalled since the fall of last year.
>>
>> It would be wonderful if I could get someone else interested in this work.
>>
>> --linas
>>
> *Artificial General Intelligence List <https://agi.topicbox.com/latest>*
> / AGI / see discussions <https://agi.topicbox.com/groups/agi> +
> participants <https://agi.topicbox.com/groups/agi/members> + delivery
> options <https://agi.topicbox.com/groups/agi/subscription> Permalink
> <https://agi.topicbox.com/groups/agi/T581199cf280badd7-Mf8d91ef7fb9013cf13f130c7>
>


-- 
cassette tapes - analog TV - film cameras - you

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T581199cf280badd7-M06471cecafce0e8632d63a1e
Delivery options: https://agi.topicbox.com/groups/agi/subscription
