On Tue, Feb 19, 2019 at 5:33 PM Rob Freeman <[email protected]> wrote:
> Linas,
>
> OK. I'll take that to be saying, "No, I was not influenced by Coecke et
> al."

Note to self: do not write long emails. (I was hoping it would serve some
educational purpose.)

I knew the basics of category theory before I knew any linguistics. I
skimmed the Coecke papers; I did not see anything surprising or unusual
that made me want to study them closely. Perhaps there are some golden
nuggets in those papers? What might they be? So, no, I was not influenced
by them.

> For all that, I can't figure out if you are contrasting yourself with
> their treatment or if you like their treatment.

I don't know what their treatment is. After a skim, it seemed like word2vec
with some minor twist. Maybe I missed something.

> I quite liked their work when I came across it. In fact I had been
> thinking for some time that category theory has something of the flavour
> of a gauge theory.

Yellow flag. Caution. I wouldn't go around saying things like that, if I
were you. The problem is that I've got a PhD in theoretical particle
physics, and these kinds of remarks don't hold water.

> I have no problem with the substance of it. I just don't think it is
> necessary. At least for the perceptual problem. The network is a
> perfectly good representation for itself.

To paraphrase: "I know that the earth goes around the sun. I don't think
it's necessary to understand Kepler's laws." For most people, that's a
perfectly fine statement. Just don't mention black holes in the same
breath.

> I say you can't resolve above the network. Simple enough for you?

Too simple. No clue what that sentence means.

> '"fixed"? What is being "lost"? What are you "learning"? What do you mean
> by "training"? What do you mean by "representation"? What do you mean by
> "contradiction"?' ...
>
> But if you haven't understood them, it will probably be easier to use
> your words than argue about them endlessly.

???

> Anyway, in substance, you just don't understand what I am proposing. Is
> that right?
I don't recall seeing a proposal. Perhaps I hopped in at the wrong end of
an earlier conversation.

I'm sorry, this conversation went upside down really fast. I've hit a dead
end.

--linas

On Wed, Feb 20, 2019 at 8:52 AM Linas Vepstas <[email protected]> wrote:

> Hi Rob,
>
> On Tue, Feb 19, 2019 at 3:23 AM Rob Freeman <[email protected]>
> wrote:
>
>> An aside. You mention sheaf theory as a way to get around the linearity
>> of vector spaces. Is this influenced in any way by what Coecke,
>> Sadrzadeh, and Clark proposed for compositional distributional models
>> in the '00s?
>
> No. But ... how can I explain it quickly? Bob Coecke and I both have
> PhDs in theoretical physics. We both know category theory. The
> difference is that he has published papers and edited books on the
> topic, whereas I have only read books and papers.
>
> In a nutshell: category theory arose when mathematicians observed
> recurring patterns (of how symbols are arranged on a page) in different,
> unrelated branches of math. For this discussion, there are two patterns
> that are relevant. One is "tensoring" (or multiplying, or simply
> "writing one symbol next to another symbol, in such a way that they are
> understood to go together with each other"). The other is "contracting"
> (or applying, or inserting into, or plugging in, or reducing; for
> example, plugging "x" into "f(x)" to get "y", which can be written with
> an arrow: "x" next to "f(x)" --> "y").
>
> These two operations "go together" and "work with one another" in a very
> large number of settings, ranging from linear algebra to Hilbert spaces
> (quantum mechanics) to the lambda calculus to the theory of computation.
> And also natural language.
>
> The Wikipedia article about currying gives a flavor of the broadness of
> the concept: https://en.wikipedia.org/wiki/Currying
> It is well worth reading because it is both a simple concept, almost
> "trivial", and at the same time deep and insightful.
> In that article, the "times" symbol is the tensor (or multiplication),
> and the arrow is the applying/plugging-in.
>
> Next, one thinks like so: "Great, I've got two operations, 'tensor' and
> 'arrow'. What is the set of all possible legal ways in which these two
> can be combined into an expression?" That is, "what are the legal
> expressions?"
>
> So, whenever one asks this kind of question ("I have some symbols; what
> are the legal ways of arranging them on a page?"), the answer is: you
> have a "language", and that "language" has a "syntax" (i.e. rules for
> legal arrangements). Well, it turns out that the "language" of 'tensor'
> and 'arrow' is exactly the (simply-typed) lambda calculus. Wow. Because,
> of course, everyone knows that the lambda calculus has something to do
> with computation; something important, even.
>
> When you get done studying and pondering everything I wrote above, you
> eventually come to realize that the legal arrangements of 'tensor' and
> 'arrow' look like graphs with lines connecting things together. There
> are some rules: you can only connect a plug into a socket of the correct
> shape. You can only plug one plug into one socket, never many-to-one. In
> general, plugging to the left is different from plugging to the right.
> When you force left and right to be symmetric, you get tensor algebras,
> Hilbert spaces, and quantum mechanics. When you don't force that
> symmetry, you get natural language.
>
> In pictures, from Bob Coecke:
> http://www.cs.ox.ac.uk/people/bob.coecke/NewScientist.pdf
>
> Notice the jigsaw-puzzle pieces. The "plugging in" of plugs into sockets
> is like assembling jigsaw-puzzle pieces. There are more pictures of
> plugs and sockets here:
>
> http://math.ucr.edu/home/baez/rosetta.pdf
>
> and here:
>
> https://www.link.cs.cmu.edu/link/ftp-site/link-grammar/LG-tech-report.pdf
>
> Hmm. Interesting.
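[Editor's note: the tensor/arrow pattern behind currying, described above, can be sketched in a few lines of Python. The `curry`/`uncurry` names here are ad hoc illustrations, not from any library.]

```python
# Currying: a function on a pair (A x B -> C) corresponds exactly to a
# function returning a function (A -> (B -> C)). The pair is the "tensor";
# function application is the "arrow"/plugging-in.

def curry(f):
    """Turn f : (A x B) -> C into its curried form A -> (B -> C)."""
    return lambda a: lambda b: f(a, b)

def uncurry(g):
    """The inverse direction: A -> (B -> C) back to (A x B) -> C."""
    return lambda a, b: g(a)(b)

def add(x, y):
    return x + y

# Both forms compute the same thing; curry and uncurry undo each other.
assert curry(add)(2)(3) == add(2, 3) == 5
assert uncurry(curry(add))(2, 3) == 5
```

That the two directions are mutually inverse is the tensor-hom adjunction mentioned later in this email, in its simplest concrete form.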
> The Baez paper says, in brief: "computer programs, logical theorems, the
> lambda calculus, and tensor algebra are like assembling jigsaw-puzzle
> pieces." The Coecke paper says, "so is natural language." The
> Sleator/Temperley paper says, "yeah, we knew that two decades before you
> ever figured it out."
>
> So, the above presents an extremely broad foundation for assembling and
> organizing structural knowledge. Roughly: "it's all jigsaw-puzzle
> pieces." Lots and lots of tensor-hom adjunction, everywhere you look.
> Graphs that fit together; connectors that have types. Type-theoretic
> types.
>
> What about neural nets? Well, the prototype is the Bengio "N-gram",
> word2vec, etc. knockoffs. What are they doing? Well, shiver me timbers,
> it's just more tensor-hom adjunction. Once again. Was that really a
> surprise? By now, it shouldn't be. But if you look at the shape of their
> tensors (their jigsaw-puzzle pieces), they are stupid, idiotic even:
> they are N-grams. (Each distinct N-gram is a jigsaw-puzzle piece; you
> can only connect them when the words "fit together".) All alike, all
> very uniform. A casual disregard for everything that linguists have ever
> learned: it's not all N-grams. Natural language has structure. The
> word2vec people wildly oversimplify (i.e. ignore) that structure. They
> do, however, sort the N-grams (jigsaw pieces) into buckets (vectors) and
> notice that, hey, it works pretty darned well for semantics! Golly!
>
> (It's not called "tensorflow" because they thought the word "tensor"
> sounded really cool and they should name something with it.)
>
> I'm saying: "Great! Now that we know what it is that we are doing, let's
> just put the structure back in. Replace the N-grams by something more
> clever: replace the N-grams by the actual jigsaw pieces. And go from
> there."
>
> What I wrote above is on the verge of oversimplification. Understanding
> it clearly may take you years, or more, if this is new territory to you.
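[Editor's note: the "sort the N-grams into buckets (vectors)" step can be illustrated with a minimal distributional sketch: count each word's neighbors within a window, so every word gets a context vector. This is only an illustration of the general idea, not word2vec itself or anyone's actual pipeline.]

```python
from collections import Counter, defaultdict

def context_vectors(tokens, window=2):
    """Map each word to a Counter of the words seen within `window` of it."""
    vecs = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vecs[word][tokens[j]] += 1
    return vecs

tokens = "the cat sat on the mat the dog sat on the rug".split()
vecs = context_vectors(tokens)

# "cat" and "dog" occur in similar contexts, so their vectors overlap;
# that overlap is what makes the buckets work "pretty darned well for
# semantics".
assert vecs["cat"]["sat"] == vecs["dog"]["sat"] == 1
```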
> But once you see it, once you can clearly articulate it, then the path
> forward becomes clear.
>
> The only thing to say about sheaves is that I realized the rules for
> assembling the puzzle pieces just happen to be identical to the axioms
> of sheaf theory. Which I happened to notice because I was randomly
> reading a book on algebraic topology, and happened to think, "wow, this
> is exactly the same stuff, the same rules."
>
>> Anyway, perhaps Ben is right, you may be doing the first two steps of
>> my suggested solution: 1) coding only a sequence net of observed
>> sequences, and 2) projecting out latent "invariants" by clustering
>> according to shared contexts.
>
> The problem is always that it's easy to get a general idea about
> something, and it's hard to convert it into code, a machine that
> actually works as intended.
>
>> But then if you are doing all this, why are you using BERT-type
>> training
>
> Never heard of BERT before ...
>
>> "to guide the numerical weightings of symbolic language-patterns"? That
>> will still trap you in the limitations of learned representations.
>
> So learn more, learn better? What's the problem here?
>
>> The whole point of a network is that, like a distributed
>> representation, it can handle multiplicity of interpretation. Once you
>> fix it by "learning" you have lost this.
>
> I don't know what you mean. What is being "fixed"? What is being "lost"?
> What are you "learning"?
>
>> The solution I came to is to forget all thought of training or
>> "learning" representations. Not least because you get contradictions.
>
> What do you mean by "training"? What do you mean by "representation"?
> What do you mean by "contradiction"?
>
> I know all of these words informally; I don't understand what you are
> trying to say with them.
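[Editor's note: the puzzle-piece assembly rules discussed above can be made concrete with a toy sketch of typed connectors, loosely in the spirit of the Link Grammar report linked earlier. The connector names (S, O) and the `Word`/`can_link` helpers are invented for illustration; this is not the actual Link Grammar implementation.]

```python
from dataclasses import dataclass, field

@dataclass
class Word:
    """A jigsaw piece: a word with typed connectors facing left or right."""
    text: str
    left: list = field(default_factory=list)   # sockets facing left
    right: list = field(default_factory=list)  # plugs facing right

def can_link(a: Word, b: Word, ctype: str) -> bool:
    """With `a` to the left of `b`: link only when `a` offers connector
    type `ctype` rightward and `b` accepts the same type leftward.
    Left and right are not interchangeable, which is the asymmetry
    that distinguishes language from the symmetric (Hilbert-space) case."""
    return ctype in a.right and ctype in b.left

john = Word("John", right=["S"])              # can serve as a subject
saw  = Word("saw", left=["S"], right=["O"])   # wants a subject on its left, an object on its right
mary = Word("Mary", left=["O"])               # can serve as an object

assert can_link(john, saw, "S")      # "John saw": plug fits socket
assert can_link(saw, mary, "O")      # "saw Mary"
assert not can_link(mary, saw, "S")  # wrong type on the wrong side: no link
```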
>> And I believe the best way to do that will be to set the network
>> oscillating and to vary inhibition, to get the resolution of groupings
>> we want dynamically.
>
> I don't know what to make of this, either. Things that oscillate are
> called "dynamical systems", and they have a deep and broad theory as
> well, the study of which is loosely termed "physics". The word
> "inhibition" comes from the neural-net world, as a certain kind of
> non-linear effect. More broadly, "inhibition" means the negation,
> inversion, or opposition of something, and certainly, tensor algebras
> have various concepts of negation and inversion in them. I've not really
> thought about that a lot, at least not with regard to natural language.
>
> --linas

--
cassette tapes - analog TV - film cameras - you

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T581199cf280badd7-Mf633f680c2ccd0fea884612c
Delivery options: https://agi.topicbox.com/groups/agi/subscription
