Linas,

OK. I'll take that to be saying, "No, I was not influenced by Coecke et al."

For all that, I can't figure out whether you are contrasting yourself with
their treatment or endorsing it.

I quite liked their work when I came across it. In fact I had been thinking
for some time that category theory has something of the flavour of a gauge
theory, so this was by way of a confirmation for me too. They also wrote a
little about the applicability of math from quantum field theory to the
problem.

I have no problem with the substance of it. I just don't think it is
necessary. At least for the perceptual problem. The network is a perfectly
good representation for itself.

Other than that. Fine. Jigsaw pieces. OK. You're looking for a better
jigsaw piece. We can see it like that.

I say you can't resolve above the network. Simple enough for you?

I'd like to answer your questions about the meaning of words:

'"fixed"? What is being "lost"?  What are you "learning"? What do you mean
by "training"? What do you mean by "representation"? What do you mean by
"contradiction"?'...

But if you haven't understood them, it will probably be easier to use your
words than to argue about them endlessly.

Anyway, in substance, you just don't understand what I am proposing. Is
that right?

-Rob

On Wed, Feb 20, 2019 at 8:52 AM Linas Vepstas <[email protected]>
wrote:

> Hi Rob,
>
> On Tue, Feb 19, 2019 at 3:23 AM Rob Freeman <[email protected]>
> wrote:
>
>>
>> An aside. You mention sheaf theory as a way to get around the linearity
>> of vector spaces. Is this influenced in any way by what Coecke, Sadrzadeh,
>> and Clark proposed for compositional distributional models in the '00s?
>>
>
> No. But... How can I explain it quickly?  Bob Coecke and I both have PhDs
> in theoretical physics. We both know category theory. The difference is
> that he has published papers and edited books on the topic, whereas I have
> only read books and papers.
>
> In a nutshell:  category theory arose when mathematicians observed
> recurring patterns (of how symbols are arranged on a page) in different,
> unrelated branches of math.  For this discussion, there are two patterns
> that are relevant. One is "tensoring" (or multiplying, or simply "writing
> one symbol next to another symbol, in such a way that they are understood
> to go together with each other"). The other is "contracting" (or applying,
> or inserting into, or plugging in, or reducing; for example, plugging
> "x" into "f(x)" to get "y", which can be written with an arrow: "x" next to
> "f(x)" --> "y").
>
> These two operations "go together" and "work with one-another" in a very
> large number of settings, ranging from linear algebra to Hilbert spaces
> (quantum mechanics) to lambda calculus to the theory of computation. And
> also natural language.
>
> The wikipedia article about currying gives a flavor of the broadness of
> the concept. https://en.wikipedia.org/wiki/Currying  It is well worth
> reading because it is both a simple concept, almost "trivial", and at the
> same time "deep and insightful".  In that article, the "times" symbol is
> the "tensor or multiplication", and the arrow is the applying/plugging-in.
>
> Next, one thinks like so: "great, I've got two operations, 'tensor' and
> 'arrow'. What is the set of all possible legal ways in which these two can
> be combined into an expression?" That is, "what are the legal expressions?"
>
> So, whenever one asks this kind of question ("I have some symbols; what
> are the legal ways of arranging them on a page?"), the answer is: you have
> a 'language', and that 'language' has a 'syntax' (i.e. rules for legal
> arrangements). Well, it turns out that the 'language' of 'tensor' and
> 'arrow' is exactly (simply-typed) lambda calculus. Wow. Because, of course,
> everyone knows that lambda calculus has something to do with computation -
> something important, even.
>
> When you get done studying and pondering everything I wrote above, you
> eventually come to realize that the legal arrangements of 'tensor' and
> 'arrow' look like graphs with lines connecting things together. There are
> some rules: you can only connect a plug into a socket of the correct shape.
> You can plug one plug into only one socket, never many-to-one. In
> general, plugging to the left is different than plugging to the right.
> When you force left and right to be symmetric, you get tensor algebras,
> Hilbert spaces, and quantum mechanics. When you don't force that symmetry,
> you get natural language.
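The left/right asymmetry of plugging can be sketched as a toy matcher,
loosely in the style of Link Grammar connectors. The tiny dictionary and
all connector names here are invented for illustration: a `+` connector
wants a partner to its right, a `-` connector wants one to its left, and a
plug only fits a socket of the same type.

```python
# Toy "plugs and sockets": each word carries typed connectors.
# 'D+' = determiner link pointing right; 'D-' = the matching socket, etc.
LEXICON = {
    "the":  ["D+"],          # determiner: links rightward to a noun
    "cat":  ["D-", "S+"],    # noun: determiner on the left, subject link right
    "runs": ["S-"],          # verb: takes a subject on the left
}

def links(sentence):
    """Greedily match right-pointing connectors to later left-pointing ones."""
    words = sentence.split()
    open_plugs = []          # (position, type) still looking rightward
    found = []
    for i, w in enumerate(words):
        for c in LEXICON[w]:
            kind, direction = c[:-1], c[-1]
            if direction == "-":
                # find the nearest earlier open plug of the same type
                for j in range(len(open_plugs) - 1, -1, -1):
                    pos, k = open_plugs[j]
                    if k == kind:
                        found.append((pos, i, kind))
                        open_plugs.pop(j)
                        break
            else:
                open_plugs.append((i, kind))
    return found

print(links("the cat runs"))   # [(0, 1, 'D'), (1, 2, 'S')]
```

Note how one-to-one plugging falls out of the bookkeeping: once a plug is
matched it is popped, so it can never be reused many-to-one.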
>
> In pictures, from Bob Coecke:
> http://www.cs.ox.ac.uk/people/bob.coecke/NewScientist.pdf
>
> Notice the jigsaw puzzle pieces. The "plugging in" of plugs into sockets
> is .. like assembling jigsaw puzzle pieces.  There are more pictures of
> plugs and sockets here:
>
> http://math.ucr.edu/home/baez/rosetta.pdf
>
> and here:
>
> https://www.link.cs.cmu.edu/link/ftp-site/link-grammar/LG-tech-report.pdf
>
> Hmm. Interesting. The Baez paper says, in brief: "computer programs,
> logical theorems, lambda calculus, and tensor algebra are all like
> assembling jigsaw puzzle pieces". The Coecke paper says "so is natural
> language". The Sleator/Temperley paper says "yeah, we knew that two
> decades before you ever figured it out".
>
> So, the above presents an extremely broad foundation for assembling and
> organizing structural knowledge. Roughly: "it's all jigsaw puzzle pieces".
> Lots and lots of tensor-hom adjunction, everywhere you look. Graphs that
> fit together; connectors that have types. Type-theoretic types.
>
> What about neural nets? Well, the prototype is the Bengio "N-gram",
> word2vec, etc. knockoffs. What are they doing? Well, shiver me timbers,
> it's just more tensor-hom adjunction. Once again. Was that really a
> surprise? By now, it shouldn't be. But if you look at the shape of their
> tensors (their jigsaw-puzzle pieces), they are stupid, idiotic even: they
> are N-grams. (Each distinct N-gram is a jigsaw-puzzle piece; you can only
> connect them when the words "fit together".) All alike, all very uniform.
> A casual disregard for everything that linguists have ever learned: it's
> not all N-grams. Natural language has structure. The word2vec people
> wildly oversimplify (i.e. ignore) that structure. They do, however, sort
> the N-grams (jigsaw pieces) into buckets (vectors) and notice that, hey,
> it works pretty darned well for semantics! Golly!
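The "sort N-grams into buckets and it works for semantics" idea can be
sketched in a few lines: treat each word's observed contexts as a vector,
and words that share contexts come out close together. The toy corpus and
every name here are invented for illustration; this is the distributional
idea only, not anyone's actual word2vec implementation.

```python
# Minimal distributional sketch: words sharing contexts get similar vectors.
from collections import Counter
from math import sqrt

corpus = "the cat sat on the mat the dog sat on the rug".split()

def context_vector(word, window=1):
    """Count the words appearing within `window` positions of `word`."""
    counts = Counter()
    for i, w in enumerate(corpus):
        if w == word:
            for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
                if j != i:
                    counts[corpus[j]] += 1
    return counts

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u)
    norm = lambda x: sqrt(sum(c * c for c in x.values()))
    return dot / (norm(u) * norm(v))

# "cat" and "dog" occur in identical slots here, so their context
# vectors align almost perfectly (cosine near 1.0).
print(cosine(context_vector("cat"), context_vector("dog")))
```

Every context window here is a fixed-width N-gram; that uniformity is
exactly the oversimplification complained about above.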
>
> (It's not called "tensorflow" merely because they thought the word
> "tensor" sounded really cool; the tensors are actually there.)
>
> I'm saying: "Great! Now that we know what it is that we are doing, let's
> just put the structure back in. Replace the N-grams by something more
> clever: replace the N-grams by the actual jigsaw pieces. And go from there."
>
> What I wrote above is on the verge of oversimplification. Understanding it
> clearly may take you years, or more, if this is new territory to you. But
> once you see it, once you can clearly articulate it, then the path forward
> becomes clear.
>
> The only thing to say about sheaves was that I realized that the rules for
> assembling the puzzle pieces just happen to be identical to the axioms of
> sheaf theory. I noticed this only because I happened to be reading a book
> on algebraic topology, and thought, "wow, this is exactly the same stuff,
> the same rules".
>
>>
>> Anyway, perhaps Ben is right, you may be doing the first two steps of my
>> suggested solution: 1) coding only a sequence net of observed sequences,
>> and 2) projecting out latent "invariants" by clustering according to shared
>> contexts.
>>
>
> The problem is always that it's easy to get a general idea about
> something, and hard to convert it into code, a machine that actually works
> as intended.
>
>>
>> But then if you are doing all this, why are you using BERT type training
>>
>
> Never heard of BERT before ...
>
>
>> "to guide the numerical weightings of symbolic language-patterns"? That
>> will still trap you in the limitations of learned representations.
>>
>
>  ? So learn more, learn better? What's the problem, here?
>
>
>> The whole point of a network is that, like a distributed representation,
>> it can handle multiplicity of interpretation. Once you fix it by "learning"
>> you have lost this.
>>
>
> I don't know what you mean. What is being "fixed"? What is being "lost"?
> What are you "learning"?
>
>
>> The solution I came is to forget all thought of training or "learning"
>> representations. Not least because you get contradictions.
>>
>
> What do you mean by "training"? What do you mean by "representation"? What
> do you mean by "contradiction"?
>
> I know all of these words informally, I don't understand what you are
> trying to say with them.
>
>>
>> And I believe the best way to do that will be to set the network
>> oscillating and varying inhibition, to get the resolution of groupings we
>> want dynamically.
>>
>
> I don't know what to make of this, either. Things that oscillate are
> called "dynamical systems", and they have a deep and broad theory as well,
> the study of which is loosely termed "physics".  The word "inhibition"
> comes from the neural-net world, as a certain kind of non-linear effect.
> More broadly, "inhibition" means the "negation, inversion, or opposition"
> of something, and certainly tensor algebras have various concepts of
> negation and inversion in them. I've not really thought about that a lot,
> at least not with regards to natural language.
>
> --linas
>

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T581199cf280badd7-M2712ccb28a26e7daa718da0d