On Tue, Feb 19, 2019 at 5:33 PM Rob Freeman <[email protected]> wrote:
> Linas,
>
> OK. I'll take that to be saying, "No, I was not influenced by Coecke et
> al."

Note to self: do not write long emails. (I was hoping it would serve some
educational purpose.)

I knew the basics of category theory before I knew any linguistics. I
skimmed the Coecke papers; I did not see anything surprising or unusual
that made me want to study them closely. Perhaps there are some golden
nuggets in those papers? What might they be? So, no, I was not influenced
by them.

> For all that, I can't figure out if you are contrasting yourself with
> their treatment or if you like their treatment.

I don't know what their treatment is. After a skim, it seemed like word2vec
with some minor twist. Maybe I missed something.

> I quite liked their work when I came across it. In fact I had been
> thinking for some time that category theory has something of the flavour
> of a gauge theory.

Yellow flag. Caution. I wouldn't go around saying things like that, if I
were you. The problem is that I've got a PhD in theoretical particle
physics, and these kinds of remarks don't hold water.

> I have no problem with the substance of it. I just don't think it is
> necessary. At least for the perceptual problem. The network is a
> perfectly good representation for itself.

To paraphrase: "I know that the earth goes around the sun. I don't think
it's necessary to understand Kepler's laws." For most people, that's a
perfectly fine statement. Just don't mention black holes in the same
breath.

> I say you can't resolve above the network. Simple enough for you?

Too simple. No clue what that sentence means.

> '"fixed"? What is being "lost"? What are you "learning"? What do you mean
> by "training"? What do you mean by "representation"? What do you mean by
> "contradiction"?' ...
>
> But if you haven't understood them, it will probably be easier to use
> your words than argue about them endlessly.

???

> Anyway, in substance, you just don't understand what I am proposing. Is
> that right?
I don't recall seeing a proposal. Perhaps I hopped in at the wrong end of
an earlier conversation.

I'm sorry, this conversation went upside down really fast. I've hit a dead
end.

--linas

On Wed, Feb 20, 2019 at 8:52 AM Linas Vepstas <[email protected]> wrote:

> Hi Rob,
>
> On Tue, Feb 19, 2019 at 3:23 AM Rob Freeman <[email protected]>
> wrote:
>
>> An aside. You mention sheaf theory as a way to get around the linearity
>> of vector spaces. Is this influenced in any way by what Coecke,
>> Sadrzadeh, and Clark proposed for compositional distributional models
>> in the '00s?
>
> No. But ... how can I explain it quickly? Bob Coecke and I both have
> PhDs in theoretical physics. We both know category theory. The
> difference is that he has published papers and edited books on the
> topic, whereas I have only read books and papers.
>
> In a nutshell: category theory arose when mathematicians observed
> recurring patterns (of how symbols are arranged on a page) in different,
> unrelated branches of math. For this discussion, there are two patterns
> that are relevant. One is "tensoring" (or multiplying, or simply
> "writing one symbol next to another symbol, in such a way that they are
> understood to go together with each other"). The other is "contracting"
> (or applying, or inserting into, or plugging in, or reducing; for
> example, plugging "x" into "f(x)" to get "y", which can be written with
> an arrow: "x" next to "f(x)" --> "y").
>
> These two operations "go together" and "work with one another" in a very
> large number of settings, ranging from linear algebra to Hilbert spaces
> (quantum mechanics) to the lambda calculus to the theory of computation.
> And also natural language.
>
> The Wikipedia article about currying gives a flavor of the broadness of
> the concept: https://en.wikipedia.org/wiki/Currying
> It is well worth reading because it is both a simple concept, almost
> "trivial", and at the same time deep and insightful.
> In that article, the "times" symbol is the tensor (or multiplication),
> and the arrow is the applying/plugging-in.
>
> Next, one thinks like so: "Great, I've got two operations, 'tensor' and
> 'arrow'. What is the set of all possible legal ways in which these two
> can be combined into an expression?" That is, "what are the legal
> expressions?"
>
> So, whenever one asks this kind of question ("I have some symbols; what
> are the legal ways of arranging them on a page?"), the answer is: you
> have a "language", and that "language" has a "syntax" (i.e. rules for
> legal arrangements). Well, it turns out that the "language" of 'tensor'
> and 'arrow' is exactly the (simply-typed) lambda calculus. Wow. Because,
> of course, everyone knows that the lambda calculus has something to do
> with computation; something important, even.
>
> When you get done studying and pondering everything I wrote above, you
> eventually come to realize that the legal arrangements of 'tensor' and
> 'arrow' look like graphs with lines connecting things together. There
> are some rules: you can only connect a plug into a socket of the correct
> shape. You can only plug one plug into one socket, never many-to-one. In
> general, plugging to the left is different from plugging to the right.
> When you force left and right to be symmetric, you get tensor algebras,
> Hilbert spaces, and quantum mechanics. When you don't force that
> symmetry, you get natural language.
>
> In pictures, from Bob Coecke:
> http://www.cs.ox.ac.uk/people/bob.coecke/NewScientist.pdf
>
> Notice the jigsaw-puzzle pieces. The "plugging in" of plugs into sockets
> is like assembling jigsaw-puzzle pieces. There are more pictures of
> plugs and sockets here:
>
> http://math.ucr.edu/home/baez/rosetta.pdf
>
> and here:
>
> https://www.link.cs.cmu.edu/link/ftp-site/link-grammar/LG-tech-report.pdf
>
> Hmm. Interesting.
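[Editor's note: the tensor/arrow pattern behind currying, described above, can be sketched in a few lines of Python. The `curry`/`uncurry` names here are ad hoc illustrations, not from any library.]

```python
# Currying: a function on a pair (A x B -> C) corresponds exactly to a
# function returning a function (A -> (B -> C)). The pair is the "tensor";
# function application is the "arrow"/plugging-in.

def curry(f):
    """Turn f : (A x B) -> C into its curried form A -> (B -> C)."""
    return lambda a: lambda b: f(a, b)

def uncurry(g):
    """The inverse direction: A -> (B -> C) back to (A x B) -> C."""
    return lambda a, b: g(a)(b)

def add(x, y):
    return x + y

# Both forms compute the same thing; curry and uncurry undo each other.
assert curry(add)(2)(3) == add(2, 3) == 5
assert uncurry(curry(add))(2, 3) == 5
```

That the two directions are mutually inverse is the tensor-hom adjunction mentioned later in this email, in its simplest concrete form.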
> The Baez paper says, in brief: "computer programs, logical theorems, the
> lambda calculus, and tensor algebra are like assembling jigsaw-puzzle
> pieces." The Coecke paper says, "so is natural language." The
> Sleator/Temperley paper says, "yeah, we knew that two decades before you
> ever figured it out."
>
> So, the above presents an extremely broad foundation for assembling and
> organizing structural knowledge. Roughly: "it's all jigsaw-puzzle
> pieces." Lots and lots of tensor-hom adjunction, everywhere you look.
> Graphs that fit together; connectors that have types. Type-theoretic
> types.
>
> What about neural nets? Well, the prototype is the Bengio "N-gram",
> word2vec, etc. knockoffs. What are they doing? Well, shiver me timbers,
> it's just more tensor-hom adjunction. Once again. Was that really a
> surprise? By now, it shouldn't be. But if you look at the shape of their
> tensors (their jigsaw-puzzle pieces), they are stupid, idiotic even:
> they are N-grams. (Each distinct N-gram is a jigsaw-puzzle piece; you
> can only connect them when the words "fit together".) All alike, all
> very uniform. A casual disregard for everything that linguists have ever
> learned: it's not all N-grams. Natural language has structure. The
> word2vec people wildly oversimplify (i.e. ignore) that structure. They
> do, however, sort the N-grams (jigsaw pieces) into buckets (vectors) and
> notice that, hey, it works pretty darned well for semantics! Golly!
>
> (It's not called "tensorflow" because they thought the word "tensor"
> sounded really cool and they should name something with it.)
>
> I'm saying: "Great! Now that we know what it is that we are doing, let's
> just put the structure back in. Replace the N-grams by something more
> clever: replace the N-grams by the actual jigsaw pieces. And go from
> there."
>
> What I wrote above is on the verge of oversimplification. Understanding
> it clearly may take you years, or more, if this is new territory to you.
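[Editor's note: the "sort the N-grams into buckets (vectors)" step can be illustrated with a minimal distributional sketch: count each word's neighbors within a window, so every word gets a context vector. This is only an illustration of the general idea, not word2vec itself or anyone's actual pipeline.]

```python
from collections import Counter, defaultdict

def context_vectors(tokens, window=2):
    """Map each word to a Counter of the words seen within `window` of it."""
    vecs = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vecs[word][tokens[j]] += 1
    return vecs

tokens = "the cat sat on the mat the dog sat on the rug".split()
vecs = context_vectors(tokens)

# "cat" and "dog" occur in similar contexts, so their vectors overlap;
# that overlap is what makes the buckets work "pretty darned well for
# semantics".
assert vecs["cat"]["sat"] == vecs["dog"]["sat"] == 1
```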
> But once you see it, once you can clearly articulate it, then the path
> forward becomes clear.
>
> The only thing to say about sheaves is that I realized the rules for
> assembling the puzzle pieces just happen to be identical to the axioms
> of sheaf theory. Which I happened to notice because I was randomly
> reading a book on algebraic topology, and happened to think, "wow, this
> is exactly the same stuff, the same rules."
>
>> Anyway, perhaps Ben is right, you may be doing the first two steps of
>> my suggested solution: 1) coding only a sequence net of observed
>> sequences, and 2) projecting out latent "invariants" by clustering
>> according to shared contexts.
>
> The problem is always that it's easy to get a general idea about
> something, and it's hard to convert it into code, a machine that
> actually works as intended.
>
>> But then if you are doing all this, why are you using BERT-type
>> training
>
> Never heard of BERT before ...
>
>> "to guide the numerical weightings of symbolic language-patterns"? That
>> will still trap you in the limitations of learned representations.
>
> So learn more, learn better? What's the problem here?
>
>> The whole point of a network is that, like a distributed
>> representation, it can handle multiplicity of interpretation. Once you
>> fix it by "learning" you have lost this.
>
> I don't know what you mean. What is being "fixed"? What is being "lost"?
> What are you "learning"?
>
>> The solution I came to is to forget all thought of training or
>> "learning" representations. Not least because you get contradictions.
>
> What do you mean by "training"? What do you mean by "representation"?
> What do you mean by "contradiction"?
>
> I know all of these words informally; I don't understand what you are
> trying to say with them.
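[Editor's note: the puzzle-piece assembly rules discussed above can be made concrete with a toy sketch of typed connectors, loosely in the spirit of the Link Grammar report linked earlier. The connector names (S, O) and the `Word`/`can_link` helpers are invented for illustration; this is not the actual Link Grammar implementation.]

```python
from dataclasses import dataclass, field

@dataclass
class Word:
    """A jigsaw piece: a word with typed connectors facing left or right."""
    text: str
    left: list = field(default_factory=list)   # sockets facing left
    right: list = field(default_factory=list)  # plugs facing right

def can_link(a: Word, b: Word, ctype: str) -> bool:
    """With `a` to the left of `b`: link only when `a` offers connector
    type `ctype` rightward and `b` accepts the same type leftward.
    Left and right are not interchangeable, which is the asymmetry
    that distinguishes language from the symmetric (Hilbert-space) case."""
    return ctype in a.right and ctype in b.left

john = Word("John", right=["S"])              # can serve as a subject
saw  = Word("saw", left=["S"], right=["O"])   # wants a subject on its left, an object on its right
mary = Word("Mary", left=["O"])               # can serve as an object

assert can_link(john, saw, "S")      # "John saw": plug fits socket
assert can_link(saw, mary, "O")      # "saw Mary"
assert not can_link(mary, saw, "S")  # wrong type on the wrong side: no link
```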
>> And I believe the best way to do that will be to set the network
>> oscillating and to vary inhibition, to get the resolution of groupings
>> we want dynamically.
>
> I don't know what to make of this, either. Things that oscillate are
> called "dynamical systems", and they have a deep and broad theory as
> well, the study of which is loosely termed "physics". The word
> "inhibition" comes from the neural-net world, as a certain kind of
> non-linear effect. More broadly, "inhibition" means the negation,
> inversion, or opposition of something, and certainly, tensor algebras
> have various concepts of negation and inversion in them. I've not really
> thought about that a lot, at least not with regard to natural language.
>
> --linas

--
cassette tapes - analog TV - film cameras - you

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T581199cf280badd7-Mf633f680c2ccd0fea884612c
Delivery options: https://agi.topicbox.com/groups/agi/subscription
