Linas,

By "formal incompleteness" I mean Goedel's proof that "every sufficiently powerful formal system is either inconsistent or incomplete".
I first paid attention to category theory as an alternative philosophical basis for mathematics in response to Goedel's incompleteness theorem. It is an alternative to axiomatic set theory, the axiomatic formulation of which is itself a response to Goedel's incompleteness theorem. Instead of axioms, which are random, you choose to base everything on morphisms... which are also random (both "random" in the "invariant" sense of a change which is not constrained by the system). Not necessarily a big deal. But I have had difficulty in the past trying to persuade linguists that these mathematical results have relevance for linguistics. Now here you are making exactly that mapping. LV: "So I crack open a book on linguistics, and its got dots and arrows in it, and like oh, cool! That really really reminds me of a simplicial set...".

Re. distributional semantics: I agree it is old. But let me add to your analysis. The problem was not lack of computing power. The problem was that Chomsky shot it down because it resulted in "inconsistent or incoherent ... analyses". This was big. It cracked linguistics apart, and linguistics is divided by it to this day.

Frederick J. Newmeyer, Generative Linguistics: A Historical Perspective, Routledge, 1996: "Part of the discussion of phonology in 'LBLT' is directed towards showing that the conditions that were supposed to define a phonemic representation (including complementary distribution, locally determined biuniqueness, linearity, etc.) were inconsistent or incoherent in some cases and led to (or at least allowed) absurd analyses in others."

Sydney Lamb: 'For example, perhaps his most celebrated argument concerns the Russian obstruents. He correctly pointed out that the usual solution incorporates a loss of generality, but he misdiagnosed the problem. The problem was the criterion of linearity.
He stubbornly holds on to this criterion, although it really is faulty, and comes up with a solution for the Russian obstruents that obscures the phonological structure. I showed (in accounts cited below) that by relaxing the linearity requirement we get an elegant solution while preserving "centrality of contrastive function of linguistic elements".'

This is what I call "contradictions". So we've got this random, contradictory quality coming out again. Distributional semantics is actually not linear. There is evidence. The point is, you're right: the neural net people are just not thinking about this at all. It needs to be brought to their attention. I've tried to do that, but I don't have the rigour. What we need is a concerted effort to bring these theoretical concerns to the forefront. I think if we do that, your problems finding volunteers to work on your codebase may disappear in quick time.

The time is very ripe. The likes of Geoff Hinton, Ng, just about everyone, are coming out saying there is a problem, we need a reboot, we are looking for new solutions. A really tidy theoretical presentation of non-linearity might focus minds, and bring funding. Because this is not a problem, it is a solution.

The solution is simple. We must not assume linearity, we must not assume consistency, and we must not try to "learn". We must think of the system as constantly generating new, contradictory forms. I've been arguing this for years. As you say, seen sideways, it is simple. The problem is that no one is looking at it like that. I don't know how much further we need to argue beyond that. I don't know how much further I need to argue, with you!

I have emphasized that the system needs to be seen as based on principles for putting pieces together: compositional meaning. You have your "jigsaw". OK in itself, but then a jigsaw only goes together in one way. Is that "one way" significant?
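To make the linearity assumption concrete, here is a toy sketch of the "vector offset" arithmetic that distributional models take for granted ('King - Man + Woman = Queen'). The 3-d vectors below are made up purely for illustration; they do not come from any trained model. The point is what the linear assumption *is*, before we argue against assuming it:

```python
# Toy illustration of the linear-offset assumption in vector semantics.
# Vectors are invented for illustration only, not from a trained model.
import math

vecs = {
    "king":   [0.9, 0.8, 0.1],
    "man":    [0.1, 0.8, 0.1],
    "woman":  [0.1, 0.8, 0.9],
    "queen":  [0.9, 0.8, 0.9],
    "berlin": [0.5, 0.1, 0.2],
}

def add(a, b): return [x + y for x, y in zip(a, b)]
def sub(a, b): return [x - y for x, y in zip(a, b)]

def cosine(a, b):
    # Cosine similarity: the usual "nearness" measure in these models.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# The analogy treated as literal vector arithmetic: king - man + woman.
target = add(sub(vecs["king"], vecs["man"]), vecs["woman"])

# Nearest word to the target, excluding the query words themselves.
best = max((w for w in vecs if w not in ("king", "man", "woman")),
           key=lambda w: cosine(vecs[w], target))
print(best)  # with these toy numbers: "queen"
```

If word distributions really were vectors in this sense, such offsets would hold exactly; the argument above is that they hold only approximately, and that the failures of linearity are the interesting part.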
I want to emphasize that we can build all sorts of new things, in lots of ways, which will still be meaningful because they are built from meaningful relations. I have emphasized substituting parts into each other as the principle for building new things. This can be seen as a network operation, so it is likely possible in your formalism, though perhaps not emphasized. I don't know if there is something particularly simple about the way I have formulated the problem. How efficient is your "jigsaw" search?

I was very impressed when I discovered oscillations could resolve "cliques" of exactly the kind my "substitutions" might be represented by. I don't know if you saw the paper I linked for Ben earlier. I could probably implement my relations in your codebase. But all I really want from your codebase is a network and speed. Oh, and actually a mechanism to see if bits will synchronize: whether cliques, "sheaves"(?), will project out and reveal themselves when set oscillating. At the moment I feel I'm more likely to get the speed using one of the simpler neurosimulator interfaces available publicly on SpiNNaker. And it doesn't seem inconsequential that they happen to have an oscillation synchronization mechanism too.

But perhaps the most efficient way to move forward is to hit the more imaginative corners of the deep learning community with some really good theoretical presentations of these issues of linearity, and randomness or contradiction of abstraction. Your presentations might have the rigour to make them take notice. At least they might, if given prominence in separation from your historical codebase, and presented in relation to the relevant evidence, with a simple alternative.

-Rob

On Fri, Feb 22, 2019 at 8:56 AM Linas Vepstas <[email protected]> wrote:
>
> On Thu, Feb 21, 2019 at 3:10 AM Rob Freeman <[email protected]> wrote:
>
>> couldn't resist. I do know the frustration
>
> I say things that hurt peoples feelings. Usually as a side-effect.
>
>> actually scored a couple of home runs with me.
>
> Thank you!
>
>> He has identified linearities in vector models as a key weakness, resolved the problem as one of reassembling parts (jigsaw pieces?), and even dropped a mention of category theory, which speaks to formal incompleteness. (Though I'm not sure he's thought that through, because it actually becomes an argument for distributed representation and against symbolism, against any fixed symbolism, anyway.)
>
> How can I avoid writing a long email? "Formal incompleteness" and "category theory" are two different things. When you say "formal incompleteness", I think that what you mean is "the deep learning people made a breakthrough, but they do not understand why it actually works", and that "they are not even trying to understand", or maybe "they are looking in the wrong place for that understanding". And maybe that's true; I dunno. When something works, people rarely ask why.
>
> Category theory... has 120-year-old roots in "Universal algebra" and started life as a means of "formalization". But it isn't that any more; it's evolved into something completely different. Lots simpler in some ways, but it clashes with what you learned in school. That clash leads to a mental block: like a fun-house mirror, it's "obvious" once you get it, but it's perversely "hard to get".
>
> Here's what it is, today. It is a theory of dots and arrows, a set of rules for how dots and arrows can be assembled, and a discussion of what you get when you assemble them. Some structures have a tiny handful of dots and arrows: zero, one, span, cospan, equalizer, pushout, pullback. Some have a countable infinity of dots and arrows: limit and colimit. Some have uncountable infinities of arrows going back and forth: adjoint functors. There's more. As it happens, almost everything - literally, more or less everything - in mathematics can be reduced to dots and arrows. Huh.
> Who would have guessed? Perhaps this smells like "formalization", depending on your background and experience. To me (and I think to probably most working category theorists) it's more like "gee, look at the pretty pictures with dots and arrows" and "huh, what does that remind me of? Oh, right, it reminds me of xyz". Back in the heyday, xyz was "the simplicial set" and stuff like that. It's less about "formalization" than it is "hey guys, look at the neat thing that happens when I do this!"
>
> Off-topic: 100 years ago (and even today) people thought that they could reduce all of mathematics to "just set theory", which sounds crazy because sets are these absurdly trite, trivial things. Here, we just swap out sets and replace them with trite, trivial dots and arrows. Whoopee! Despite the triteness of sets, the set theorists seem to have a lot on their hands. And very few of them actually work on "formalization" (whatever that might mean). Same deal for category theorists. It's trite in the same way that 1+1=2 is trite, and then one day you wake up and go "hang on... prime numbers! WTF!?!?!"
>
> The phrase "distributional semantics" keeps popping up in this conversation. Let's pick it apart. A "distribution" is that thing from probability: a bunch of balls distributed into bins. Frequency of dice-rolls and gambling-card deals. "Semantics" is "meaning", the study of "meaning". Put together, what it means is that you can collect statistics on N-grams, for N=3 or N=5, and mash those statistics into K=100 or 200 or 300 bins (balls into bins), and then you notice: "Oh hey, check it! Each different distribution is kind of like the meaning of a word! Oh wow: they're like... additive - linear, like 'King - Man + Woman = Queen' holds as an approximate equation between the distributions for the words King, Man, Woman, Queen." What's the other famous example? 'Berlin - Germany + France = Paris'.
> So this is the idea of "distributional semantics".
>
> I think if you resurrected a dead linguist from the 1950s, they'd say something like "no shit, Sherlock, what else did you figure out while I was dead?" The linguists always knew about distributional semantics; they just didn't have the compute power, the algorithms. The deep-learning guys have compute power and algorithms, but never really thought about linguistics before. Perhaps they think that linguistics is trite. You know, like 1+1=2 is trite.
>
> For me, it's just more like a playground, or a candy store. So I crack open a book on linguistics, and it's got dots and arrows in it, and like, oh, cool! That really, really reminds me of a simplicial set, except that it's a simplicial set with extra prongs pointing in other directions, so it's not that, but a fun-house mirror of one. So then I read about neural nets, and they talk about vectors, except that they aren't really vectors (there's no rotational symmetry; it's not Euclidean space; why the heck do the neural net people say "vector" when *obviously* they are not vectors? Do they seriously think that 'Berlin - Germany + France = Paris' implies that distributions are vectors?). Anyway, distributions obey various axioms (and so do vectors), and those axioms are dot-and-arrow-like, and so there's yet more fun-n-games of "gee, lookit what happens when one does this with that".
>
> Mostly, I think one can go pretty darned far with just "plain old statistics"; you just have to break out of the mind-set of N-grams and K=300-dimensional "vectors". I don't even think it's that hard to do. You just play the usual game of turning it sideways.
>
> Email too long, again. Sheesh.
>
> --linas

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T581199cf280badd7-M3d6a8389da8d115066b3156e
Delivery options: https://agi.topicbox.com/groups/agi/subscription
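P.S. For concreteness, here is a minimal sketch of the kind of oscillation-based clique detection I have in mind, using the standard Kuramoto model of coupled phase oscillators. Everything here (the coupling graph, the constants, the two hand-built cliques) is made up for illustration; it is not the mechanism from the paper I linked, and not anything in your codebase. The point is only that units coupled within a clique phase-lock, so the clique "projects out" as a synchronized group when the network is set oscillating:

```python
# Minimal Kuramoto-model sketch: nodes densely coupled within a clique
# synchronize their phases; uncoupled cliques drift apart. All parameters
# here are invented for illustration.
import math
import random

def kuramoto_step(phases, coupling, natural, K=2.0, dt=0.05):
    """One Euler step of d(theta_i)/dt = omega_i + K * sum_j w_ij * sin(theta_j - theta_i)."""
    n = len(phases)
    return [phases[i] + dt * (natural[i] + K * sum(
                coupling[i][j] * math.sin(phases[j] - phases[i])
                for j in range(n)))
            for i in range(n)]

random.seed(0)

# Two hand-built cliques of 3 nodes each: dense coupling inside,
# no coupling across.
w = [[0.0] * 6 for _ in range(6)]
for group in ([0, 1, 2], [3, 4, 5]):
    for i in group:
        for j in group:
            if i != j:
                w[i][j] = 1.0

omegas = [1.0, 1.0, 1.0, 1.3, 1.3, 1.3]  # natural frequencies per node
theta = [random.uniform(0, 2 * math.pi) for _ in range(6)]

for _ in range(2000):
    theta = kuramoto_step(theta, w, omegas)

def spread(idx):
    """Order parameter for a subset of nodes: 1.0 = perfect phase lock."""
    x = sum(math.cos(theta[i]) for i in idx) / len(idx)
    y = sum(math.sin(theta[i]) for i in idx) / len(idx)
    return math.hypot(x, y)

# Both within-clique values should come out close to 1.0 (phase-locked).
print(spread([0, 1, 2]), spread([3, 4, 5]))
```

A real implementation on something like SpiNNaker would of course use spiking units rather than abstract phases, but the phenomenon being exploited, synchrony picking out densely connected substructure, is the same.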
