Linas,

By "formal incompleteness" I mean Goedel's proof that "every sufficiently powerful formal system is either inconsistent or incomplete".
I first paid attention to category theory as an alternative philosophical basis for mathematics in response to Goedel's incompleteness theorem. It is an alternative to axiomatic set theory, the axiomatic formulation of which is itself a response to Goedel's incompleteness theorem. Instead of axioms, which are random, you choose to base everything on morphisms... which are also random (both "random" in the "invariant" sense of a change which is not constrained by the system). Not necessarily a big deal. But I have had difficulty in the past trying to persuade linguists that these mathematical results have relevance for linguistics. Now here you are making exactly that mapping. LV: "So I crack open a book on linguistics, and its got dots and arrows in it, and like oh, cool! That really really reminds me of a simplicial set...".

Re. distributional semantics: I agree it is old. But let me add to your analysis. The problem was not lack of computing power. The problem was that Chomsky shot it down because it resulted in "inconsistent or incoherent ... analyses". This was big. It cracked linguistics apart, and linguistics is divided by it to this day.

Frederick J. Newmeyer, Generative Linguistics: A Historical Perspective, Routledge, 1996: "Part of the discussion of phonology in 'LBLT' is directed towards showing that the conditions that were supposed to define a phonemic representation (including complementary distribution, locally determined biuniqueness, linearity, etc.) were inconsistent or incoherent in some cases and led to (or at least allowed) absurd analyses in others."

Sydney Lamb: 'For example, perhaps his most celebrated argument concerns the Russian obstruents. He correctly pointed out that the usual solution incorporates a loss of generality, but he misdiagnosed the problem. The problem was the criterion of linearity.
He stubbornly holds on to this criterion, although it really is faulty, and comes up with a solution for the Russian obstruents that obscures the phonological structure. I showed (in accounts cited below) that by relaxing the linearity requirement we get an elegant solution while preserving "centrality of contrastive function of linguistic elements".'

This is what I call "contradictions". So we've got this random, contradictory quality coming out again. Distributional semantics is actually not linear. There is evidence. The point is, you're right: the neural net people are just not thinking about this at all. It needs to be brought to their attention. I've tried to do that, but I don't have the rigour. What we need is a concerted effort to bring these theoretical concerns to the forefront. I think if we do that, your problems finding volunteers to work on your codebase may disappear in quick time.

The time is very ripe. The likes of Geoff Hinton, Ng, just about everyone, are coming out saying there is a problem, we need a reboot, we are looking for new solutions. A really tidy theoretical presentation of non-linearity might focus minds, and bring funding. Because this is not a problem, it is a solution.

The solution is simple. We must not assume linearity, we must not assume consistency, and we must not try to "learn". We must think of the system as constantly generating new, contradictory forms. I've been arguing this for years. As you say, seen sideways, it is simple. The problem is that no one is looking at it like that. I don't know how much further we need to argue beyond that. I don't know how much further I need to argue, with you!

I have emphasized that the system needs to be seen as based on principles for putting pieces together: compositional meaning. You have your "jigsaw". OK in itself, but then a jigsaw only goes together in one way. Is that "one way" significant?
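To make the linearity assumption concrete, here is a toy sketch of the "vector offset" arithmetic that distributional models take for granted ('King - Man + Woman = Queen'). The 3-d vectors below are made up purely for illustration; they do not come from any trained model. The point is what the linear assumption *is*, before we argue against assuming it:

```python
# Toy illustration of the linear-offset assumption in vector semantics.
# Vectors are invented for illustration only, not from a trained model.
import math

vecs = {
    "king":   [0.9, 0.8, 0.1],
    "man":    [0.1, 0.8, 0.1],
    "woman":  [0.1, 0.8, 0.9],
    "queen":  [0.9, 0.8, 0.9],
    "berlin": [0.5, 0.1, 0.2],
}

def add(a, b): return [x + y for x, y in zip(a, b)]
def sub(a, b): return [x - y for x, y in zip(a, b)]

def cosine(a, b):
    # Cosine similarity: the usual "nearness" measure in these models.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# The analogy treated as literal vector arithmetic: king - man + woman.
target = add(sub(vecs["king"], vecs["man"]), vecs["woman"])

# Nearest word to the target, excluding the query words themselves.
best = max((w for w in vecs if w not in ("king", "man", "woman")),
           key=lambda w: cosine(vecs[w], target))
print(best)  # with these toy numbers: "queen"
```

If word distributions really were vectors in this sense, such offsets would hold exactly; the argument above is that they hold only approximately, and that the failures of linearity are the interesting part.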
I want to emphasize that we can build all sorts of new things, in lots of ways, which will still be meaningful because they are built from meaningful relations. I have emphasized substituting parts into each other as the principle for building new things. This can be seen as a network operation, so it is likely possible in your formalism, though perhaps not emphasized. I don't know if there is something particularly simple about the way I have formulated the problem. How efficient is your "jigsaw" search?

I was very impressed when I discovered oscillations could resolve "cliques" of exactly the kind my "substitutions" might be represented by. I don't know if you saw the paper I linked for Ben earlier. I could probably implement my relations in your codebase. But all I really want from your codebase is a network and speed. Oh, and actually a mechanism to see if bits will synchronize: whether cliques, "sheaves"(?), will project out and reveal themselves when set oscillating. At the moment I feel I'm more likely to get the speed using one of the simpler neurosimulator interfaces available publicly on SpiNNaker. And it doesn't seem inconsequential that they happen to have an oscillation synchronization mechanism too.

But perhaps the most efficient way to move forward is to hit the more imaginative corners of the deep learning community with some really good theoretical presentations of these issues of linearity, and randomness or contradiction of abstraction. Your presentations might have the rigour to make them take notice. At least they might, if given prominence in separation from your historical codebase, and presented in relation to the relevant evidence, with a simple alternative.

-Rob

On Fri, Feb 22, 2019 at 8:56 AM Linas Vepstas <[email protected]> wrote:
>
> On Thu, Feb 21, 2019 at 3:10 AM Rob Freeman <[email protected]> wrote:
>
>> couldn't resist. I do know the frustration
>
> I say things that hurt peoples feelings. Usually as a side-effect.
>
>> actually scored a couple of home runs with me.
>
> Thank you!
>
>> He has identified linearities in vector models as a key weakness, resolved the problem as one of reassembling parts (jigsaw pieces?), and even dropped a mention of category theory, which speaks to formal incompleteness. (Though I'm not sure he's thought that through, because it actually becomes an argument for distributed representation and against symbolism, against any fixed symbolism, anyway.)
>
> How can I avoid writing a long email? "Formal incompleteness" and "category theory" are two different things. When you say "formal incompleteness", I think that what you mean is "the deep learning people made a breakthrough, but they do not understand why it actually works", and that "they are not even trying to understand", or maybe "they are looking in the wrong place for that understanding". And maybe that's true; I dunno. When something works, people rarely ask why.
>
> Category theory... has 120-year-old roots in "Universal algebra" and started life as a means of "formalization". But it isn't that any more; it's evolved into something completely different. Lots simpler in some ways, but it clashes with what you learned in school. That clash leads to a mental block: like a fun-house mirror, it's "obvious" once you get it, but it's perversely "hard to get".
>
> Here's what it is, today. It is a theory of dots and arrows, a set of rules for how dots and arrows can be assembled, and a discussion of what you get when you assemble them. Some structures have a tiny handful of dots and arrows: zero, one, span, cospan, equalizer, pushout, pullback. Some have a countable infinity of dots and arrows: limit and colimit. Some have uncountable infinities of arrows going back and forth: adjoint functors. There's more. As it happens, almost everything - literally, more or less everything - in mathematics can be reduced to dots and arrows. Huh.
> Who would have guessed? Perhaps this smells like "formalization", depending on your background and experience. To me (and I think to probably most working category theorists) it's more like "gee, look at the pretty pictures with dots and arrows" and "huh, what does that remind me of? Oh, right, it reminds me of xyz". Back in the heyday, xyz was "the simplicial set" and stuff like that. It's less about "formalization" than it is "hey guys, look at the neat thing that happens when I do this!"
>
> Off-topic: 100 years ago (and even today) people thought that they could reduce all of mathematics to "just set theory", which sounds crazy because sets are these absurdly trite, trivial things. Here, we just swap out sets and replace them with trite, trivial dots and arrows. Whoopee! Despite the triteness of sets, the set theorists seem to have a lot on their hands. And very few of them actually work on "formalization" (whatever that might mean). Same deal for category theorists. It's trite in the same way that 1+1=2 is trite, and then one day you wake up and go "hang on... prime numbers! WTF!?!?!"
>
> The phrase "distributional semantics" keeps popping up in this conversation. Let's pick it apart. A "distribution" is that thing from probability: a bunch of balls distributed into bins. Frequency of dice-rolls and gambling-card deals. "Semantics" is "meaning", the study of "meaning". Put together, what it means is that you can collect statistics on N-grams, for N=3 or N=5, and mash those statistics into K=100 or 200 or 300 bins (balls into bins), and then you notice: "Oh hey, check it! Each different distribution is kind of like the meaning of a word! Oh wow: they're like... additive - linear, like 'King - Man + Woman = Queen' holds as an approximate equation between the distributions for the words King, Man, Woman, Queen." What's the other famous example? 'Berlin - Germany + France = Paris'.
> So this is the idea of "distributional semantics".
>
> I think if you resurrected a dead linguist from the 1950s, they'd say something like "no shit, Sherlock, what else did you figure out while I was dead?" The linguists always knew about distributional semantics; they just didn't have the compute power, the algorithms. The deep-learning guys have compute power and algorithms, but never really thought about linguistics before. Perhaps they think that linguistics is trite. You know, like 1+1=2 is trite.
>
> For me, it's just more like a playground, or a candy store. So I crack open a book on linguistics, and it's got dots and arrows in it, and like, oh, cool! That really, really reminds me of a simplicial set, except that it's a simplicial set with extra prongs pointing in other directions, so it's not that, but a fun-house mirror of one. So then I read about neural nets, and they talk about vectors, except that they aren't really vectors (there's no rotational symmetry; it's not Euclidean space; why the heck do the neural net people say "vector" when *obviously* they are not vectors? Do they seriously think that 'Berlin - Germany + France = Paris' implies that distributions are vectors?). Anyway, distributions obey various axioms (and so do vectors), and those axioms are dot-and-arrow-like, and so there's yet more fun-n-games of "gee, lookit what happens when one does this with that".
>
> Mostly, I think one can go pretty darned far with just "plain old statistics"; you just have to break out of the mind-set of N-grams and K=300-dimensional "vectors". I don't even think it's that hard to do. You just play the usual game of turning it sideways.
>
> Email too long, again. Sheesh.
>
> --linas

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T581199cf280badd7-M3d6a8389da8d115066b3156e
Delivery options: https://agi.topicbox.com/groups/agi/subscription
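P.S. For concreteness, here is a minimal sketch of the kind of oscillation-based clique detection I have in mind, using the standard Kuramoto model of coupled phase oscillators. Everything here (the coupling graph, the constants, the two hand-built cliques) is made up for illustration; it is not the mechanism from the paper I linked, and not anything in your codebase. The point is only that units coupled within a clique phase-lock, so the clique "projects out" as a synchronized group when the network is set oscillating:

```python
# Minimal Kuramoto-model sketch: nodes densely coupled within a clique
# synchronize their phases; uncoupled cliques drift apart. All parameters
# here are invented for illustration.
import math
import random

def kuramoto_step(phases, coupling, natural, K=2.0, dt=0.05):
    """One Euler step of d(theta_i)/dt = omega_i + K * sum_j w_ij * sin(theta_j - theta_i)."""
    n = len(phases)
    return [phases[i] + dt * (natural[i] + K * sum(
                coupling[i][j] * math.sin(phases[j] - phases[i])
                for j in range(n)))
            for i in range(n)]

random.seed(0)

# Two hand-built cliques of 3 nodes each: dense coupling inside,
# no coupling across.
w = [[0.0] * 6 for _ in range(6)]
for group in ([0, 1, 2], [3, 4, 5]):
    for i in group:
        for j in group:
            if i != j:
                w[i][j] = 1.0

omegas = [1.0, 1.0, 1.0, 1.3, 1.3, 1.3]  # natural frequencies per node
theta = [random.uniform(0, 2 * math.pi) for _ in range(6)]

for _ in range(2000):
    theta = kuramoto_step(theta, w, omegas)

def spread(idx):
    """Order parameter for a subset of nodes: 1.0 = perfect phase lock."""
    x = sum(math.cos(theta[i]) for i in idx) / len(idx)
    y = sum(math.sin(theta[i]) for i in idx) / len(idx)
    return math.hypot(x, y)

# Both within-clique values should come out close to 1.0 (phase-locked).
print(spread([0, 1, 2]), spread([3, 4, 5]))
```

A real implementation on something like SpiNNaker would of course use spiking units rather than abstract phases, but the phenomenon being exploited, synchrony picking out densely connected substructure, is the same.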
