*** I was wrong to start with embedding vectors as the base representation for patterns. I now think the way to implement it is not with vectors, but directly in a network of observed language sequences. ***

This is what we're doing in our OpenCog-based language learning project. But we're using vector-based predictive models (BERT-type models) to guide the numerical weightings of symbolic language patterns.

*** The way I see it you should not "learn" anything beyond the raw data. Certainly not fix or "learn" any "embeddings". Rather, when you want to find a meaningful pattern (embedding or prediction "invariant"), you project out latent invariants by clustering according to shared contexts. They'll form little diamonds in the network. ***

Yeah, we are doing that.

*** And likely the way to do this is to set the network oscillating, and vary inhibition to get the resolution of "invariants" you want. ***

But we are not doing that. Interesting...

On Mon, Feb 18, 2019 at 6:38 AM Rob Freeman <[email protected]> wrote:
>
> Feedback? To me?
>
> Any number of ways to break it. It's old now. 20 years back. And the data set a few 10's of 1000's of words I scraped up from some websites back in the day.
>
> Just treat it as a proof of concept: you get (meaningful) hierarchy from novel rearrangements of word vector "embeddings".
>
> The point is that novelty can still capture meaning. It doesn't have to be a learned pattern.
>
> And actually learned patterns will always fail to capture the full detail of patterns which can be generated. Learned patterns will always fail (not least because you get contradictions, and you can't learn contradictions).
>
> The greatest failing with it was that it still did not generate enough novelty. So not too much novelty, but too little. It took me a long time to realize this. As soon as I form a vector, an "embedding" in the model expression, I've fixed a pattern. But that pattern too should be able to change with context. The vectors are formed by grouping words which share common contexts.
> But the problem is words can share some contexts and not others. You should be able to find the shared contexts which matter at run time. I generated new vectors by substituting vectors into each other, but the vectors (embeddings) I started with were already too fixed.
>
> I was wrong to start with embedding vectors as the base representation for patterns.
>
> I now think the way to implement it is not with vectors, but directly in a network of observed language sequences.
>
> The way I see it you should not "learn" anything beyond the raw data. Certainly not fix or "learn" any "embeddings". Rather, when you want to find a meaningful pattern (embedding or prediction "invariant"), you project out latent invariants by clustering according to shared contexts. They'll form little diamonds in the network.
>
> And likely the way to do this is to set the network oscillating, and vary inhibition to get the resolution of "invariants" you want.
>
> -Rob
>
>
> On Mon, Feb 18, 2019 at 10:54 AM Stefan Reich via AGI <[email protected]> wrote:
>>
>> > demo.chaoticlanguage.com
>>
>> It works with "I went to Brazil", but seems to break with "In Brazil, people are friendly" (it creates "Brazil people" as a node). Any way to give it feedback?
>>
>> On Sun, 17 Feb 2019 at 22:48, Rob Freeman <[email protected]> wrote:
>>>
>>> On Mon, Feb 18, 2019 at 10:05 AM Stefan Reich via AGI <[email protected]> wrote:
>>>>
>>>> Nothing wrong with pushing your own results if you consider them worthwhile...
>>>
>>> Well, I think on one level it's much the same as Pissanetzky.
>>>
>>> Pissanetzky's is a meaningful way of relating elements which generates new patterns. You have new patterns all the time, but they are nevertheless meaningful, because the relationships generating them are meaningful.
>>> So it takes us away from the idea of learning every pattern, which is what I believe traps deep learning (and prevents Tesla from spotting firetrucks..., and getting to that last mile of self-driving).
>>>
>>> Similarly I found new patterns, which were very much like Pissanetzky's invariant permutations. But I did it for language. When I projected out these new patterns of "invariants" for each new sentence, I found hierarchy.
>>>
>>> You can think of this as a next stage in a progression from symbolism, to distributed representation, and now to novel but meaningful rearrangements of distributed elements.
>>>
>>> Meanwhile deep learning just keeps pushing against a ceiling of what can be learned.
>>>
>>> FWIW you can see an old and simple demo of the principle of hierarchy coming out of novel rearrangements (of embeddings) at:
>>>
>>> demo.chaoticlanguage.com
>>>
>>> Summary paper circa 2014 at:
>>>
>>> Parsing using a grammar of word association vectors
>>> http://arxiv.org/abs/1403.2152
>>>
>>> -Rob
>>
>> --
>> Stefan Reich
>> BotCompany.de // Java-based operating systems

--
Ben Goertzel, PhD
http://goertzel.org

"Listen: This world is the lunatic's sphere, / Don't always agree it's real. / Even with my feet upon it / And the postman knowing my door / My address is somewhere else." -- Hafiz

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T581199cf280badd7-M36be845e93d888c32f743db4
Delivery options: https://agi.topicbox.com/groups/agi/subscription
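[Editorial illustration] The "project out latent invariants by clustering according to shared contexts ... they'll form little diamonds in the network" idea discussed in this thread can be sketched very roughly as follows. This is a minimal toy illustration, not anyone's actual implementation: the corpus, the (left, right) context definition, and the threshold of two shared contexts are all invented assumptions here. Two words that share two or more observed contexts close a 4-cycle (a "diamond") in the bipartite word-context graph, which is treated as a latent invariant, with no learned embeddings involved.

```python
# Toy sketch (NOT the thread authors' implementation): find "diamonds" --
# pairs of words sharing two or more observed contexts -- directly from
# raw word sequences, without learning any embeddings.
from collections import defaultdict
from itertools import combinations

# Invented toy corpus for illustration only.
corpus = [
    "i went to brazil yesterday",
    "i went to france yesterday",
    "we flew to brazil today",
    "we flew to france today",
]

# Map each word to the set of (left, right) contexts it was observed in.
contexts = defaultdict(set)
for sentence in corpus:
    words = sentence.split()
    for i, w in enumerate(words):
        left = words[i - 1] if i > 0 else "<s>"
        right = words[i + 1] if i < len(words) - 1 else "</s>"
        contexts[w].add((left, right))

# Two words sharing >= 2 contexts close a 4-cycle ("diamond") in the
# bipartite word-context graph; treat that as a latent invariant.
diamonds = []
for a, b in combinations(sorted(contexts), 2):
    shared = contexts[a] & contexts[b]
    if len(shared) >= 2:
        diamonds.append((a, b, shared))

for a, b, shared in diamonds:
    print(a, b, sorted(shared))
```

On this toy corpus the procedure groups "brazil"/"france" and "today"/"yesterday", since each pair shares two contexts; nothing is fixed ahead of time, and the groupings are recomputed from whatever raw sequences are present, which is the contrast with frozen embeddings that the thread draws.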
