***
I was wrong to start with embedding vectors as the base representation
for patterns.

I now think the way to implement it is not with vectors, but directly
in a network of observed language sequences.
***

This is what we're doing in our OpenCog-based language learning project.

But we're using vector-based predictive models (BERT-type models) to
guide the numerical weightings of symbolic language patterns.
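A minimal sketch of what that kind of weighting could look like, purely for illustration; `masked_score` here is a hand-made stand-in for a BERT-style masked-LM probability, and none of these names come from the actual OpenCog code:

```python
# Toy sketch: weight a symbolic pattern by how well a predictive model
# scores its surface form in observed contexts.

def masked_score(left, word, right):
    # Stand-in for P(word | left _ right) from a masked language model.
    # A real system would query a BERT-type model here.
    table = {("went", "to", "Brazil"): 0.9,
             ("went", "in", "Brazil"): 0.1}
    return table.get((left, word, right), 0.05)

def weight_pattern(word, contexts):
    """Average predictive score of a symbolic pattern over its contexts."""
    scores = [masked_score(l, word, r) for (l, r) in contexts]
    return sum(scores) / len(scores)

# Weight two competing symbolic patterns in the same context.
weights = {w: weight_pattern(w, [("went", "Brazil")]) for w in ("to", "in")}
# weights["to"] = 0.9, weights["in"] = 0.1
```

The point of the hybrid is just that the symbolic patterns stay primary; the vector model only supplies the numbers.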

***
The way I see it you should not "learn" anything beyond the raw data.
Certainly not fix or "learn" any "embeddings". Rather, when you want
to find a meaningful pattern (embedding or prediction "invariant"),
you project out latent invariants by clustering according to shared
contexts. They'll form little diamonds in the network.
***

Yeah, we are doing that
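For what it's worth, the project-out-by-shared-contexts idea can be sketched in a few lines. This is an illustrative toy, assuming the raw data is just tokenized sequences; a pair of words sharing a pair of (left, right) contexts is one of the "little diamonds" in the network:

```python
# Sketch: keep only the raw sequences; group words by shared contexts
# at query time rather than learning fixed embeddings.

from collections import defaultdict

def context_sets(sequences):
    """Map each word to the set of (left, right) contexts it was seen in."""
    ctx = defaultdict(set)
    for seq in sequences:
        for i in range(1, len(seq) - 1):
            ctx[seq[i]].add((seq[i - 1], seq[i + 1]))
    return ctx

def shared_context_cluster(word, ctx, min_shared=1):
    """Words sharing at least min_shared contexts with `word` (computed
    at run time, so the grouping can differ for every query)."""
    return {w for w, c in ctx.items()
            if w != word and len(c & ctx[word]) >= min_shared}

data = [["i", "went", "to", "paris", "today"],
        ["i", "went", "to", "brazil", "today"],
        ["i", "flew", "to", "brazil", "today"]]
ctx = context_sets(data)
cluster = shared_context_cluster("paris", ctx)   # {'brazil'}
```

Nothing is fixed ahead of time: a different query word, or a different `min_shared`, projects out a different invariant from the same raw network.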

***
And likely the way to do this is to set the network oscillating, and
vary inhibition to get the resolution of "invariants" you want.
***

But we are not doing that.  Interesting...
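The oscillation/inhibition proposal is only a hunch in the text above, but one speculative reading is that inhibition acts as an adjustable threshold on context overlap, so sweeping it changes the resolution of the invariants. A toy sketch under that assumption, with all names illustrative:

```python
# Speculative sketch: treat "inhibition" as a threshold on Jaccard
# overlap of context sets, and sweep it to coarsen or refine clusters.

def clusters_at_inhibition(ctx, inhibition):
    """Greedily group words whose context overlap beats the threshold."""
    groups = []
    for w in sorted(ctx):
        for g in groups:
            rep = g[0]
            overlap = len(ctx[w] & ctx[rep]) / max(1, len(ctx[w] | ctx[rep]))
            if overlap > inhibition:
                g.append(w)
                break
        else:
            groups.append([w])
    return groups

ctx = {"paris":  {("to", "today"), ("in", "now")},
       "brazil": {("to", "today")},
       "went":   {("i", "to")}}
coarse = clusters_at_inhibition(ctx, 0.2)   # low inhibition: fewer, bigger groups
fine   = clusters_at_inhibition(ctx, 0.9)   # high inhibition: singletons
```

Raising the threshold splits groups apart, lowering it merges them, which is at least one concrete way "varying inhibition" could tune the resolution of invariants.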

On Mon, Feb 18, 2019 at 6:38 AM Rob Freeman <[email protected]> wrote:
>
> Feedback? To me?
>
> Any number of ways to break it. It's old now. 20 years back. And the data set 
> was a few tens of thousands of words I scraped up from some websites back in the day.
>
> Just treat it as a proof of concept: you get (meaningful) hierarchy from 
> novel rearrangements of word vector "embeddings".
>
> The point is that novelty can still capture meaning. It doesn't have to be a 
> learned pattern.
>
> And actually learned patterns will always fail to capture the full detail of 
> patterns which can be generated. Learned patterns will always fail (not least 
> because you get contradictions, and you can't learn contradictions).
>
> The greatest failing with it was that it still did not generate enough 
> novelty. So not too much novelty, but too little. It took me a long time to 
> realize this. As soon as I form a vector, an "embedding" in the model 
> expression, I've fixed a pattern. But that pattern too should be able to 
> change with context. The vectors are formed by grouping words which share 
> common contexts. But the problem is words can share some contexts and not 
> others. You should be able to find the shared contexts which matter at run 
> time. I generated new vectors by substituting vectors into each other, but 
> the vectors (embeddings) I started with, were already too fixed.
>
> I was wrong to start with embedding vectors as the base representation for 
> patterns.
>
> I now think the way to implement it is not with vectors, but directly in a 
> network of observed language sequences.
>
> The way I see it you should not "learn" anything beyond the raw data. 
> Certainly not fix or "learn" any "embeddings". Rather, when you want to find 
> a meaningful pattern (embedding or prediction "invariant"), you project out 
> latent invariants by clustering according to shared contexts. They'll form 
> little diamonds in the network.
>
> And likely the way to do this is to set the network oscillating, and vary 
> inhibition to get the resolution of "invariants" you want.
>
> -Rob
>
>
>
> On Mon, Feb 18, 2019 at 10:54 AM Stefan Reich via AGI <[email protected]> 
> wrote:
>> > demo.chaoticlanguage.com
>>
>> It works with "I went to Brazil", but seems to break with "In Brazil, people 
>> are friendly" (it creates "Brazil people" as a node). Any way to give it 
>> feedback?
>>
>> On Sun, 17 Feb 2019 at 22:48, Rob Freeman <[email protected]> wrote:
>>>
>>> On Mon, Feb 18, 2019 at 10:05 AM Stefan Reich via AGI 
>>> <[email protected]> wrote:
>>>>
>>>> Nothing wrong with pushing your own results if you consider them 
>>>> worthwhile...
>>>
>>>
>>> Well, I think on one level it's much the same as Pissanetzky.
>>>
>>> Pissanetzky's is a meaningful way of relating elements which generates new 
>>> patterns. You have new patterns all the time, but they are nevertheless 
>>> meaningful, because the relationships generating them are meaningful. So it 
>>> takes us away from the idea of learning every pattern, which is what I believe 
>>> traps deep learning (and prevents Tesla from spotting firetrucks..., and 
>>> getting to that last-mile self-driving).
>>>
>>> Similarly, I found new patterns, which were very much like Pissanetzky's 
>>> invariant permutations. But I did it for language. When I projected out 
>>> these new patterns of "invariants" for each new sentence, I found hierarchy.
>>>
>>> You can think of this as a next stage in a progression from symbolism to 
>>> distributed representation, now to novel but meaningful rearrangements of 
>>> distributed elements.
>>>
>>> Meanwhile deep learning just keeps pushing against a ceiling of what can be 
>>> learned.
>>>
>>> FWIW you can see an old and simple demo of the principle of hierarchy 
>>> coming out of novel rearrangements (of embeddings) at:
>>>
>>> demo.chaoticlanguage.com
>>>
>>> Summary paper circa 2014 at:
>>>
>>> Parsing using a grammar of word association vectors
>>> http://arxiv.org/abs/1403.2152
>>>
>>> -Rob
>>
>>
>>
>> --
>> Stefan Reich
>> BotCompany.de // Java-based operating systems
>



-- 
Ben Goertzel, PhD
http://goertzel.org

"Listen: This world is the lunatic's sphere,  /  Don't always agree
it's real.  /  Even with my feet upon it / And the postman knowing my
door / My address is somewhere else." -- Hafiz

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T581199cf280badd7-M36be845e93d888c32f743db4
Delivery options: https://agi.topicbox.com/groups/agi/subscription
