https://arxiv.org/pdf/1905.05950.pdf

See why they would even consider this.

Backpropagation in models with an objective loss tries to discover 
functions/patterns so the model can predict better, e.g. on images. It's an 
intelligent brute force, because it can find many rare patterns/functions very 
accurately, e.g. how bubbles work (physics rules) or that dogs usually bark 
rather than sleep.
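To ground what "lowering a loss" means mechanically, here is a minimal sketch (toy data I made up, plain Python, no framework) of gradient descent discovering the one "pattern" in the data — the slope of a line:

```python
# Toy sketch: gradient descent on a one-weight model y = w * x.
# The hidden rule in the data is y = 2x; the loss is mean squared error.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = 0.0                      # start clueless
lr = 0.01                    # learning rate
for _ in range(1000):
    # gradient of mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad           # step downhill on the loss

print(round(w, 3))           # converges to 2.0
```

The point of the sketch: nothing here "understands" the rule y = 2x; the weight simply drifts toward whatever value makes the error smallest.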

But there's no such thing as intelligent brute force. You either do brute force 
or you make intelligent decisions. So I'm starting to think the backprop going 
on in these networks, which discovers complex functions/patterns, is really 
just throwing away unlikely things and merging together probable things. If I 
like/predict AI, and cars, then maybe AI + car = something that is more 
probably worthwhile to try. That's why we merge DNA between humans instead of 
relying on pure mutations. I personally think we should interbreed humans with 
birds, because we really need wings.

Complexity is built from small, simple rules. Really, everything is simple; 
there are just a few laws in physics. We make life confusing. Only after we 
learn the true simple rules first do we make life complex by building larger 
functions/patterns.

We want our net to predict not by running a precise, galaxy-sized physics 
simulation, but by taking camera/etc. "snapshots" of our universe and learning 
first the most basic rules (not physics rules — other types of simple rules!).

Syntactics can capture ANY pattern in any universe that is not 100% random. 
It's the simplest pattern, and it builds other, larger patterns. The closer our 
net gets to learning functions/patterns that resemble full physics 
simulations, the more we know we must either use this sparingly or we will not 
be learning it simply, as it's costly.

In a modern ANN that learns by backpropagation, larger functions/patterns 
(which are less general and more costly) are built out of smaller 
pattern/function nodes (usually the most probable nodes). It is a brute-force 
search, but it isn't totally brute force / "backprop", because it's guided by 
how favored the nodes are.

We can't build/learn the higher layers until we first backprop through the 
lower layers; there's nothing up there yet. Backprop is not lowering the loss 
merely by starting at the end of the net — it's doing it by building new 
functions out of smaller functions.
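For what it's worth, the "starting at the end" mechanics are just the chain rule. A tiny hand-rolled sketch (toy numbers of my choosing, no framework) of pushing the error signal backwards through two composed functions:

```python
import math

# Forward pass through two composed "layers": h = w1*x, then y = tanh(w2*h).
x, target = 0.5, 0.8
w1, w2 = 1.2, 0.7

h = w1 * x
y = math.tanh(w2 * h)
loss = (y - target) ** 2

# Backward pass: chain rule, starting from the loss at the end of the net.
dloss_dy = 2 * (y - target)
dy_dpre = 1 - y ** 2                  # derivative of tanh at its input
dloss_dw2 = dloss_dy * dy_dpre * h    # gradient for the upper layer
dloss_dh = dloss_dy * dy_dpre * w2    # error signal passed downward
dloss_dw1 = dloss_dh * x              # gradient reaches the lower layer last
```

Note the lower layer's gradient is computed *from* the upper layer's error signal — the "function built from smaller functions" is differentiated outside-in.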

When we look at the GOFAI that produced e.g. PPM (Prediction by Partial 
Matching) or similar advanced approaches, we see syntactics, semantics (e.g. 
word2vec), recency, and simple patterns like that. But these are what build 
all other patterns/functions. IF-THEN rules are syntactic, and that is what 
runs our physics.
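A toy sketch in the spirit of PPM: count how often each symbol follows each context, and back off to shorter contexts when the long one is unseen. (This is a deliberate simplification — real PPM also manages escape probabilities and arithmetic coding; the function names here are mine.)

```python
from collections import defaultdict

def train(text, max_order=2):
    # counts[context][next_char] = how often next_char followed that context
    counts = defaultdict(lambda: defaultdict(int))
    for order in range(max_order + 1):
        for i in range(order, len(text)):
            counts[text[i - order:i]][text[i]] += 1
    return counts

def predict(counts, context, max_order=2):
    # Back off from the longest matching context to shorter ones, then "".
    for order in range(max_order, -1, -1):
        ctx = context[len(context) - order:] if order else ""
        if ctx in counts:
            freqs = counts[ctx]
            return max(freqs, key=freqs.get)
    return None

model = train("the cat sat on the mat")
print(predict(model, "th"))   # "e" — the most frequent follower of "th"
```

This is pure syntactics and frequency: no gradient, no backward pass, yet it already predicts.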

So when we pick which nodes to merge, we choose probable ones, with only some 
random brute force governing it. Backprop should not be the thing deciding 
which nodes merge their weights, and by how much. We get less error when we 
merge nodes by amount X, but what we really need to know is which nodes to 
merge and by how much.

Hinton has said backprop isn't brain-like / the way. We want to install in AGI 
only the most basic patterns/keys of the universe, so it can learn by itself 
and get some accuracy on all the trillions of rare patterns that result from 
the elementary patterns! Is backprop a pattern that finds basic patterns? No — 
it can't learn similar words from their contexts; it can only strengthen 
connections between them by luck, unless it uses basic patterns "you" add to 
help it do so sooner. Backprop is therefore clueless on its own; the hierarchy 
you give it is "already" syntactics — it is only finding the weights, through a 
backwards approach. The Burrows-Wheeler Transform can do pretty well too, but 
that is also FREQUENCY: it stacks/merges the same types, but in "physical 
reality".
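Since the BWT came up: a minimal sketch of the transform (naive rotation-sort version, assuming "$" does not occur in the input), showing that it merely reorders the text so characters with the same following context land next to each other — stacking same types without adding any information:

```python
def bwt(s):
    # Append a sentinel, sort all rotations, take the last column.
    s = s + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)

print(bwt("banana"))   # "annb$aa" — the three a's cluster together
```

The clustered runs are what make the output compress well; the transform itself is reversible and purely "syntactic".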
------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T17ed4e5aa955f6c2-M861e97bba23cfdd091881799