Re: [agi] Lamarck Lives!(?)

Matt Mahoney Fri, 05 Dec 2008 17:16:39 -0800

--- On Wed, 12/3/08, Ben Goertzel <[EMAIL PROTECTED]> wrote:

> Well, LTP is definitely real ... and I'm quite sure the scheme you
> describe is *not* how learning works in the brain ;-) ...
> but I'm equally sure that the full story has not yet been
> uncovered...


I have attached a program that illustrates how memory can be stored in neurons 
rather than synapses. In a feed-forward configuration, each neuron is randomly 
connected to others further back, rather than fully connected. The network is 
sparse, with around n^(1/2) connections each among n neurons so that there is 
usually a path between each input and output going through at most one or two 
intermediate neurons. It also differs from a normal network in that all of the 
output weights of each neuron are constrained to have the same value. In other 
words, the activation level x[i] of the i'th neuron is given by

  x[i] = w[i]/(1+exp(-SUM_j x[j]))

where j ranges over the input neurons for x[i]. Note that there is only one 
weight w[i] per neuron, rather than one per synapse.
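
The activation rule above could be sketched roughly as follows. This is a 
minimal illustration, not the attached nn.cpp. The names build_network and 
forward, the choice of putting input neurons first, and the overflow guard are 
assumptions made for the example.

```python
import math
import random

def build_network(n, n_in, fan_in, seed=0):
    """Give each non-input neuron up to fan_in random inputs from earlier neurons.

    Assumption: neurons 0..n_in-1 are the network inputs, and "further back"
    means a lower index, which keeps the network feed-forward.
    """
    rng = random.Random(seed)
    inputs = [[] for _ in range(n)]
    for i in range(n_in, n):
        inputs[i] = rng.sample(range(i), min(fan_in, i))
    return inputs

def forward(inputs, w, x_in):
    """Compute x[i] = w[i] / (1 + exp(-SUM_j x[j])) with one weight per neuron."""
    n_in = len(x_in)
    x = list(x_in) + [0.0] * (len(w) - n_in)
    for i in range(n_in, len(w)):
        s = sum(x[j] for j in inputs[i])
        # Guard against floating-point overflow for very negative sums
        # (an implementation detail, not part of the formula).
        x[i] = w[i] / (1.0 + math.exp(min(-s, 700.0)))
    return x
```

For example, forward(build_network(640, 6, 36), [1.0] * 640, [0, 1, 0, 1, 1, 0]) 
returns all 640 activations. Which neurons count as outputs is left to the 
caller.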

For the output neurons, w[i] = 1. For all others, the network is trained by 
adjusting the weights to reduce the RMS output error. This can be done in many 
ways. For example, you could simulate reinforcement learning by making random 
changes to w[i] and keeping changes that are followed by a reward computed by 
comparing desired and actual outputs.
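
The random-change approach could look something like the sketch below. It is 
written against an arbitrary error function, total_error(w), so that it does 
not depend on any particular network code. The function name, the Gaussian 
perturbation, and the step scale are assumptions for illustration only.

```python
import random

def random_search(w, total_error, trainable, steps=1000, scale=0.1, seed=0):
    """Randomly perturb one weight at a time and keep only rewarded changes.

    w           -- list of per-neuron weights, modified in place
    total_error -- hypothetical callable returning the summed squared error for w
    trainable   -- indices of weights that may change (output weights stay at 1)
    """
    rng = random.Random(seed)
    best = total_error(w)
    for _ in range(steps):
        i = rng.choice(trainable)
        old = w[i]
        w[i] = old + rng.gauss(0.0, scale)  # assumed perturbation distribution
        err = total_error(w)
        if err < best:
            best = err   # "reward": the change reduced the error, so keep it
        else:
            w[i] = old   # no reward: undo the change
    return best
```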

I used a more efficient training method, while keeping equivalence to 
biologically plausible models in mind. The method is as follows: select one 
neuron x[i] at random and calculate the network outputs for w[i], w[i]+d, and 
w[i]-d for some large delta d like 0.5. Calculate the sum of the squares of the 
errors (actual - desired)^2 over all the values in the domain of the objective 
function (the desired behavior). If w[i]+d or w[i]-d gives a smaller total 
error than w[i], then make that the new weight. Otherwise replace d with d/2 
and try again. When d is sufficiently small (like 0.02), then take the best 
weight and stop. Repeat until the errors are small enough.
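
One possible reading of this procedure is sketched below. It is not taken from 
nn.cpp. In particular, I assume that after a successful step the search 
continues from the new weight with the same d, and I add an iteration cap as a 
safety measure. total_error is a hypothetical callable that returns the summed 
squared error over the whole domain for a given weight list.

```python
import random

def tune_weight(w, i, total_error, d=0.5, d_min=0.02, max_iters=1000):
    """Step-halving search on the single weight w[i]; returns the best error."""
    best = total_error(w)
    for _ in range(max_iters):      # safety cap (assumption, not in the text)
        if d < d_min:
            break
        center = w[i]
        best_w = center
        for cand in (center + d, center - d):
            w[i] = cand
            err = total_error(w)
            if err < best:
                best, best_w = err, cand
        w[i] = best_w
        if best_w == center:        # neither neighbor was better: halve d
            d /= 2.0
    return best

def train(w, trainable, total_error, target_error, max_sessions, seed=0):
    """Tune randomly chosen neurons until the error is small enough."""
    rng = random.Random(seed)
    err = total_error(w)
    for _ in range(max_sessions):
        if err <= target_error:
            break
        err = tune_weight(w, rng.choice(trainable), total_error)
    return err
```

With the defaults, d takes the values 0.5, 0.25, 0.125, 0.0625, and 0.03125 
before dropping below 0.02, so each session resolves a weight to about 0.03.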

Neuron-centered memory has the same information-theoretic constraints as 
normal Hebbian learning. If your objective function has n bits of complexity, 
you need at least n neurons (rather than n synapses), because each parameter 
stores about 1 bit. Also, because each training session communicates about 1 
bit, you need at least n sessions, actually O(n log n) to remove the last bit 
error assuming exponential convergence.

In my program, I demonstrate training a neural network to learn a 3 by 3 bit 
multiplier. In general, a function with NX inputs and NY outputs has NY * 2^NX 
bits of complexity. For the 3x3 multiplier, NX = NY = 6, which gives 
6 * 2^6 = 384 bits. In practice, you need 1.5 to 2 times as many neurons. I 
used 640 neurons, including 6 inputs and 6 outputs, and 36 random connections 
per neuron. Training should therefore require 640 * log2(640) ~ 6000 sessions, 
although in practice about 15,000 were 
needed. The total number of operations is 15000 * 64 * 640 * 36 = 2.2 x 10^10, 
which took about 20 minutes on my PC. I estimate training a 4x4 multiplier 
would take about 40 times as long.
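
The arithmetic for the 3x3 case can be checked with a short script. The 
function names and the bit ordering of the truth table (least significant bit 
first) are assumptions for illustration. The logarithm is taken base 2, which 
is what reproduces the figure of roughly 6000.

```python
import math

def table_bits(nx, ny):
    """Bits needed to tabulate a function with nx input and ny output bits."""
    return ny * 2 ** nx

def session_estimate(n):
    """n * log2(n) estimate of the number of training sessions."""
    return n * math.log2(n)

def total_operations(sessions, nx, neurons, fan_in):
    """Each session evaluates all 2^nx input patterns over every connection."""
    return sessions * 2 ** nx * neurons * fan_in

def multiplier_table(k=3):
    """Truth table of a k-by-k bit multiplier as (input bits, output bits)."""
    rows = []
    for a in range(2 ** k):
        for b in range(2 ** k):
            x = [(a >> s) & 1 for s in range(k)] + [(b >> s) & 1 for s in range(k)]
            y = [((a * b) >> s) & 1 for s in range(2 * k)]
            rows.append((x, y))
    return rows
```

Here table_bits(6, 6) is 384, session_estimate(640) is about 5966, and 
total_operations(15000, 6, 640, 36) is 22118400000, or about 2.2 x 10^10. 
multiplier_table(3) has 64 rows of 6 output bits each, which is the same 384 
bits.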

I do not recommend using neuron-centered networks for solving problems that 
could be solved using Hebbian learning. This approach is slower by a factor of 
O(n^(1/2)), in this case 36, not to mention the difficulty of simulating sparse 
networks on vector processors. The purpose of this program is to show the 
plausibility of neuron-centered memory in the human brain. The brain would not 
be affected by the speed penalty because synapse operations are parallel. 
Furthermore, the model explains most of the discrepancy between Landauer's 
estimate of 10^9 bits of long-term memory and 10^15 synapses. There are 10^11 
neurons, a much closer number.

Also, this model does not preclude Hebbian learning. Both could occur 
simultaneously. After all, learning is really a simple idea. The brain is 
adaptive. You just fiddle with knobs until you get the desired result. You 
don't have to understand what the knobs do. I believe you could achieve 
learning in sparse networks by fiddling with just about any neuron-wide 
parameters.

-- Matt Mahoney, [EMAIL PROTECTED]





nn.cpp
Description: Binary data
