Richard,

On 12/25/08, Richard Loosemore <[email protected]> wrote:
> Steve Richfield wrote:
>> Ben, et al,
>>
>> After ~5 months of delay for theoretical work, here are the basic ideas
>> as to how really fast and efficient automatic learning could be made
>> almost trivial. I decided NOT to post the paper (yet), but rather, to
>> just discuss some of the underlying ideas in AGI-friendly terms.
>>
>> Suppose for a moment that a NN or AGI program (they can be easily mapped
>> from one form to the other
>
> ... this is not obvious, to say the least. Mapping involves many
> compromises that change the functioning of each type ...

There are doubtless exceptions to my broad statement, but generally,
neuron functionality is WIDE open to be pretty much ANYTHING you choose,
including that of an AGI engine's functionality on its equations. In the
reverse direction, any NN could be expressed in a shorthand form that
captures structure, synapse functions, etc., and an AGI engine could be
built/modified to function according to that shorthand. In short, mapping
between NN and AGI forms presumes flexibility in the functionality of the
target form. Where that flexibility is NOT present, e.g. because of
orthogonal structure, you must ask whether something is being gained or
lost by the difference. Clearly, any transition that involves a loss
should be carefully examined to see if the entire effort is headed in the
wrong direction, which I think was your original point here.

>> ), instead of operating on "objects" (in an
>> object-oriented sense)
>
> Neither NN nor AGI has any intrinsic relationship to OO.

Clearly I need a better term here. Both NNs and AGIs tend to have neurons
or equations that reflect the presence (or absence) of various objects,
conditions, actions, etc. My fundamental assertion is that if you
differentiate the inputs so that everything in the entire network reflects
dp/dt instead of straight probabilities, then the network works
identically, but learning is GREATLY simplified.
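The "works identically" half of this claim is easy to check for the linear
case: differencing commutes with any linear stage, so differentiated
inputs plus an integrated output (keeping one constant of integration)
reproduce the ordinary result exactly. A minimal numpy sketch of my own
devising (the toy layer and all names are mine, not from the paper);
nonlinear stages do not commute with d/dt this cleanly:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy "network": one fixed linear layer y = W @ x, applied per time step.
W = rng.normal(size=(4, 6))
x = rng.normal(size=(100, 6))          # 100 time steps of 6 input signals

# Ordinary operation: feed raw values straight through.
y_direct = x @ W.T

# dp/dt operation: difference the inputs, run the SAME layer, then
# integrate (cumulative sum) the outputs, adding back the t=0 output as
# the one saved constant of integration.
dx = np.diff(x, axis=0)                # discrete stand-in for dp/dt
dy = dx @ W.T
y_reconstructed = y_direct[0] + np.concatenate(
    [np.zeros((1, 4)), np.cumsum(dy, axis=0)]
)

# For a linear stage the two paths agree to machine precision.
print(np.allclose(y_direct, y_reconstructed))   # True
```

The learning-simplification argument is separate; this only shows that the
forward transform itself is preserved under differentiate-then-integrate.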
>> , instead, operates on the rate-of-change in the probabilities of
>> "objects", or dp/dt. Presuming sufficient bandwidth to generally avoid
>> superstitious coincidences, fast unsupervised learning then becomes
>> completely trivial, as like objects cause simultaneous like-patterned
>> changes in the inputs WITHOUT the overlapping effects of the many other
>> objects typically present in the input (with numerous minor exceptions).
>
> You have already presumed that something supplies the system with
> "objects" that are meaningful. Even before your first mention of dp/dt,
> there has to be a mechanism that is so good that it never invents
> objects such as:
>
> Object A: "A person who once watched all of Tuesday Weld's movies in the
> space of one week" or
>
> Object B: "Something that is a combination of Julius Caesar's pinky toe
> and a sour grape that Brutus just spat out" or
>
> Object C: "All of the molecules involved in a swimming gala that happen
> to be 17.36 meters from the last drop of water that splashed from the
> pool".
>
> You have supplied no mechanism that is able to do that, but that
> mechanism is 90% of the trouble, if learning is what you are about.

With prior unsupervised learning you are 100% correct. However, none of
the examples you gave involve temporal simultaneity. I will discuss B
above because it is close enough to be interesting. If indeed someone just
began to notice something interesting about Caesar's pinkie toe *as* they
just began to notice the taste of a sour grape, then yes, that probably
would be learned via the mechanisms I am talking about. However, if one
was "present perfect tense" while the other was just beginning, then it
wouldn't be learned with my approach, though it would be with prior
unsupervised learning methods. For example, if Caesar's pinkie toe had
already been noticed and examined, and then before the condition passed
they tasted a sour grape, the temporal simultaneity of the dp/dt edges
wouldn't exist to learn from.
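A toy example may make the simultaneity point concrete (the patterns,
onset times, and names below are mine, not from the paper): when one
object appears while another is already present, a static snapshot mixes
their features together, but the dp/dt at the moment of onset is exactly
the new object's feature pattern.

```python
import numpy as np

# Feature patterns of two "objects" across 6 input channels.
p1 = np.array([1., 0., 1., 1., 0., 0.])
p2 = np.array([0., 1., 1., 0., 1., 0.])

# Step activations: object 2 is already present when object 1 turns on.
T = 100
a1 = (np.arange(T) >= 50).astype(float)
a2 = (np.arange(T) >= 20).astype(float)

x = np.outer(a1, p1) + np.outer(a2, p2)   # superimposed inputs

# A static snapshot mixes the two objects' features together:
print(x[60])            # p1 + p2 -> [1. 1. 2. 1. 1. 0.]

# But at object 1's onset, dp/dt is exactly p1; object 2's (unchanging)
# contribution cancels out of the difference:
dx = np.diff(x, axis=0)
print(dx[49])           # == p1 -> [1. 0. 1. 1. 0. 0.]
```

This also illustrates the objection about edge timing: if the two patterns
had switched on at different moments, no single dx row would bind their
features together.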
Of course, in both cases, the transforms would work identically given
identical prior learning/programming.

> Instead, you waved your hands and said "fast unsupervised learning then
> becomes completely trivial" .... this statement is a declaration that a
> good mechanism is available.
>
> You then also talk about "like" objects. But the whole concept of "like"
> is extraordinarily troublesome. Are Julius Caesar and Brutus "like" each
> other? Seen from our distance, maybe yes, but from the point of view of
> Julius C., probably not so much. Is a G-type star "like" a mirror? I
> don't know any stellar astrophysicists who would say so, but then again
> OF COURSE they are, because they are almost indistinguishable: if you
> hold a mirror up in the right way it can reflect the sun, and the two
> visual images can be identical.
>
> These questions can be resolved, sure enough, but it is the whole
> business of resolving these questions (rather than waving a hand over
> them and declaring them to be trivial) that is the point.

I think that pretty much everyone who has "dented their pick" on
unsupervised learning (this includes myself; does anyone else here have
these same scars?) has developed methods that would work on "completely
obvious" test cases but failed miserably on real-world input. My point
here is that looking at things from a dp/dt point of view, real-world
situations become about as simple as "completely obvious" test cases. I
would quote some good source to make this point, but I don't think anyone
has gone here yet.

To continue this effort (as I plan to do) requires optimally solving the
PCA problem, though I do NOT think that this is necessary to build good
and useful NN/AGI systems. I suspect another "trap" in the concept of PCA.
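For reference, the pure mathematical form of PCA is only a few lines of
linear algebra. A hedged sketch with my own function and variable names
(whether running it on differenced rows actually yields better components
is precisely the open question being raised here):

```python
import numpy as np

def pca(X, k):
    """Return the top-k principal directions and the projected data."""
    Xc = X - X.mean(axis=0)                 # center each variable
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k], Xc @ Vt[:k].T            # components, scores

rng = np.random.default_rng(2)
latent = rng.normal(size=(500, 2))          # 2 underlying factors
mixing = rng.normal(size=(2, 8))
X = latent @ mixing + 0.05 * rng.normal(size=(500, 8))

comps, scores = pca(X, 2)
var = scores.var(axis=0)
# The first component captures the largest share of the variance:
print(var[0] >= var[1])    # True

# The dp/dt proposal amounts to running the same procedure on differenced
# rows rather than on the raw measurements:
comps_d, _ = pca(np.diff(X, axis=0), 2)
```

Nothing here settles which input representation is right; it only pins
down the procedure under discussion.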
Consider the following from my unposted paper:

*principal component analysis*: A mathematical procedure that transforms a
number of variables into a smaller number of less correlated variables
called *principal components*. The first principal component accounts for
as much of the variability in the data as possible, and each succeeding
component accounts for as much of the remaining variability as possible.

The pure mathematical form of this produces a minimal number of
uncorrelated variables without regard to real-world significance, while a
more useful form produces output variables that have real-world
correspondence. Apparently, real-world PCAs typically combine features in
a Huffman-like coding that can easily be split back apart with simple
combinatorial Bayesian logic. This could most simply be implemented as an
OR of the ANDs of the features needed for each of the components, which in
d(ln p)/dt space appears to be exactly what dendritic trees accomplish.

Is this an unavoidable step? Is this a desirable step? Can the features be
directly identified in an un-combined way? The answer to all of these
questions may be YES, for if a leg of a dendritic tree extracts a feature,
then it is obviously possible (and perhaps even necessary) to extract
features separately from one another. Arbitrarily combining them to
produce principal components may lose nothing, because downstream neurons
can always separate features from components as needed.

My present challenge is being a "mouse in a maze" of matrix notation,
trying to see the forest for the trees. Clearly, my predecessors were also
challenged in this way, so I am trying to go around the really hard
problems (that evaded the best mathematicians for a century) to see what
is REALLY needed before starting work on all the other areas.

>> But, what would Bayesian equations or NN neuron functionality look like
>> in dp/dt space? NO DIFFERENCE (math upon request).
>> You could trivially differentiate the inputs to a vast and complex
>> existing AGI or NN, integrate the outputs, and it would perform
>> _identically_ (except for some "little" details discussed below). Of
>> course, while the transforms would be identical, unsupervised learning
>> would be quite a different matter, as now the nearly-impossible becomes
>> trivially simple.
>>
>> For some things (like short-term memory) you NEED an integrated
>> object-oriented result. Very simple - just integrate the signal. How
>> about muscle movements? Note that muscle actuation typically causes
>> acceleration, which doubly integrates the driving signal. This would
>> require yet another differentiation of the already-differentiated
>> signal which, when doubly integrated by the mechanical system, produces
>> movement to the desired location.
>>
>> Note that once input values are stored in a matrix for processing, the
>> baby has already been thrown out with the bathwater. You must START
>> with differentiated input values and NOT static measured values. THIS
>> is what the PCA folks have been missing in their century-long quest for
>> an efficient algorithm to identify principal components, as their
>> arrays had already discarded exactly what they needed. Of course you
>> could simply subtract successive samples from one another - at some
>> considerable risk, since you are now sampling at only half the
>> Nyquist-required speed to make your AGI/NN run at its intended speed.
>> In short, if inputs are not being electronically differentiated, then
>> sampling must proceed at least twice as fast as the NN/AGI cycles.
>>
>> But - how about the countless lost constants of integration? They "all
>> come out in the wash" - except for where actual integration at the
>> outputs is needed. Then clippers and leaky integrators, techniques
>> common to electrical engineering, will work fine and produce many of
>> the same artifacts (like visual extinction) seen in natural systems.
>> It all sounds SO simple, but I couldn't find any prior work in this
>> direction using Google. However, the collective memory of this group is
>> pretty good, so perhaps someone here knows of some prior effort that
>> did something like this. I would sure like to put SOMETHING in the
>> "References" section of my paper.
>>
>> Loosemore: THIS is what I was talking about when I explained that there
>> is absolutely NO WAY to understand a complex system through direct
>> observation, except by its useless anomalies. By shifting an entire AGI
>> or NN to operate on derivatives instead of object values, it works
>> *almost* (the operative word in this statement) exactly the same as one
>> working in object-oriented space, only learning is transformed from the
>> nearly-impossible to the trivially simple. Do YOU see any
>> observation-based way to tell how we are operating behind our eyeballs,
>> object-oriented or dp/dt? While there are certainly other explanations
>> for visual extinction, this is the only one that I know of that is
>> absolutely impossible to engineer around. No one has (yet) proposed any
>> value to visual extinction, and it is a real problem for hunters, so if
>> it were avoidable, then I suspect that ~200 million years of evolution
>> would have eliminated it long ago.
>
> Read David Marr's book "Vision",

THANKS for the reference.

> or any other text that discusses the low level work done by the visual
> system. There are indeed differentiation functions in there (IIRC, Marr
> came up with the Difference of Gaussians (DOG) idea because the
> difference of Gaussians was a way to do the equivalent of dp/dt).
> BUT... this is all in the first few wires coming out of the retina!

YES - exactly where it would be needed to make the ENTIRE system work in
dp/dt space. The following comments reflect a poor choice of paradigm. In
short, your comments are not so much incorrect as failing to lead to a
useful conclusion.

> It is not interesting.
Whadaya mean, not interesting?! It converts a significant part of the
brain to operate in dp/dt space. That it accomplishes this SO simply is
VERY interesting. The BIG issue is that the lab guys are no sort of
mathematicians. They don't understand how simple functions at the
exteriors of a computational process can completely change the internal
representation throughout the process, with vast ramifications INCLUDING
the need to completely rethink what neuronal activity really means.

> Visual extinction (of the sort you are talking about) is all over and
> done with in the first few cells of the visual pathway, whereas you are
> talking here about the millions of other processes that occur higher up.

All such conclusions are wrong UNLESS they allow for dp/dt operation,
which they clearly haven't. You would have to integrate a neuron's output
and see extinction in the integral to make such a conclusion. My point,
made in a previously unposted part of my article that addresses NOT
periodically restoring signals to "object" form, is:

"The most perfect 'correction' is avoiding the problem that requires a
correction. With no integration, there is no need for any mechanism to
estimate the constant of integration. However, the information is still
missing from the rate-of-change input, so the process would necessarily
introduce whatever artifacts exist in a 'perfect' correction as outlined
above. In short, QED, there will be extinguishment, instant recovery, and
any other artifacts of methods that perfect engineers might discover in
coming centuries. There is no need to show exactly how this happens,
because we know that it absolutely must happen: obviated processes are by
their nature as perfect as perfect can be. Hence, for now this remains an
interesting but needless exercise for some future mathematician."

> As for your comment about complex systems, it looks like a non
> sequitur. Just does not follow, as far as I can see.
Just take our present discussion, where failing to see that things may be
operating entirely in dp/dt space leads to virtually ignoring the
essential pieces (differentiation at the input) and then simply dismissing
visual extinction as just the way that the system works.

Turing indirectly pointed out that there are a limitless number of ways of
building ANY system with a given functionality, so why should any (sane)
person think that you can see how a hidden system works by observing its
operation? Someone is wrong here - either you or Turing. The whole idea of
understanding any black box by observing its external functionality is a
fool's errand UNLESS you have some really major clues (like a window on
its operation). Unfortunately, we just aren't there yet with the brain. My
theory is that there is a threshold of mathematical understanding, from
which the remainder can be inferred. The sad states of NN and AGI show me
that we are NOT yet there. Is dp/dt representation, plus a few other
things, sufficient? Only time will tell.

BTW, it is really d(ln p)/dt, but that is another story. Let's first get
past dp/dt.

Steve Richfield

-------------------------------------------
agi
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244&id_secret=123753653-47f84b
Powered by Listbox: http://www.listbox.com
