Steve Richfield wrote:
Richard,
On 12/25/08, *Richard Loosemore* <r...@lightlink.com> wrote:
Steve Richfield wrote:
Ben, et al,
After ~5 months of delay for theoretical work, here are the
basic ideas as to how really fast and efficient automatic
learning could be made almost trivial. I decided NOT to post the
paper (yet), but rather, to just discuss some of the
underlying ideas in AGI-friendly terms.
Suppose for a moment that a NN or AGI program (they can be
easily mapped from one form to the other
... this is not obvious, to say the least. Mapping involves many
compromises that change the functioning of each type ...
There are doubtless exceptions to my broad statement, but generally,
neuron functionality is WIDE open to be pretty much ANYTHING you choose,
including that of an AGI engine's functionality on its equations.
In the reverse, any NN could be expressed in a shorthand form that
contains structure, synapse functions, etc., and an AGI engine could be
built/modified to function according to that shorthand.
In short, mapping between NN and AGI forms presumes flexibility in the
functionality of the target form. Where that flexibility is NOT present,
e.g. because of orthogonal structure, etc., then you must ask whether
something is being gained or lost by the difference. Clearly, any
transition that involves a loss should be carefully examined to see if
the entire effort is headed in the wrong direction, which I think was
your original point here.
There is a problem here.
When someone says "X and Y can easily be mapped from one form to the
other" there is an implication that they are NOT suggesting that we go
right down to the basic constituents of both X and Y in order to effect
the mapping.
Thus: "Chalk and Cheese can easily be mapped from one to the other"
.... trivially true if we are prepared to go down to the common
denominator of electrons, protons and neutrons. But if we stay at a
sensible level then, no, these do not map onto one another.
Similarly, if you claim that NN and regular AGI map onto one another, I
assume that you are saying something more substantial than that these
two can both be broken down into their primitive computational parts,
and that when this is done they seem equivalent.
NN and regular AGI, the way they are understood by people who
understand them, have very different styles of constructing intelligent
systems. Sure, you can code both in C, or Lisp, or Cobol, but that is
to trash the real meaning of "are easily mapped onto one another".
), instead of operating on "objects" (in an
object-oriented sense)
Neither NN nor AGI has any intrinsic relationship to OO.
Clearly I need a better term here. Both NNs and AGIs tend to have
neurons or equations that reflect the presence (or absence) of various
objects, conditions, actions, etc. My fundamental assertion is that if
you differentiate the inputs so that everything in the entire network
reflects dp/dt instead of straight probabilities, then the network works
identically, but learning is GREATLY simplified.
Seems like a simple misunderstanding: you were not aware that "object
oriented" does not mean the same as saying that there are fundamental
atomic constituents of a representation.
, instead, operates on the rates of change in the
probabilities of "objects", or dp/dt. Presuming sufficient
bandwidth to generally avoid superstitious coincidences, fast
unsupervised learning then becomes completely trivial, as like
objects cause simultaneous like-patterned changes in the inputs
WITHOUT the overlapping effects of the many other objects
typically present in the input (with numerous minor exceptions).
You have already presumed that something supplies the system with
"objects" that are meaningful. Even before your first mention of
dp/dt, there has to be a mechanism that is so good that it never
invents objects such as:
Object A: "A person who once watched all of Tuesday Weld's movies in
the space of one week" or
Object B: "Something that is a combination of Julius Caesar's pinky
toe and a sour grape that Brutus just spat out" or
Object C: "All of the molecules involved in a swimming gala that
happen to be 17.36 meters from the last drop of water that splashed
from the pool".
You have supplied no mechanism that is able to do that, but that
mechanism is 90% of the trouble, if learning is what you are about.
With prior unsupervised learning you are 100% correct. However none of
the examples you gave involved temporal simultaneity. I will discuss B
above because it is close enough to be interesting.
If indeed someone just began to notice something interesting about
Caesar's pinkie toe *as* they just began to notice the taste of a sour
grape, then yes, that probably would be learned via the mechanisms I am
talking about. However, if one was "present perfect tense" while the
other was just beginning, then it wouldn't with my approach but would
with prior unsupervised learning methods. For example, if Caesar's pinkie
toe had already been noticed and examined, and then before the condition
passed they tasted a sour grape, the temporal simultaneity of the dp/dt
edges wouldn't exist to learn from. Of course, in both cases, the transforms
would work identically given identical prior learning/programming.
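To make the distinction concrete, here is a minimal sketch (the signals, the successive-sample derivative, and the coincidence rule are all illustrative assumptions, not the actual proposed mechanism): only inputs whose dp/dt edges coincide in time produce anything to learn from, while a feature that switched on long ago contributes no edge at all.

```python
import numpy as np

t = np.arange(200)

# Channel A: switches on at t=50 (a fresh dp/dt edge at t=50).
a = (t >= 50).astype(float)
# Channel B: switches on at the same instant -- temporally simultaneous edge.
b = (t >= 50).astype(float)
# Channel C: was already "on" long before (its edge was back at t=5).
c = (t >= 5).astype(float)

def dpdt(x):
    """Successive-sample subtraction as a crude derivative."""
    return np.diff(x, prepend=x[0])

# A Hebbian-style coincidence score on the differentiated signals:
# only simultaneous edges produce a nonzero product to learn from.
w_ab = np.sum(dpdt(a) * dpdt(b))  # edges coincide -> positive
w_ac = np.sum(dpdt(a) * dpdt(c))  # edges at different times -> zero

print(w_ab, w_ac)  # prints: 1.0 0.0
```

The "present perfect tense" case is channel C: the object is still present, but its derivative is zero everywhere except at its long-past onset, so nothing coincides.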
You have not understood the sense in which I made the point, I fear.
I was describing obviously useless concepts. Ones where there is no
temporal simultaneity. Concepts thrown together out of completely
useless components.
The question is: how to build a mechanism that does NOT fall into the
trap of creating such nonsense-concepts. If you just say "assume that
we have such a concept builder" you beg a million questions.
Your reply, above, took one of my examples and tried to talk about what
could happen if it was not, after all, a nonsense-concept.
Alas, that is neither here nor there, because (sure enough) *everyone*
agrees that temporal simultaneity is a good basic ground for trying to
construct new concepts (it is the Reason Number One for creating a new
concept!). But we also know that just common or garden variety Temporal
Simultaneity doesn't get you very far .... that is the easiest of all
mechanisms, and we need a hundred more concept-building mechanisms that
are better than that before we have a real concept-generating engine.
And (here is where my point comes back into the picture) if anyone
stands up and says "Hey everyone! I have discovered a hundred concept
building mechanisms that I think will do the trick!", the first question
that the crowd will ask is: "Do your mechanisms work together to build
real, sensible concepts, or do they fill the system with bazillions of
really dumb, useless concepts (like my nonsense list above)?"
Anyone who says that they know of a way to get unsupervised learning to
occur is saying, implicitly, that they have those 100 concept building
mechanisms ready to go (or one super mechanism as good as all of them).
Hence my original point: you cannot simply imply that your system is
working with bona-fide, coherent concepts unless you can show that it
really does come up with concepts (or objects) that are sensible.
FWIW, I would level the same criticism against quite a few other people,
so you don't stand alone here.
(Just briefly: if I move on to look at your actual reply above, I see
also mention of rates of change (dp/dt), but no explanation of how rates
of change of anything would help a system build a concept that is a
combination (NOT an association, please!) of [Julius Caesar's pinky toe
and a sour grape that Brutus just spat out]. The rates of change seem
irrelevant here).
Instead, you waved your hands and said "fast unsupervised learning
then becomes completely trivial" .... this statement is a
declaration that a good mechanism is available.
You then also talk about "like" objects. But the whole concept of
"like" is extraordinarily troublesome. Are Julius Caesar and Brutus
"like" each other? Seen from our distance, maybe yes, but from the
point of view of Julius C., probably not so much. Is a G-type star
"like" a mirror? I don't know any stellar astrophysicists who would
say so, but then again OF COURSE they are, because they are almost
indistinguishable, because if you hold a mirror up in the right way
it can reflect the sun and the two visual images can be identical.
These questions can be resolved, sure enough, but it is the whole
business of resolving these questions (rather than waving a hand
over them and declaring them to be trivial) that is the point.
I think that pretty much everyone who has "dented their pick"
on unsupervised learning (this includes myself. Does anyone else here
have these same scars?) has developed methods that would work on
"completely obvious" test cases but failed miserably on real-world
input. My point here is that looking at things from a dp/dt point of
view, real-world situations become about as simple as "completely
obvious" test cases.
I would quote some good source to make this point, but I don't think
anyone has gone here yet.
But Steve, if YOU claim that "looking at things from a dp/dt point of
view" does in fact yield a dramatic breakthrough that allows
unsupervised learning to work on real world cases (something nobody else
can do right now), then YOU are expected to be the one who has gone
there, done it, and come back with evidence that your idea does in fact
do that.
If you don't have a clear demonstration that this dp/dt idea does
deliver the goods, why are you claiming that it does? Surely it is one
or the other?
To continue this effort (as I plan to do) requires optimally solving the
PCA problem, though I do NOT think that this is necessary to build good
and useful NN/AGI systems. I suspect another "trap" in the concept of
PCA. Consider the following from my unposted paper:
*principal component analysis:* A mathematical procedure that transforms a number of variables
into a smaller number of less correlated variables called /principal
components/. The first principal component accounts for as much of the
variability in the data as possible, and each succeeding component
accounts for as much of the remaining variability as possible. The pure
mathematical form of this produces a minimal number of uncorrelated
variables without regard to real-world significance, while a more useful
form produces output variables that have real-world correspondence.
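For concreteness, the textbook behavior described above can be sketched with plain numpy (the data here is synthetic and purely illustrative): the first component captures the largest share of the variance, and each later one the largest share of what remains.

```python
import numpy as np

rng = np.random.default_rng(0)
# 500 samples of 3 variables; the first two are strongly correlated.
latent = rng.normal(size=(500, 1))
X = np.hstack([latent, 0.8 * latent, rng.normal(size=(500, 1))])
X = X - X.mean(axis=0)           # PCA requires centered data

# The principal components are the right singular vectors of the data.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
explained = S**2 / np.sum(S**2)  # fraction of variance per component

# Variance explained is non-increasing: each component accounts for
# as much of the remaining variability as possible.
print(explained)
assert np.all(np.diff(explained) <= 1e-12)
```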
Apparently, real-world PCAs typically combine features in a Huffman-like
coding, that can be easily split back apart with simple combinatorial
Bayesian logic. This could most simply be implemented as an OR of the
ANDs of the features needed for each of the components, which in d(ln
p)/dt space appear to be exactly what dendritic trees accomplish.
Is this an unavoidable step? Is this a desirable step? Can the features
be directly identified in an un-combined way? The answer to all of these
questions may be YES, for if a leg of a dendritic tree extracts a
feature, then it is obviously possible (and perhaps even necessary) to
extract features separately from one another. Arbitrarily combining them
to produce principal components may lose nothing, because downstream
neurons can always separate features from components as needed.
My present challenge is being a "mouse in a maze" of matrix notation and
trying to see the forest for the trees. Clearly, my predecessors also
were challenged in this way, so I am trying to go around the really hard
problems (that evaded the best mathematicians for a century) to see what
is REALLY needed, and to abandon work on all other areas before starting.
But, what would Bayesian equations or NN neuron functionality
look like in dp/dt space? NO DIFFERENCE (math upon request). You
could trivially differentiate the inputs to a vast and complex
existing AGI or NN, integrate the outputs, and it would perform
_identically_ (except for some "little" details discussed
below). Of course, while the transforms would be identical,
unsupervised learning would be quite a different matter, as now
the nearly-impossible becomes trivially simple.
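A sketch of the uncontroversial core of this claim, for the linear case only (a single random linear "transform" standing in for the system; whether the identity extends to a full nonlinear AGI/NN is exactly what is in dispute): differencing commutes with a linear map, so differentiating the inputs and integrating the outputs reproduces the direct route.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 8))      # the "transform": one linear layer
x = rng.normal(size=(8, 100))    # input signal: 8 channels, 100 time steps

y_direct = W @ x                 # ordinary operation on "object" values

# dp/dt route: differentiate the inputs, run the SAME transform,
# then integrate the outputs.
dx = np.diff(x, axis=1, prepend=0.0)     # first column is the onset edge x[:, 0]
y_dpdt = np.cumsum(W @ dx, axis=1)

# For a linear transform the two routes agree (the "constant of
# integration" is supplied by the initial edge here).
assert np.allclose(y_direct, y_dpdt)
```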
For some things (like short-term memory) you NEED an integrated
object-oriented result. Very simple - just integrate the signal.
How about muscle movements? Note that muscle actuation typically
causes acceleration, which doubly integrates the driving signal,
which would require yet another differentiation of a
differentiated signal to, when doubly integrated by the
mechanical system, produce movement to the desired location.
Note that once input values are stored in a matrix for
processing, the baby has already been thrown out with the
bathwater. You must START with differentiated input values and
NOT static measured values. THIS is what the PCA folks have been
missing in their century-long quest for an efficient algorithm
to identify principal components, as their arrays had already
discarded exactly what they needed. Of course you could simply
subtract successive samples from one another - at some
considerable risk, since you are now sampling at only half the
Nyquist-required speed to make your AGI/NN run at its intended
speed. In short, if inputs are not being electronically
differentiated, then sampling must proceed at least twice as
fast as the NN/AGI cycles.
But - how about the countless lost constants of integration?
They "all come out in the wash" - except for where actual
integration at the outputs is needed. Then, clippers and leaky
integrators, techniques common to electrical engineering, will
work fine and produce many of the same artifacts (like visual
extinction) seen in natural systems.
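A minimal leaky-integrator sketch (the leak constant and the signals are arbitrary illustrative choices): fed the derivative of a step, the output recovers the onset and then decays back toward zero, producing an extinction-like artifact instead of accumulating an unknown constant of integration.

```python
import numpy as np

t = np.arange(300)
step = (t >= 50).astype(float)         # a feature that appears and stays
d_step = np.diff(step, prepend=0.0)    # its dp/dt: a single spike at onset

def leaky_integrate(x, leak=0.98):
    """y[n] = leak * y[n-1] + x[n] -- a standard leaky integrator."""
    y = np.zeros_like(x)
    for n in range(1, len(x)):
        y[n] = leak * y[n - 1] + x[n]
    return y

y = leaky_integrate(d_step)

# The onset is recovered, but a static (unchanging) feature fades away:
# an extinction-like artifact of working in derivative space.
print(y[50], y[250])   # 1.0 at onset, near 0 long afterward
```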
It all sounds SO simple, but I couldn't find any prior work in
this direction using Google. However, the collective memory of
this group is pretty good, so perhaps someone here knows of some
prior effort that did something like this. I would sure like to
put SOMETHING in the "References" section of my paper.
Loosemore: THIS is what I was talking about when I explained
that there is absolutely NO WAY to understand a complex system
through direct observation, except by its useless anomalies. By
shifting an entire AGI or NN to operate on derivatives instead
of object values, it works *almost* (the operative word in this
statement) exactly the same as one working in object-oriented
space, only learning is transformed from the nearly-impossible
to the trivially simple. Do YOU see any observation-based way to
tell how we are operating behind our eyeballs, object-oriented
or dp/dt? While there are certainly other explanations for
visual extinction, this is the only one that I know of that is
absolutely impossible to engineer around. No one has (yet)
proposed any value to visual extinction, and it is a real
problem for hunters, so if it were avoidable, then I suspect
that ~200 million years of evolution would have eliminated it
long ago.
Read David Marr's book "Vision",
THANKS for the reference.
or any other text that discusses the low level work done by the
visual system. There are indeed differentiation functions in there
(IIRC, Marr came up with the Difference of Gaussians (DOG) idea
because the difference of Gaussians was a way to do the equivalent
of dp/dt). BUT... this is all in the first few wires coming out of
the retina!
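For readers who have not seen it, a 1-D Difference of Gaussians can be sketched in a few lines (the sigmas and test signal are chosen arbitrarily for illustration): the DoG kernel gives zero response to uniform regions and a strong biphasic response at an edge, which is the derivative-like behavior in question.

```python
import numpy as np

def gaussian(sigma, radius=10):
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    return g / g.sum()

# Difference of Gaussians: narrow center minus wide surround.
dog = gaussian(1.0) - gaussian(2.0)

signal = np.concatenate([np.zeros(50), np.ones(50)])  # a step "edge"
response = np.convolve(signal, dog, mode="same")

# Zero response in the flat regions (the kernel sums to zero),
# strong biphasic response straddling the edge at index 50.
print(np.abs(response[45:55]).max())
```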
YES - exactly where it would be needed to make the ENTIRE system work in
dp/dt space.
Not at all. You *must* read the stuff before jumping to conclusions:
the DOG functions deliver information, but the sum total of all
information delivered is not just derivatives from then on. The entire
system does not work in dp/dt space: we have abundant evidence of
systems responding to features being present, NOT just the rates of
change of features.
The following comments reflect a poor choice of paradigm. In short,
your comments are not so much incorrect as failing to lead to a useful
conclusion.
It is not interesting.
Whadaya mean not interesting - this converts a significant part of the
brain to operate in dp/dt space. That it accomplishes this SO simply is
VERY interesting.
Ditto my above comment. If you had read Marr's book, or the raft of
other cog sci books that are pertinent, you would notice that it is
false to say (as you do in the above paragraph) that a significant part
of the brain is using only rates of change of things.
The BIG issue is that the lab guys are no sort of mathematicians. They
don't understand how simple functions at the exteriors of a
computational process can completely change the internal representation
throughout the process, with vast ramifications INCLUDING the need to
completely rethink what neuronal activity really means.
Methinks the pot calleth the kettle black here.
Those lab guys are usually pretty good at math. I am no slouch myself.
And I know (as they do) that the computation of a few derivatives on
the periphery does not slew the whole system in such a way that all it
does from then on is work with dp/dt (or any other differential).
Visual extinction (of the sort you are talking about) is all over
and done with in the first few cells of the visual pathway, whereas
you are talking here about the millions of other processes that
occur higher up.
All such conclusions are wrong UNLESS they allow for dp/dt operation,
which they clearly haven't. You would have to integrate a neuron's
output and see extinction in the integral to make such a conclusion. My
point, made in a previously unposted part of my article that addresses
NOT periodically restoring signals to "object" form is...
"The most perfect "correction" is avoiding the problem that requires a
correction. With no integration, there is no need for any mechanism to
estimate the constant of integration. However, the information is still
missing in the rate-of-change input, so the process would necessarily
introduce whatever artifacts exist in a "perfect" correction as outlined
above. In short, QED, there will be extinguishment, instant recovery,
and any other artifacts of methods that perfect engineers might discover
in coming centuries. There is no need to show exactly how this happens,
because we know that it absolutely must happen because obviated
processes are by their nature as perfect as perfect can be. Hence, for
now this remains an interesting but needless exercise for some future
mathematician."
Here I must defer to Vladimir's earlier conclusion: this is meaningless
verbiage. Not coherent. Word salad. Sorry, but there comes a time for
calling a spade a spade.
As for your comment about complex systems, it looks like a
non sequitur. It just does not follow, as far as I can see.
Just take our present discussion, where failing to see that things may
be operating entirely in dp/dt space leads to virtually ignoring the
essential pieces (differentiation at the input) and then simply
dismissing visual extinction as just the way that the system works.
Turing indirectly pointed out that there are a limitless number of ways
of building ANY system with a given functionality, so why should any
(sane) person think that you can see how any hidden system works by
observing its operation? Someone is wrong here - either you or Turing.
The whole idea of understanding any black box by observing its external
functionality is a fool's errand UNLESS you have some really major clues
(like a window on its operation). Unfortunately, we just aren't there
yet with the brain.
1) You are not talking about complex systems.
2) Even if you are talking about Black Box systems, we do have some
insight into how the human mind works, because we ARE human minds, and
as human minds we do psychology (specifically cognitive science). We
have a ton of information about what goes on in there. You put false
words into people's mouths if you imply that anyone suggests emulating a
black box about which we have zero information.
My theory is that there is a threshold of mathematical understanding,
from which the remainder can be inferred. The sad states of NN and AGI
shows me that we are NOT yet there. Is dp/dt representation and a few
other things sufficient? Only time will tell.
BTW, it is really d(ln p)/dt, but that is another story. Let's first get
past dp/dt.
We don't need to: I know the difference between d(ln p)/dt and dp/dt,
and why you need to go there. You need the logarithm because (a) the
rates have to be bounded if they are going to be encoded by real
signals, and (b) you want to tie it in to information theory.
That will boot you nothing. The story does not change if you substitute
logs for the originals.
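In symbols, the relationship between the two quantities is just the chain rule:

```latex
\frac{d}{dt}\ln p(t) \;=\; \frac{1}{p(t)}\,\frac{dp}{dt}
```

That is, d(ln p)/dt is the *relative* rate of change of p, which is also (up to sign) the rate of change of the surprisal -ln p - the information-theoretic connection mentioned above.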
Richard Loosemore
P.S. What is it with this phrase "Theory of Everything"? The standard
implication of that choice of words is that you believe you have solved
the entire problem of cognition, neuroscience and AGI with a single
theory: is that really what you are trying to imply?
If it is what you are trying to say, then, to borrow one of Vladimir
Nesov's most evocative phrases, This Looks Bad.
-------------------------------------------
agi
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/
Modify Your Subscription:
https://www.listbox.com/member/?member_id=8660244&id_secret=123753653-47f84b
Powered by Listbox: http://www.listbox.com