Steve Richfield wrote:
Richard,

On 12/25/08, Richard Loosemore <r...@lightlink.com> wrote:

    Steve Richfield wrote:

        Ben, et al,
         After ~5 months of delay for theoretical work, here are the
        basic ideas as to how really fast and efficient automatic
        learning could be made almost trivial. I decided NOT to post the
        paper (yet), but rather, to just discuss some of the
        underlying ideas in AGI-friendly terms.
         Suppose for a moment that a NN or AGI program (they can be
        easily mapped from one form to the other


    ... this is not obvious, to say the least.  Mapping involves many
    compromises that change the functioning of each type ...

There are doubtless exceptions to my broad statement, but generally, neuron functionality is WIDE open to be pretty much ANYTHING you choose, including the functionality of an AGI engine operating on its equations. In the reverse direction, any NN could be expressed in a shorthand form that captures its structure, synapse functions, etc., and an AGI engine could be built/modified to function according to that shorthand. In short, mapping between NN and AGI forms presumes flexibility in the functionality of the target form. Where that flexibility is NOT present, e.g. because of orthogonal structure, etc., then you must ask whether something is being gained or lost by the difference. Clearly, any transition that involves a loss should be carefully examined to see whether the entire effort is headed in the wrong direction, which I think was your original point here.


There is a problem here.

When someone says "X and Y can easily be mapped from one form to the other" there is an implication that they are NOT suggesting that we go right down to the basic constituents of both X and Y in order to effect the mapping.

Thus: "Chalk and Cheese can easily be mapped from one to the other" .... trivially true if we are prepared to go down to the common denominator of electrons, protons and neutrons. But if we stay at a sensible level then, no, these do not map onto one another.

Similarly, if you claim that NN and regular AGI map onto one another, I assume that you are saying something more substantial than that these two can both be broken down into their primitive computational parts, and that when this is done they seem equivalent.

NN and regular AGI, the way they are understood by people who understand them, have very different styles of constructing intelligent systems. Sure, you can code both in C, or Lisp, or Cobol, but that is to trash the real meaning of "are easily mapped onto one another".



        ), instead of operating on "objects" (in an
        object-oriented sense)


    Neither NN nor AGI has any intrinsic relationship to OO.

Clearly I need a better term here. Both NNs and AGIs tend to have neurons or equations that reflect the presence (or absence) of various objects, conditions, actions, etc. My fundamental assertion is that if you differentiate the inputs so that everything in the entire network reflects dp/dt instead of straight probabilities, then the network works identically, but learning is GREATLY simplified.
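
To make that assertion concrete, here is a minimal sketch (my own toy example with made-up numbers, not anything from the paper) of the linear case: differentiate the inputs, push them through the same transform, integrate the outputs, and you recover the ordinary result up to the constant of integration.

    # Toy illustration (mine, not from the paper): for a linear "network" W,
    # running on finite differences of the inputs and then cumulatively
    # summing (integrating) the outputs reproduces the ordinary outputs,
    # up to the constant of integration.
    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.standard_normal((4, 8))          # a fixed 4-output, 8-input layer
    p = rng.random((100, 8))                 # 100 time steps of input "probabilities"

    y_direct = p @ W.T                       # ordinary operation on object values

    dp = np.diff(p, axis=0)                  # dp/dt: differentiate the inputs
    dy = dp @ W.T                            # the same transform, now in dp/dt space
    y_rebuilt = y_direct[0] + np.cumsum(dy, axis=0)   # integrate the outputs

    print(np.allclose(y_rebuilt, y_direct[1:]))       # True: identical up to the constant

Only the linear case is shown here; how a particular nonlinear network carries over is exactly the kind of detail the full math has to cover.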

Seems like a simple misunderstanding: you were not aware that "object oriented" does not mean the same thing as saying that there are fundamental atomic constituents of a representation.




        , instead, operates on the rates of change of the
        probabilities of "objects", or dp/dt. Presuming sufficient
        bandwidth to generally avoid superstitious coincidences, fast
        unsupervised learning then becomes completely trivial, as like
        objects cause simultaneous like-patterned changes in the inputs
        WITHOUT the overlapping effects of the many other objects
        typically present in the input (with numerous minor exceptions).


    You have already presumed that something supplies the system with
    "objects" that are meaningful.  Even before your first mention of
    dp/dt, there has to be a mechanism that is so good that it never
    invents objects such as:

    Object A:  "A person who once watched all of Tuesday Weld's movies in
    the space of one week" or

    Object B:  "Something that is a combination of Julius Caesar's pinky
    toe and a sour grape that Brutus just spat out" or

    Object C:  "All of the molecules involved in a swimming gala that
    happen to be 17.36 meters from the last drop of water that splashed
    from the pool".

    You have supplied no mechanism that is able to do that, but that
    mechanism is 90% of the trouble, if learning is what you are about.

With prior unsupervised learning you are 100% correct. However, none of the examples you gave involved temporal simultaneity. I will discuss B above because it is close enough to be interesting. If indeed someone just began to notice something interesting about Caesar's pinkie toe *as* they just began to notice the taste of a sour grape, then yes, that probably would be learned via the mechanisms I am talking about. However, if one was "present perfect tense" while the other was just beginning, then it wouldn't be learned with my approach but would be with prior unsupervised learning methods. For example, if Caesar's pinkie toe had been noticed and examined, and then, before the condition passed, they tasted a sour grape, then the temporal simultaneity of the dp/dt edges wouldn't exist to learn from. Of course, in both cases, the transforms would work identically given identical prior learning/programming.


You have not understood the sense in which I made the point, I fear.

I was describing obviously useless concepts. Ones where there is no temporal simultaneity. Concepts thrown together out of completely useless components.

The question is: how to build a mechanism that does NOT fall into the trap of creating such nonsense-concepts. If you just say "assume that we have such a concept builder" you beg a million questions.

Your reply, above, took one of my examples and tried to talk about what could happen if it was not, after all, a nonsense-concept.

Alas, that is neither here nor there, because (sure enough) *everyone* agrees that temporal simultaneity is a good basic ground for trying to construct new concepts (it is the Reason Number One for creating a new concept!). But we also know that just common or garden variety Temporal Simultaneity doesn't get you very far .... that is the easiest of all mechanisms, and we need a hundred more concept-building mechanisms that are better than that before we have a real concept-generating engine.

And (here is where my point comes back into the picture) if anyone stands up and says "Hey everyone! I have discovered a hundred concept building mechanisms that I think will do the trick!", the first question that the crowd will ask is: "Do your mechanisms work together to build real, sensible concepts, or do they fill the system with bazillions of really dumb, useless concepts (like my nonsense list above)?"

Anyone who says that they know of a way to get unsupervised learning to occur is saying, implicitly, that they have those 100 concept building mechanisms ready to go (or one super mechanism as good as all of them). Hence my original point: you cannot simply imply that your system is working with bona-fide, coherent concepts unless you can show that it really does come up with concepts (or objects) that are sensible.

FWIW, I would level the same criticism against quite a few other people, so you don't stand alone here.


(Just briefly: if I move on to look at your actual reply above, I also see mention of rates of change (dp/dt), but no explanation of how rates of change of anything would help a system build a concept that is a combination (NOT an association, please!) of [Julius Caesar's pinky toe and a sour grape that Brutus just spat out]. The rates of change seem irrelevant here).



    Instead, you waved your hands and said "fast unsupervised learning
    then becomes completely trivial" .... this statement is a
    declaration that a good mechanism is available.

    You then also talk about "like" objects.  But the whole concept of
    "like" is extraordinarily troublesome.  Are Julius Caesar and Brutus
    "like" each other?  Seen from our distance, maybe yes, but from the
    point of view of Julius C., probably not so much.  Is a G-type star
    "like" a mirror?  I don't know any stellar astrophysicists who would
    say so, but then again OF COURSE they are, because they are almost
    indistinguishable, because if you hold a mirror up in the right way
    it can reflect the sun and the two visual images can be identical.

    These questions can be resolved, sure enough, but it is the whole
    business of resolving these questions (rather than waving a hand
    over them and declaring them to be trivial) that is the point.

I think that pretty much everyone who has "dented their pick" on unsupervised learning (this includes myself. Does anyone else here have these same scars?) has developed methods that would work on "completely obvious" test cases but failed miserably on real-world input. My point here is that, looking at things from a dp/dt point of view, real-world situations become about as simple as "completely obvious" test cases. I would quote some good source to make this point, but I don't think anyone has gone here yet.

But Steve, if YOU claim that "looking at things from a dp/dt point of view" does in fact yield a dramatic breakthrough that allows unsupervised learning to work on real world cases (something nobody else can do right now), then YOU are expected to be the one who has gone there, done it, and come back with evidence that your idea does in fact do that.

If you don't have a clear demonstration that this dp/dt idea does deliver the goods, why are you claiming that it does? Surely it is one or the other?







To continue this effort (as I plan to do) requires optimally solving the PCA problem, though I do NOT think that this is necessary to build good and useful NN/AGI systems. I suspect another "trap" in the concept of PCA. Consider the following from my unposted paper:

principal component analysis: A mathematical procedure that transforms a number of variables into a smaller number of less correlated variables called principal components. The first principal component accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much of the remaining variability as possible. The pure mathematical form of this produces a minimal number of uncorrelated variables without regard to real-world significance, while a more useful form produces output variables that have real-world correspondence.
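
For concreteness, the textbook procedure reduces to a singular-value decomposition of the centered data matrix. A minimal numpy sketch (my own illustration, with random data standing in for real measurements):

    # Minimal PCA sketch (illustration only): the principal components are
    # the right singular vectors of the centered data, ordered so that each
    # successive one explains as much of the remaining variance as possible.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 6))            # 200 samples, 6 raw variables

    Xc = X - X.mean(axis=0)                      # center each variable
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

    components = Vt                              # rows = principal directions
    explained_var = S**2 / (len(X) - 1)          # variance captured by each component
    scores = Xc @ Vt.T                           # the data re-expressed in PC space

    print(explained_var / explained_var.sum())   # fraction of variability per component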

Apparently, real-world PCAs typically combine features in a Huffman-like coding that can be easily split back apart with simple combinatorial Bayesian logic. This could most simply be implemented as an OR of the ANDs of the features needed for each of the components, which in d(ln p)/dt space appears to be exactly what dendritic trees accomplish. Is this an unavoidable step? Is this a desirable step? Can the features be directly identified in an un-combined way? The answer to all of these questions may be YES, for if a leg of a dendritic tree extracts a feature, then it is obviously possible (and perhaps even necessary) to extract features separately from one another. Arbitrarily combining them to produce principal components may lose nothing, because downstream neurons can always separate features from components as needed. My present challenge is being a "mouse in a maze" of matrix notation and trying to see the forest for the trees. Clearly, my predecessors were also challenged in this way, so I am trying to go around the really hard problems (that evaded the best mathematicians for a century) to see what is REALLY needed, and to abandon work on all the other areas before starting it.
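
To show what I mean by "an OR of the ANDs", here is one literal reading in code. The component/feature table is invented purely for illustration; I am not claiming this is how any particular dendritic tree is wired.

    # Hedged sketch of "an OR of the ANDs of the features needed for each
    # of the components".  The table below is made up for illustration only.
    COMPONENT_FEATURES = {
        "c1": {"edge", "motion"},     # c1 fires only when edge AND motion co-occur
        "c2": {"edge", "red"},
        "c3": {"red", "motion"},
    }

    def active_components(present_features):
        # AND stage: a component is active when every feature it combines is present.
        return {c for c, needed in COMPONENT_FEATURES.items()
                if needed <= present_features}

    def recovered_features(components):
        # OR stage: a downstream unit recovers a feature if any active component carries it.
        out = set()
        for c in components:
            out |= COMPONENT_FEATURES[c]
        return out

    print(recovered_features(active_components({"edge", "motion", "red"})))

The only point of the sketch is the two-layer Boolean structure: features combined into components by ANDs can be pulled back apart downstream by ORs.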
         But, what would Bayesian equations or NN neuron functionality
        look like in dp/dt space? NO DIFFERENCE (math upon request). You
        could trivially differentiate the inputs to a vast and complex
        existing AGI or NN, integrate the outputs, and it would perform
        _identically_ (except for some "little" details discussed
        below). Of course, while the transforms would be identical,
        unsupervised learning would be quite a different matter, as now
        the nearly-impossible becomes trivially simple.
         For some things (like short-term memory) you NEED an integrated
        object-oriented result. Very simple - just integrate the signal.
        How about muscle movements? Note that muscle actuation typically
        causes acceleration, which doubly integrates the driving signal,
        which would require yet another differentiation of a
        differentiated signal to, when doubly integrated by the
        mechanical system, produce movement to the desired location.
         Note that once input values are stored in a matrix for
        processing, the baby has already been thrown out with the
        bathwater. You must START with differentiated input values and
        NOT static measured values. THIS is what the PCA folks have been
        missing in their century-long quest for an efficient algorithm
        to identify principal components, as their arrays had already
        discarded exactly what they needed. Of course you could simply
        subtract successive samples from one another - at some
        considerable risk, since you are now sampling at only half the
        Nyquist-required speed to make your AGI/NN run at its intended
        speed. In short, if inputs are not being electronically
        differentiated, then sampling must proceed at least twice as
        fast as the NN/AGI cycles.
         But - how about the countless lost constants of integration?
        They "all come out in the wash" - except for where actual
        integration at the outputs is needed. Then, clippers and leaky
        integrators, techniques common to electrical engineering, will
        work fine and produce many of the same artifacts (like visual
        extinction) seen in natural systems.
         It all sounds SO simple, but I couldn't find any prior work in
        this direction using Google. However, the collective memory of
        this group is pretty good, so perhaps someone here knows of some
        prior effort that did something like this. I would sure like to
        put SOMETHING in the "References" section of my paper.
         Loosemore: THIS is what I was talking about when I explained
        that there is absolutely NO WAY to understand a complex system
        through direct observation, except by its useless anomalies. By
        shifting an entire AGI or NN to operate on derivatives instead
        of object values, it works *almost* (the operative word in this
        statement) exactly the same as one working in object-oriented
        space, only learning is transformed from the nearly-impossible
        to the trivially simple. Do YOU see any observation-based way to
        tell how we are operating behind our eyeballs, object-oriented
        or dp/dt? While there are certainly other explanations for
        visual extinction, this is the only one that I know of that is
        absolutely impossible to engineer around. No one has (yet)
        proposed any value to visual extinction, and it is a real
        problem for hunters, so if it were avoidable, then I suspect
        that ~200 million years of evolution would have eliminated it
        long ago.


    Read David Marr's book "Vision",

THANKS for the reference.

    or any other text that discusses the low level work done by the
    visual system.  There are indeed differentiation functions in there
    (IIRC, Marr came up with the Difference of Gaussians (DOG) idea
    because the difference of Gaussians was a way to do the equivalent
    of dp/dt).  BUT... this is all in the first few wires coming out of
    the retina!

YES - exactly where it would be needed to make the ENTIRE system work in dp/dt space.

Not at all. You *must* read the stuff before jumping to conclusions: the DOG functions deliver information, but the sum total of all information delivered is not just derivatives from then on. The entire system does not work in dp/dt space: we have abundant evidence of systems responding to features being present, NOT just the rates of change of features.





The following comments reflect a poor choice of paradigm. In short, your comments are not so much incorrect as failing to lead to a useful conclusion.

    It is not interesting.

Whadaya mean not interesting - this converts a significant part of the brain to operate in dp/dt space. That it accomplishes this SO simply is VERY interesting.

Ditto my above comment. If you had read Marr's book, or the raft of other cog sci books that are pertinent, you would notice that it is false to say (as you do in the above paragraph) that a significant part of the brain is using only rates of change of things.


The BIG issue is that the lab guys are no sort of mathematicians. They don't understand how simple functions at the exteriors of a computational process can completely change the internal representation throughout the process, with vast ramifications INCLUDING the need to completely rethink what neuronal activity really means.

Methinks the pot calleth the kettle black here.

Those lab guys are usually pretty good at math. I am no slouch myself. And I know (as they do) that the computation of a few derivatives on the periphery does not slew the whole system in such a way that all it does from then on is work with dp/dt (or any other differential).





    Visual extinction (of the sort you are talking about) is all over
    and done with in the first few cells of the visual pathway, whereas
    you are talking here about the millions of other processes that
    occur higher up.

All such conclusions are wrong UNLESS they allow for dp/dt operation, which they clearly don't. You would have to integrate a neuron's output and see extinction in the integral to make such a conclusion. My point, made in a previously unposted part of my article that addresses NOT periodically restoring signals to "object" form, is...

"The most perfect "correction" is avoiding the problem that requires a correction. With no integration, there is no need for any mechanism to estimate the constant of integration. However, the information is still missing in the rate-of-change input, so the process would necessary introduce whatever artifacts exist in a "perfect" correction as outlined above. In short, QED, there will be extinguishment, instant recovery, and any other artifacts of methods that perfect engineers might discover in coming centuries. There is no need to show exactly how this happens, because we know that it absolutely must happen because obviated processes are by their nature as perfect as perfect can be. Hence, for now this remains an interesting but needless exercise for some future mathematician."

Here I must defer to Vladimir's earlier conclusion: this is meaningless verbiage. Not coherent. Word salad. Sorry, but there comes a time for calling a spade a spade.





    As for your comment about complex systems, it looks like a
    non sequitur.  Just does not follow, as far as I can see.

Just take our present discussion, where failing to see that things may be operating entirely in dp/dt space leads to virtually ignoring the essential pieces (differentiation at the input) and then simply dismissing visual extinction as just the way that the system works. Turing indirectly pointed out that there are a limitless number of ways of building ANY system with a given functionality, so why should any (sane) person think that you can see how any hidden system works by observing its operation? Someone is wrong here - either you or Turing. The whole idea of understanding any black box by observing its external functionality is a fool's errand UNLESS you have some really major clues (like a window on its operation). Unfortunately, we just aren't there yet with the brain.

1) You are not talking about complex systems.

2) Even if you are talking about Black Box systems, we do have some insight into how the human mind works, because we ARE human minds, and as human minds we do psychology (specifically cognitive science). We have a ton of information about what goes on in there. You put false words into people's mouths if you imply that anyone suggests emulating a black box about which we have zero information.




My theory is that there is a threshold of mathematical understanding, from which the remainder can be inferred. The sad state of NN and AGI shows me that we are NOT yet there. Are dp/dt representation and a few other things sufficient? Only time will tell. BTW, it is really d(ln p)/dt, but that is another story. Let's first get past dp/dt.

We don't need to: I know the difference between d(ln p)/dt and dp/dt, and why you need to go there. You need the logarithm because (a) the rates have to be bounded if they are going to be encoded by real signals, and (b) you want to tie it in to information theory.

That will boot you nothing. The story does not change if you substitute logs for the originals.
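
For what it is worth, the relationship is just the chain rule: d(ln p)/dt = (dp/dt)/p, so the log form is nothing more than the plain form rescaled by 1/p. A throwaway check (the logistic trace for p(t) is my arbitrary choice, picked only because its derivatives are exact):

    # Chain-rule check: d(ln p)/dt equals (dp/dt) / p.
    # p(t) = 1 / (1 + e^{-t}) is an arbitrary smooth probability trace.
    import numpy as np

    def p(t):       return 1.0 / (1.0 + np.exp(-t))
    def dp_dt(t):   return p(t) * (1.0 - p(t))       # exact derivative of the logistic
    def dlnp_dt(t): return 1.0 - p(t)                # exact derivative of ln p(t)

    t = np.linspace(-3, 3, 7)
    print(np.allclose(dlnp_dt(t), dp_dt(t) / p(t)))  # True: just a rescaling by 1/p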




Richard Loosemore



P.S. What is it with this phrase "Theory of Everything"? The standard implication of that choice of words is that you believe you have solved the entire problem of cognition, neuroscience and AGI with a single theory: is that really what you are trying to imply?

If it is what you are trying to say, then, to borrow one of Vladimir Nesov's most evocative phrases, This Looks Bad.






