Steve Richfield wrote:
Richard,
On 12/25/08, *Richard Loosemore* <r...@lightlink.com> wrote:
Steve Richfield wrote:
Ben, et al,
After ~5 months of delay for theoretical work, here are the
basic ideas as to how really fast and efficient automatic
learning could be made almost trivial. I decided NOT to post the
paper (yet), but rather, to just discuss some of the
underlying ideas in AGI-friendly terms.
Suppose for a moment that a NN or AGI program (they can be
easily mapped from one form to the other
... this is not obvious, to say the least. Mapping involves many
compromises that change the functioning of each type ...
There are doubtless exceptions to my broad statement, but generally,
neuron functionality is WIDE open to be pretty much ANYTHING you choose,
including that of an AGI engine's functionality on its equations.
In the reverse, any NN could be expressed in a shorthand form that
contains structure, synapse functions, etc., and an AGI engine could be
built/modified to function according to that shorthand.
In short, mapping between NN and AGI forms presumes flexibility in the
functionality of the target form. Where that flexibility is NOT present,
e.g. because of orthogonal structure, etc., then you must ask whether
something is being gained or lost by the difference. Clearly, any
transition that involves a loss should be carefully examined to see if
the entire effort is headed in the wrong direction, which I think was
your original point here.
There is a problem here.
When someone says "X and Y can easily be mapped from one form to the
other" there is an implication that they are NOT suggesting that we go
right down to the basic constituents of both X and Y in order to effect
the mapping.
Thus: "Chalk and Cheese can easily be mapped from one to the other"
.... trivially true if we are prepared to go down to the common
denominator of electrons, protons and neutrons. But if we stay at a
sensible level then, no, these do not map onto one another.
Similarly, if you claim that NN and regular AGI map onto one another, I
assume that you are saying something more substantial than that these
two can both be broken down into their primitive computational parts,
and that when this is done they seem equivalent.
NN and regular AGI, the way they are understood by people who
understand them, have very different styles of constructing intelligent
systems. Sure, you can code both in C, or Lisp, or Cobol, but that is
to trash the real meaning of "are easily mapped onto one another".
), instead of operating on "objects" (in an
object-oriented sense)
Neither NN nor AGI has any intrinsic relationship to OO.
Clearly I need a better term here. Both NNs and AGIs tend to have
neurons or equations that reflect the presence (or absence) of various
objects, conditions, actions, etc. My fundamental assertion is that if
you differentiate the inputs so that everything in the entire network
reflects dp/dt instead of straight probabilities, then the network works
identically, but learning is GREATLY simplified.
Seems like a simple misunderstanding: you were not aware that "object
oriented" does not mean the same as saying that there are fundamental
atomic constituents of a representation.
, instead, operates on the rates of change in the
probabilities of "objects", or dp/dt. Presuming sufficient
bandwidth to generally avoid superstitious coincidences, fast
unsupervised learning then becomes completely trivial, as like
objects cause simultaneous like-patterned changes in the inputs
WITHOUT the overlapping effects of the many other objects
typically present in the input (with numerous minor exceptions).
You have already presumed that something supplies the system with
"objects" that are meaningful. Even before your first mention of
dp/dt, there has to be a mechanism that is so good that it never
invents objects such as:
Object A: "A person who once watched all of Tuesday Weld's movies in
the space of one week" or
Object B: "Something that is a combination of Julius Caesar's pinky
toe and a sour grape that Brutus just spat out" or
Object C: "All of the molecules involved in a swimming gala that
happen to be 17.36 meters from the last drop of water that splashed
from the pool".
You have supplied no mechanism that is able to do that, but that
mechanism is 90% of the trouble, if learning is what you are about.
With prior unsupervised learning you are 100% correct. However none of
the examples you gave involved temporal simultaneity. I will discuss B
above because it is close enough to be interesting.
If indeed someone just began to notice something interesting about
Caesar's pinkie toe *as* they just began to notice the taste of a sour
grape, then yes, that probably would be learned via the mechanisms I am
talking about. However, if one was "present perfect tense" while the
other was just beginning, then it wouldn't with my approach but would
with prior unsupervised learning methods. For example, if Caesar's pinkie
toe had already been noticed and examined, and then before the condition
passed they tasted a sour grape, the temporal simultaneity of the dp/dt
edges wouldn't exist to learn from. Of course, in both cases, the transforms
would work identically given identical prior learning/programming.
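To make the distinction concrete, here is a minimal sketch (the signals, the successive-sample derivative, and the coincidence rule are all illustrative assumptions, not the actual proposed mechanism): only inputs whose dp/dt edges coincide in time produce anything to learn from, while a feature that switched on long ago contributes no edge at all.

```python
import numpy as np

t = np.arange(200)

# Channel A: switches on at t=50 (a fresh dp/dt edge at t=50).
a = (t >= 50).astype(float)
# Channel B: switches on at the same instant -- temporally simultaneous edge.
b = (t >= 50).astype(float)
# Channel C: was already "on" long before (its edge was back at t=5).
c = (t >= 5).astype(float)

def dpdt(x):
    """Successive-sample subtraction as a crude derivative."""
    return np.diff(x, prepend=x[0])

# A Hebbian-style coincidence score on the differentiated signals:
# only simultaneous edges produce a nonzero product to learn from.
w_ab = np.sum(dpdt(a) * dpdt(b))  # edges coincide -> positive
w_ac = np.sum(dpdt(a) * dpdt(c))  # edges at different times -> zero

print(w_ab, w_ac)  # prints: 1.0 0.0
```

The "present perfect tense" case is channel C: the object is still present, but its derivative is zero everywhere except at its long-past onset, so nothing coincides.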
You have not understood the sense in which I made the point, I fear.
I was describing obviously useless concepts. Ones where there is no
temporal simultaneity. Concepts thrown together out of completely
useless components.
The question is: how to build a mechanism that does NOT fall into the
trap of creating such nonsense-concepts. If you just say "assume that
we have such a concept builder" you beg a million questions.
Your reply, above, took one of my examples and tried to talk about what
could happen if it was not, after all, a nonsense-concept.
Alas, that is neither here nor there, because (sure enough) *everyone*
agrees that temporal simultaneity is a good basic ground for trying to
construct new concepts (it is the Reason Number One for creating a new
concept!). But we also know that just common or garden variety Temporal
Simultaneity doesn't get you very far .... that is the easiest of all
mechanisms, and we need a hundred more concept-building mechanisms that
are better than that before we have a real concept-generating engine.
And (here is where my point comes back into the picture) if anyone
stands up and says "Hey everyone! I have discovered a hundred concept
building mechanisms that I think will do the trick!", the first question
that the crowd will ask is: "Do your mechanisms work together to build
real, sensible concepts, or do they fill the system with bazillions of
really dumb, useless concepts (like my nonsense list above)?"
Anyone who says that they know of a way to get unsupervised learning to
occur is saying, implicitly, that they have those 100 concept building
mechanisms ready to go (or one super mechanism as good as all of them).
Hence my original point: you cannot simply imply that your system is
working with bona-fide, coherent concepts unless you can show that it
really does come up with concepts (or objects) that are sensible.
FWIW, I would level the same criticism against quite a few other people,
so you don't stand alone here.
(Just briefly: if I move on to look at your actual reply above, I see
also mention of rates of change (dp/dt), but no explanation of how rates
of change of anything would help a system build a concept that is a
combination (NOT an association, please!) of [Julius Caesar's pinky toe
and a sour grape that Brutus just spat out]. The rates of change seem
irrelevant here).
Instead, you waved your hands and said "fast unsupervised learning
then becomes completely trivial" .... this statement is a
declaration that a good mechanism is available.
You then also talk about "like" objects. But the whole concept of
"like" is extraordinarily troublesome. Are Julius Caesar and Brutus
"like" each other? Seen from our distance, maybe yes, but from the
point of view of Julius C., probably not so much. Is a G-type star
"like" a mirror? I don't know any stellar astrophysicists who would
say so, but then again OF COURSE they are, because they are almost
indistinguishable, because if you hold a mirror up in the right way
it can reflect the sun and the two visual images can be identical.
These questions can be resolved, sure enough, but it is the whole
business of resolving these questions (rather than waving a hand
over them and declaring them to be trivial) that is the point.
I think that pretty much everyone who has "dented their pick"
on unsupervised learning (this includes myself. Does anyone else here
have these same scars?) has developed methods that would work on
"completely obvious" test cases but failed miserably on real-world
input. My point here is that looking at things from a dp/dt point of
view, real-world situations become about as simple as "completely
obvious" test cases.
I would quote some good source to make this point, but I don't think
anyone has gone here yet.
But Steve, if YOU claim that "looking at things from a dp/dt point of
view" does in fact yield a dramatic breakthrough that allows
unsupervised learning to work on real world cases (something nobody else
can do right now), then YOU are expected to be the one who has gone
there, done it, and come back with evidence that your idea does in fact
do that.
If you don't have a clear demonstration that this dp/dt idea does
deliver the goods, why are you claiming that it does? Surely it is one
or the other?
To continue this effort (as I plan to do) requires optimally solving the
PCA problem, though I do NOT think that this is necessary to build good
and useful NN/AGI systems. I suspect another "trap" in the concept of
PCA. Consider the following from my unposted paper:
*principal component analysis:* A mathematical procedure that transforms a number of variables
into a smaller number of less correlated variables called /principal
components/. The first principal component accounts for as much of the
variability in the data as possible, and each succeeding component
accounts for as much of the remaining variability as possible. The pure
mathematical form of this produces a minimal number of uncorrelated
variables without regard to real-world significance, while a more useful
form produces output variables that have real-world correspondence.
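For concreteness, the textbook behavior described above can be sketched with plain numpy (the data here is synthetic and purely illustrative): the first component captures the largest share of the variance, and each later one the largest share of what remains.

```python
import numpy as np

rng = np.random.default_rng(0)
# 500 samples of 3 variables; the first two are strongly correlated.
latent = rng.normal(size=(500, 1))
X = np.hstack([latent, 0.8 * latent, rng.normal(size=(500, 1))])
X = X - X.mean(axis=0)           # PCA requires centered data

# The principal components are the right singular vectors of the data.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
explained = S**2 / np.sum(S**2)  # fraction of variance per component

# Variance explained is non-increasing: each component accounts for
# as much of the remaining variability as possible.
print(explained)
assert np.all(np.diff(explained) <= 1e-12)
```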
Apparently, real-world PCAs typically combine features in a Huffman-like
coding, that can be easily split back apart with simple combinatorial
Bayesian logic. This could most simply be implemented as an OR of the
ANDs of the features needed for each of the components, which in d(ln
p)/dt space appear to be exactly what dendritic trees accomplish.
Is this an unavoidable step? Is this a desirable step? Can the features
be directly identified in an un-combined way? The answer to all of these
questions may be YES, for if a leg of a dendritic tree extracts a
feature, then it is obviously possible (and perhaps even necessary) to
extract features separately from one another. Arbitrarily combining them
to produce principal components may lose nothing, because downstream
neurons can always separate features from components as needed.
My present challenge is being a "mouse in a maze" of matrix notation and
trying to see the forest for the trees. Clearly, my predecessors also
were challenged in this way, so I am trying to go around the really hard
problems (that evaded the best mathematicians for a century) to see what
is REALLY needed, and to abandon work on all other areas before starting.
But, what would Bayesian equations or NN neuron functionality
look like in dp/dt space? NO DIFFERENCE (math upon request). You
could trivially differentiate the inputs to a vast and complex
existing AGI or NN, integrate the outputs, and it would perform
_identically_ (except for some "little" details discussed
below). Of course, while the transforms would be identical,
unsupervised learning would be quite a different matter, as now
the nearly-impossible becomes trivially simple.
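A sketch of the uncontroversial core of this claim, for the linear case only (a single random linear "transform" standing in for the system; whether the identity extends to a full nonlinear AGI/NN is exactly what is in dispute): differencing commutes with a linear map, so differentiating the inputs and integrating the outputs reproduces the direct route.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 8))      # the "transform": one linear layer
x = rng.normal(size=(8, 100))    # input signal: 8 channels, 100 time steps

y_direct = W @ x                 # ordinary operation on "object" values

# dp/dt route: differentiate the inputs, run the SAME transform,
# then integrate the outputs.
dx = np.diff(x, axis=1, prepend=0.0)     # first column is the onset edge x[:, 0]
y_dpdt = np.cumsum(W @ dx, axis=1)

# For a linear transform the two routes agree (the "constant of
# integration" is supplied by the initial edge here).
assert np.allclose(y_direct, y_dpdt)
```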
For some things (like short-term memory) you NEED an integrated
object-oriented result. Very simple - just integrate the signal.
How about muscle movements? Note that muscle actuation typically
causes acceleration, which doubly integrates the driving signal,
which would require yet another differentiation of a
differentiated signal to, when doubly integrated by the
mechanical system, produce movement to the desired location.
Note that once input values are stored in a matrix for
processing, the baby has already been thrown out with the
bathwater. You must START with differentiated input values and
NOT static measured values. THIS is what the PCA folks have been
missing in their century-long quest for an efficient algorithm
to identify principal components, as their arrays had already
discarded exactly what they needed. Of course you could simply
subtract successive samples from one another - at some
considerable risk, since you are now sampling at only half the
Nyquist-required speed to make your AGI/NN run at its intended
speed. In short, if inputs are not being electronically
differentiated, then sampling must proceed at least twice as
fast as the NN/AGI cycles.
But - how about the countless lost constants of integration?
They "all come out in the wash" - except for where actual
integration at the outputs is needed. Then, clippers and leaky
integrators, techniques common to electrical engineering, will
work fine and produce many of the same artifacts (like visual
extinction) seen in natural systems.
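A minimal leaky-integrator sketch (the leak constant and the signals are arbitrary illustrative choices): fed the derivative of a step, the output recovers the onset and then decays back toward zero, producing an extinction-like artifact instead of accumulating an unknown constant of integration.

```python
import numpy as np

t = np.arange(300)
step = (t >= 50).astype(float)         # a feature that appears and stays
d_step = np.diff(step, prepend=0.0)    # its dp/dt: a single spike at onset

def leaky_integrate(x, leak=0.98):
    """y[n] = leak * y[n-1] + x[n] -- a standard leaky integrator."""
    y = np.zeros_like(x)
    for n in range(1, len(x)):
        y[n] = leak * y[n - 1] + x[n]
    return y

y = leaky_integrate(d_step)

# The onset is recovered, but a static (unchanging) feature fades away:
# an extinction-like artifact of working in derivative space.
print(y[50], y[250])   # 1.0 at onset, near 0 long afterward
```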
It all sounds SO simple, but I couldn't find any prior work in
this direction using Google. However, the collective memory of
this group is pretty good, so perhaps someone here knows of some
prior effort that did something like this. I would sure like to
put SOMETHING in the "References" section of my paper.
Loosemore: THIS is what I was talking about when I explained
that there is absolutely NO WAY to understand a complex system
through direct observation, except by its useless anomalies. By
shifting an entire AGI or NN to operate on derivatives instead
of object values, it works *almost* (the operative word in this
statement) exactly the same as one working in object-oriented
space, only learning is transformed from the nearly-impossible
to the trivially simple. Do YOU see any observation-based way to
tell how we are operating behind our eyeballs, object-oriented
or dp/dt? While there are certainly other explanations for
visual extinction, this is the only one that I know of that is
absolutely impossible to engineer around. No one has (yet)
proposed any value to visual extinction, and it is a real
problem for hunters, so if it were avoidable, then I suspect
that ~200 million years of evolution would have eliminated it
long ago.
Read David Marr's book "Vision",
THANKS for the reference.
or any other text that discusses the low level work done by the
visual system. There are indeed differentiation functions in there
(IIRC, Marr came up with the Difference of Gaussians (DOG) idea
because the difference of Gaussians was a way to do the equivalent
of dp/dt). BUT... this is all in the first few wires coming out of
the retina!
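For readers who have not seen it, a 1-D Difference of Gaussians can be sketched in a few lines (the sigmas and test signal are chosen arbitrarily for illustration): the DoG kernel gives zero response to uniform regions and a strong biphasic response at an edge, which is the derivative-like behavior in question.

```python
import numpy as np

def gaussian(sigma, radius=10):
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    return g / g.sum()

# Difference of Gaussians: narrow center minus wide surround.
dog = gaussian(1.0) - gaussian(2.0)

signal = np.concatenate([np.zeros(50), np.ones(50)])  # a step "edge"
response = np.convolve(signal, dog, mode="same")

# Zero response in the flat regions (the kernel sums to zero),
# strong biphasic response straddling the edge at index 50.
print(np.abs(response[45:55]).max())
```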
YES - exactly where it would be needed to make the ENTIRE system work in
dp/dt space.
Not at all. You *must* read the stuff before jumping to conclusions:
the DOG functions deliver information, but the sum total of all
information delivered is not just derivatives from then on. The entire
system does not work in dp/dt space: we have abundant evidence of
systems responding to features being present, NOT just the rates of
change of features.
The following comments reflect a poor choice of paradigm. In short,
your comments are not so much incorrect as failing to lead to a useful
conclusion.
It is not interesting.
Whadaya mean not interesting - this converts a significant part of the
brain to operate in dp/dt space. That it accomplishes this SO simply is
VERY interesting.
Ditto my above comment. If you had read Marr's book, or the raft of
other cog sci books that are pertinent, you would notice that it is
false to say (as you do in the above paragraph) that a significant part
of the brain is using only rates of change of things.
The BIG issue is that the lab guys are no sort of mathematicians. They
don't understand how simple functions at the exteriors of a
computational process can completely change the internal representation
throughout the process, with vast ramifications INCLUDING the need to
completely rethink what neuronal activity really means.
Methinks the pot calleth the kettle black here.
Those lab guys are usually pretty good at math. I am no slouch myself.
And I know (as they do) that the computation of a few derivatives on
the periphery does not slew the whole system in such a way that all it
does from then on is work with dp/dt (or any other differential).
Visual extinction (of the sort you are talking about) is all over
and done with in the first few cells of the visual pathway, whereas
you are talking here about the millions of other processes that
occur higher up.
All such conclusions are wrong UNLESS they allow for dp/dt operation,
which they clearly haven't. You would have to integrate a neuron's
output and see extinction in the integral to make such a conclusion. My
point, made in a previously unposted part of my article that addresses
NOT periodically restoring signals to "object" form is...
"The most perfect "correction" is avoiding the problem that requires a
correction. With no integration, there is no need for any mechanism to
estimate the constant of integration. However, the information is still
missing in the rate-of-change input, so the process would necessarily
introduce whatever artifacts exist in a "perfect" correction as outlined
above. In short, QED, there will be extinguishment, instant recovery,
and any other artifacts of methods that perfect engineers might discover
in coming centuries. There is no need to show exactly how this happens,
because we know that it absolutely must happen because obviated
processes are by their nature as perfect as perfect can be. Hence, for
now this remains an interesting but needless exercise for some future
mathematician."
Here I must defer to Vladimir's earlier conclusion: this is meaningless
verbiage. Not coherent. Word salad. Sorry, but there comes a time for
calling a spade a spade.
As for your comment about complex systems, it looks like a
non sequitur. It just does not follow, as far as I can see.
Just take our present discussion, where failing to see that things may
be operating entirely in dp/dt space leads to virtually ignoring the
essential pieces (differentiation at the input) and then simply
dismissing visual extinction as just the way that the system works.
Turing indirectly pointed out that there are a limitless number of ways
of building ANY system with a given functionality, so why should any
(sane) person think that you can see how any hidden system works by
observing its operation? Someone is wrong here - either you or Turing.
The whole idea of understanding any black box by observing its external
functionality is a fool's errand UNLESS you have some really major clues
(like a window on its operation). Unfortunately, we just aren't there
yet with the brain.
1) You are not talking about complex systems.
2) Even if you are talking about Black Box systems, we do have some
insight into how the human mind works, because we ARE human minds, and
as human minds we do psychology (specifically cognitive science). We
have a ton of information about what goes on in there. You put false
words into people's mouths if you imply that anyone suggests emulating a
black box about which we have zero information.
My theory is that there is a threshold of mathematical understanding,
from which the remainder can be inferred. The sad states of NN and AGI
shows me that we are NOT yet there. Is dp/dt representation and a few
other things sufficient? Only time will tell.
BTW, it is really d(ln p)/dt, but that is another story. Let's first get
past dp/dt.
We don't need to: I know the difference between d(ln p)/dt and dp/dt,
and why you need to go there. You need the logarithm because (a) the
rates have to be bounded if they are going to be encoded by real
signals, and (b) you want to tie it in to information theory.
That will boot you nothing. The story does not change if you substitute
logs for the originals.
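In symbols, the relationship between the two quantities is just the chain rule:

```latex
\frac{d}{dt}\ln p(t) \;=\; \frac{1}{p(t)}\,\frac{dp}{dt}
```

That is, d(ln p)/dt is the *relative* rate of change of p, which is also (up to sign) the rate of change of the surprisal -ln p - the information-theoretic connection mentioned above.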
Richard Loosemore
P.S. What is it with this phrase "Theory of Everything"? The standard
implication of that choice of words is that you believe you have solved
the entire problem of cognition, neuroscience and AGI with a single
theory: is that really what you are trying to imply?
If it is what you are trying to say, then, to borrow one of Vladimir
Nesov's most evocative phrases, This Looks Bad.
-------------------------------------------
agi
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/
Modify Your Subscription:
https://www.listbox.com/member/?member_id=8660244&id_secret=123753653-47f84b
Powered by Listbox: http://www.listbox.com