Steve, Richard is right when he says temporal simultaneity is not a sufficient principle. Suppose you present your system with the following sequences (letters could be substituted for sounds, colors, objects, whatever):
ABCABCABCABC...
AAABBBAAABBB...
ABBAAABBBBAAAAABBBBBB...
ABBCCCDDDDEEEEEFFFFFF...
ABACABADABACABAEABACABADABACABA...

All of these sequences have "concepts" behind them. All of these concepts are immune to temporal-simultaneity learning (although the first could be learned by temporal adjacency, and the second by temporal adjacency with a delay of 3). The transition to sequence learning is (at least, in my eyes) a transition to relational learning, as opposed to the "flat" learning that PCA is designed for. In other words, completely new methods are required. You already begin that transition by invoking dp/dt, which assumes a temporal aspect to the data...

See this blog post for a fuller account of my view on the current state of affairs. (It started out as a post about a new algorithm I'd been thinking about, but turned into an essay on the difference between relational methods and "flat" (propositional) methods, and how to bridge the gap. If you're wondering about the title, see the previous post.)

http://dragonlogic-ai.blogspot.com/2008/12/back-to-ai-ok-here-we-go.html

--Abram

On Fri, Dec 26, 2008 at 2:31 AM, Steve Richfield <[email protected]> wrote:
> Richard,
>
> On 12/25/08, Richard Loosemore <[email protected]> wrote:
>>
>> Steve Richfield wrote:
>>>
>>> There are doubtless exceptions to my broad statement, but generally, neuron functionality is WIDE open to be pretty much ANYTHING you choose, including that of an AGI engine's functionality on its equations.
>>>
>>> In the reverse, any NN could be expressed in a shorthand form that contains structure, synapse functions, etc., and an AGI engine could be built/modified to function according to that shorthand.
>>>
>>> In short, mapping between NN and AGI forms presumes flexibility in the functionality of the target form. Where that flexibility is NOT present, e.g. because of orthogonal structure, etc., then you must ask whether something is being gained or lost by the difference.
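[Abram's distinction above, between adjacency-learnable and relational sequences, can be made concrete with a few lines of Python. This sketch is an editor's toy illustration, not code from the thread: a next-symbol table built from pure temporal adjacency is deterministic for ABCABC..., but for a growing-runs sequence like ABBAAABBBB... every symbol has more than one possible successor, so no fixed adjacency table captures the concept.]

```python
# Toy sketch (illustrative, not from the thread): temporal adjacency
# means "learn which symbols ever follow which".  For a periodic
# sequence the table is deterministic; for a sequence whose runs keep
# growing, the table is ambiguous -- the concept is relational.
from collections import defaultdict

def successor_table(seq):
    """Map each symbol to the set of symbols that ever follow it."""
    table = defaultdict(set)
    for a, b in zip(seq, seq[1:]):
        table[a].add(b)
    return dict(table)

periodic = "ABC" * 10
growing = "".join(("A" if i % 2 == 0 else "B") * (i + 1) for i in range(8))
# growing == "A" + "BB" + "AAA" + "BBBB" + ...

print(successor_table(periodic))  # every symbol has exactly ONE successor
print(successor_table(growing))   # A and B each have TWO -- adjacency is ambiguous
```

[An adjacency learner thus predicts the periodic sequence perfectly but has no way to decide, from the current symbol alone, when a run in the second sequence ends.]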
>>> Clearly, any transition that involves a loss should be carefully examined to see if the entire effort is headed in the wrong direction, which I think was your original point here.
>>
>> There is a problem here.
>>
>> When someone says "X and Y can easily be mapped from one form to the other" there is an implication that they are NOT suggesting that we go right down to the basic constituents of both X and Y in order to effect the mapping.
>>
>> Thus: "Chalk and Cheese can easily be mapped from one to the other" .... trivially true if we are prepared to go down to the common denominator of electrons, protons and neutrons. But if we stay at a sensible level then, no, these do not map onto one another.
>
> The problem here is that you were thinking of presently existing NN and AGI systems, neither of which works (yet) in any really useful way, so that it was obviously impossible to directly convert from one system with its set of bad assumptions to another system with a completely different set of bad assumptions. I completely agree, but I assert that the answer to that particular question is of no practical interest to anyone.
>
> On the other hand, converting between NN and AGI systems built on the SAME set of assumptions would be simple. This situation doesn't yet exist. Until then, converting a program from one dysfunctional platform to another is uninteresting. When the assumptions get ironed out, then all systems will be built on the same assumptions, and there will be few problems going between them, EXCEPT:
>
> Things need to be arranged in arrays for automated learning, which fits the present NN paradigm much better than the present AGI paradigm.
>>
>> Similarly, if you claim that NN and regular AGI map onto one another, I assume that you are saying something more substantial than that these two can both be broken down into their primitive computational parts, and that when this is done they seem equivalent.
>
> Even this breakdown isn't required if both systems are built on the same correct assumptions. HOWEVER, I see no way to transfer fast learning from an NN-like construction to an AGI-like construction. Do you? If there is no answer to this question, then this unanswerable question would seem to redirect AGI efforts to NN-like constructions if they are ever to learn like we do.
>
>> NN and regular AGI, the way they are understood by people who understand them, have very different styles of constructing intelligent systems.
>
> Neither of which works (yet). Of course, we are both trying to fill in the gaps.
>
>> Sure, you can code both in C, or Lisp, or Cobol, but that is to trash the real meaning of "are easily mapped onto one another".
>
> One of my favorite consulting projects involved coding an AI program to solve complex problems that were roughly equivalent to solving algebraic equations. This composed the Yellow Pages for 28 different large phone directories. The project was for a major phone company and had to be written entirely in COBOL. Further, it had to run at n log n speed and NOT n^2 speed, which I did by using successive sorts instead of list-processing methods. It would have been rather difficult to achieve the needed performance in C or Lisp, even though COBOL would seem to be everyone's first choice as the last choice on the list of prospective platforms.
>
>>> ), instead of operating on "objects" (in an object-oriented sense)
>>>
>>> Neither NN nor AGI has any intrinsic relationship to OO.
>>>
>>> Clearly I need a better term here.
>>> Both NNs and AGIs tend to have neurons or equations that reflect the presence (or absence) of various objects, conditions, actions, etc. My fundamental assertion is that if you differentiate the inputs so that everything in the entire network reflects dp/dt instead of straight probabilities, then the network works identically, but learning is GREATLY simplified.
>>
>> Seems like a simple misunderstanding: you were not aware that "object oriented" does not mean the same as saying that there are fundamental atomic constituents of a representation.
>
> A typical semantic-overloading problem. "Atomic constituent orientation" doesn't really work either, because in later stages, individual terms/neurons can represent entire concepts, strategies, etc. I am still looking for a good term here.
>
>>> , instead, operates on the rate-of-changes in the probabilities of "objects", or dp/dt. Presuming sufficient bandwidth to generally avoid superstitious coincidences, fast unsupervised learning then becomes completely trivial, as like objects cause simultaneous like-patterned changes in the inputs WITHOUT the overlapping effects of the many other objects typically present in the input (with numerous minor exceptions).
>>>
>>> You have already presumed that something supplies the system with "objects" that are meaningful. Even before your first mention of dp/dt, there has to be a mechanism that is so good that it never invents objects such as:
>>>
>>> Object A: "A person who once watched all of Tuesday Weld's movies in the space of one week" or
>>>
>>> Object B: "Something that is a combination of Julius Caesar's pinky toe and a sour grape that Brutus just spat out" or
>>>
>>> Object C: "All of the molecules involved in a swimming gala that happen to be 17.36 meters from the last drop of water that splashed from the pool".
>>> You have supplied no mechanism that is able to do that, but that mechanism is 90% of the trouble, if learning is what you are about.
>>>
>>> With prior unsupervised learning you are 100% correct. However, none of the examples you gave involved temporal simultaneity. I will discuss B above because it is close enough to be interesting.
>>>
>>> If indeed someone just began to notice something interesting about Caesar's pinkie toe *as* they just began to notice the taste of a sour grape, then yes, that probably would be learned via the mechanisms I am talking about. However, if one was "present perfect tense" while the other was just beginning, then it wouldn't with my approach but would with prior unsupervised learning methods. For example, if Caesar's pinkie toe had been noticed and examined, and then before the condition passed they tasted a sour grape, then temporal simultaneity of the dp/dt edges wouldn't exist to learn from. Of course, in both cases, the transforms would work identically given identical prior learning/programming.
>>
>> You have not understood the sense in which I made the point, I fear.
>
> I think the reverse is true. Consider...
>
>> I was describing obviously useless concepts. Ones where there is no temporal simultaneity.
>
> dp/dt is unable to even notice things that lack temporal simultaneity, so the examples you gave, though typical challenges to past unsupervised learning, are complete non-issues in dp/dt space.
>
>> Concepts thrown together out of completely useless components.
>
> ... that require SOME force/reason/bug/error/etc. to get thrown together. I think we both understand how this was a typical challenge to past unsupervised learning efforts. I am asserting that in dp/dt systems, there is NO force/reason/error/etc. to ever throw such things together, and hence, no reason for vastly complex matrix transforms to then try to pull them back apart.
>>
>> The question is: how to build a mechanism that does NOT fall into the trap of creating such nonsense-concepts. If you just say "assume that we have such a concept builder" you beg a million questions.
>>
>> Your reply, above, took one of my examples and tried to talk about what could happen if it was not, after all, a nonsense-concept.
>
> Note that GMAIL got sick here, so I'll mark your text with >. Also, some replies are deeply indented, so I have bolded some of them.
>
> I was just "playing neuron" without any mindreading abilities.
>
>> Alas, that is neither here nor there, because (sure enough) *everyone* agrees that temporal simultaneity is a good basic ground for trying to construct new concepts (it is the Reason Number One for creating a new concept!). But we also know that just common or garden-variety Temporal Simultaneity doesn't get you very far .... that is the easiest of all mechanisms, and we need a hundred more concept-building mechanisms that are better than that before we have a real concept-generating engine.
>
> Now, we can start "picking through" the approaches. I suspect that looking for the principal components of temporally simultaneous inputs goes a LONG way toward what is sought, but have no proof (yet). Do YOU have some idea as to where the threshold of usefulness is?
>
>> And (here is where my point comes back into the picture) if anyone stands up and says "Hey everyone! I have discovered a hundred concept-building mechanisms that I think will do the trick!", the first question that the crowd will ask is: "Do your mechanisms work together to build real, sensible concepts, or do they fill the system with bazillions of really dumb, useless concepts (like my nonsense list above)?"
>
> Clearly, PCA on simultaneous inputs will NOT do that, because they must show common things in order not to end up at the wrong end of the Huffman code.
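[The structure Steve is betting on here can be seen in a toy example. This sketch is an editor's invention (the signals and channel assignments are made up, not from the thread): two hidden "objects" switch on and off at different moments, and six input channels carry their probabilities. After differentiating (dp/dt), channels driven by the same object change at the same instants, so their derivative signals covary strongly, while cross-object covariance is near zero; this is exactly the grouping a PCA-style method would pick out.]

```python
# Toy sketch (editor's illustration): within-object dp/dt covariance
# dominates cross-object covariance when the objects' edges never
# coincide in time.

def signals(T=120):
    chans = []
    for t in range(T):
        A = (t // 6) % 2          # object A: toggles every 6 steps
        B = ((t + 3) // 6) % 2    # object B: toggles 3 steps out of phase
        chans.append([A, A, A, B, B, B])
    return chans

def dpdt(chans):
    """Finite-difference derivative of each channel over time."""
    return [[b - a for a, b in zip(r0, r1)] for r0, r1 in zip(chans, chans[1:])]

def cov(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

d = dpdt(signals())
cols = list(zip(*d))              # one derivative series per channel
within = cov(cols[0], cols[1])    # two channels driven by the same object
across = cov(cols[0], cols[3])    # channels driven by different objects

print(within, across)             # within-object covariance dominates
```

[The same grouping is far harder to see in the raw probabilities, since the two square waves overlap most of the time; differentiation strips the overlap down to the edges, which is Steve's point.]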
>
>> Anyone who says that they know of a way to get unsupervised learning to occur is saying, implicitly, that they have those 100 concept-building mechanisms ready to go (or one super-mechanism as good as all of them). Hence my original point: you cannot simply imply that your system is working with bona-fide, coherent concepts unless you can show that it really does come up with concepts (or objects) that are sensible.
>
> Perhaps you could exhibit some examples where learning based on temporal simultaneity with a preference for identifying common patterns (as PCA requires) fails. Clearly, if I think that a relatively simple approach (like PCA on dp/dt inputs) should work, but you are convinced that it will fall into an abyss of superstitious learning, then you will have a MUCH easier time exhibiting a couple of example failures than I will have somehow proving that it always works (which is probably beyond the mathematical state of the art).
>
> I'm not saying you are wrong here, only that you may not have heard me (probably my fault for not saying things clearly enough), and you haven't made your point by exhibiting something on which my approach would fail.
>
>> FWIW, I would level the same criticism against quite a few other people, so you don't stand alone here.
>
> My ego is quite indestructible and I understand that your body temperature is low, so you have nothing to worry about here.
>
>> (Just briefly: if I move on to look at your actual reply above, I see also mention of rates of change (dp/dt), but no explanation of how rates of change of anything would help a system build a concept that is a combination (NOT an association, please!) of [Julius Caesar's pinky toe and a sour grape that Brutus just spat out]. The rates of change seem irrelevant here.)
>
> If you take a neuron or Bayesian formula programmed to do something static and throw dp/dt inputs at it, its output will be the dp/dt of the result from static operation. You could then simply integrate it to produce exactly the same output. Hence, the ONLY reason to operate in dp/dt space is for the learning, as the transformation itself is unaffected.
>
> Now, if you look for an association in dp/dt space and decide to recognize it, that same neuron will then operate to recognize a combination, once its output has been integrated. Of course, if the output is not integrated but simply used by subsequent neurons, the entire system will operate as though it recognized the combination, even though, if you attached an oscilloscope to the output, you would see positive and negative spikes around what would be a steady-state output in "object" mode.
>
> In short, it programs based on associations, but functions based on represented combinations, that representation being the dp/dt of the combination.
>
>>> Instead, you waved your hands and said "fast unsupervised learning then becomes completely trivial" .... this statement is a declaration that a good mechanism is available.
>>>
>>> You then also talk about "like" objects. But the whole concept of "like" is extraordinarily troublesome. Are Julius Caesar and Brutus "like" each other? Seen from our distance, maybe yes, but from the point of view of Julius C., probably not so much. Is a G-type star "like" a mirror? I don't know any stellar astrophysicists who would say so, but then again OF COURSE they are, because they are almost indistinguishable, because if you hold a mirror up in the right way it can reflect the sun and the two visual images can be identical.
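[Steve's commutation claim above (differentiate the inputs, run the transform, integrate the output, recover the static result) can be checked numerically. The sketch below is an editor's check, not code from the thread, and it assumes the unit is LINEAR; for a linear unit the claim holds exactly, while a nonlinear unit does not commute with differentiation this way.]

```python
# Editor's sketch: a linear unit applied to finite differences of its
# inputs, then cumulatively summed (integrated), reproduces the static
# output given the initial value.  Weights and trajectories are invented.

W = [0.5, -1.0, 2.0]              # fixed synaptic weights (illustrative)

def unit(x):
    """A purely linear neuron: weighted sum of its inputs."""
    return sum(w * xi for w, xi in zip(W, x))

# arbitrary input probability trajectories p_i(t)
P = [[0.1, 0.4, 0.9, 0.3, 0.3, 0.7],
     [1.0, 0.8, 0.2, 0.2, 0.5, 0.5],
     [0.0, 0.1, 0.1, 0.6, 0.2, 0.9]]

static = [unit(col) for col in zip(*P)]            # run on raw probabilities

dP = [[b - a for a, b in zip(row, row[1:])] for row in P]
d_out = [unit(col) for col in zip(*dP)]            # run on dp/dt inputs

# integrate the dp/dt output, starting from the static output at t=0
recovered = [static[0]]
for d in d_out:
    recovered.append(recovered[-1] + d)

print(all(abs(r - s) < 1e-9 for r, s in zip(recovered, static)))
```

[So for the linear case the transformation really is unaffected, and the only difference dp/dt makes is to what the learning rule sees, which is the sense in which Steve uses the argument.]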
>>> These questions can be resolved, sure enough, but it is the whole business of resolving these questions (rather than waving a hand over them and declaring them to be trivial) that is the point.
>>>
>>> I think that pretty much everyone who has "dented their pick" on unsupervised learning (this includes myself. Does anyone else here have these same scars?) has developed methods that would work on "completely obvious" test cases but failed miserably on real-world input. My point here is that looking at things from a dp/dt point of view, real-world situations become about as simple as "completely obvious" test cases.
>>>
>>> I would quote some good source to make this point, but I don't think anyone has gone here yet.
>>>
>>> If you don't have a clear demonstration that this dp/dt idea does deliver the goods, why are you claiming that it does? Surely it is one or the other?
>>>
>>> This month I am wearing my mathematician hat. My son Eddie is the NN hacker of the family, and he is waiting impatiently for me to declare a tentative completion so he can run with it.
>>>
>>> For now, my goal is to come up with a sufficiently good theory that even you can't poke any significant holes in it. Once I become the first person in history to ever receive the Loosemore Seal of No Objection, I will probably wrap this thing up and turn it over to Eddie.
>>>>
>>>> But Steve, if YOU claim that "looking at things from a dp/dt point of view" does in fact yield a dramatic breakthrough that allows unsupervised learning to work on real-world cases (something nobody else can do right now),
>>>>
>>>> Not entirely true, as PCA does what could be considered to be unsupervised learning, though granted, it is WAY too inefficient for NN/AGI use without dp/dt.
>>>>> then YOU are expected to be the one who has gone there, done it, and come back with evidence that your idea does in fact do that.
>>
>> First comes the theory, then comes the demo. Neither contains any sort of proof, but it is a LOT cheaper to shoot something down BEFORE it is built than after. Hence, I find this exercise VERY valuable. THANKS. Please keep up the good work.
>>
>> Steve Richfield

--
Abram Demski
Public address: [email protected]
Public archive: http://groups.google.com/group/abram-demski
Private address: [email protected]

-------------------------------------------
agi
Archives: https://www.listbox.com/member/archive/303/=now
