Richard,

On 12/25/08, Richard Loosemore <[email protected]> wrote:
> Steve Richfield wrote:
>> Ben, et al,
>>
>> After ~5 months of delay for theoretical work, here are the basic ideas
>> as to how really fast and efficient automatic learning could be made
>> almost trivial. I decided NOT to post the paper (yet), but rather, to
>> just discuss some of the underlying ideas in AGI-friendly terms.
>>
>> Suppose for a moment that a NN or AGI program (they can be easily mapped
>> from one form to the other
>
> ... this is not obvious, to say the least. Mapping involves many
> compromises that change the functioning of each type ...

There are doubtless exceptions to my broad statement, but generally,
neuron functionality is WIDE open to be pretty much ANYTHING you choose,
including that of an AGI engine's functionality on its equations. In the
reverse direction, any NN could be expressed in a shorthand form that
captures structure, synapse functions, etc., and an AGI engine could be
built/modified to function according to that shorthand. In short, mapping
between NN and AGI forms presumes flexibility in the functionality of the
target form. Where that flexibility is NOT present, e.g. because of
orthogonal structure, you must ask whether something is being gained or
lost by the difference. Clearly, any transition that involves a loss
should be carefully examined to see if the entire effort is headed in the
wrong direction, which I think was your original point here.

>> ), instead of operating on "objects" (in an
>> object-oriented sense)
>
> Neither NN nor AGI has any intrinsic relationship to OO.

Clearly I need a better term here. Both NNs and AGIs tend to have neurons
or equations that reflect the presence (or absence) of various objects,
conditions, actions, etc. My fundamental assertion is that if you
differentiate the inputs so that everything in the entire network reflects
dp/dt instead of straight probabilities, then the network works
identically, but learning is GREATLY simplified.
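The "works identically" half of this claim is easy to check for the linear
case: differencing commutes with any linear stage, so differentiated
inputs plus an integrated output (keeping one constant of integration)
reproduce the ordinary result exactly. A minimal numpy sketch of my own
devising (the toy layer and all names are mine, not from the paper);
nonlinear stages do not commute with d/dt this cleanly:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy "network": one fixed linear layer y = W @ x, applied per time step.
W = rng.normal(size=(4, 6))
x = rng.normal(size=(100, 6))          # 100 time steps of 6 input signals

# Ordinary operation: feed raw values straight through.
y_direct = x @ W.T

# dp/dt operation: difference the inputs, run the SAME layer, then
# integrate (cumulative sum) the outputs, adding back the t=0 output as
# the one saved constant of integration.
dx = np.diff(x, axis=0)                # discrete stand-in for dp/dt
dy = dx @ W.T
y_reconstructed = y_direct[0] + np.concatenate(
    [np.zeros((1, 4)), np.cumsum(dy, axis=0)]
)

# For a linear stage the two paths agree to machine precision.
print(np.allclose(y_direct, y_reconstructed))   # True
```

The learning-simplification argument is separate; this only shows that the
forward transform itself is preserved under differentiate-then-integrate.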
>> , instead, operates on the rate-of-change in the probabilities of
>> "objects", or dp/dt. Presuming sufficient bandwidth to generally avoid
>> superstitious coincidences, fast unsupervised learning then becomes
>> completely trivial, as like objects cause simultaneous like-patterned
>> changes in the inputs WITHOUT the overlapping effects of the many other
>> objects typically present in the input (with numerous minor exceptions).
>
> You have already presumed that something supplies the system with
> "objects" that are meaningful. Even before your first mention of dp/dt,
> there has to be a mechanism that is so good that it never invents
> objects such as:
>
> Object A: "A person who once watched all of Tuesday Weld's movies in the
> space of one week" or
>
> Object B: "Something that is a combination of Julius Caesar's pinky toe
> and a sour grape that Brutus just spat out" or
>
> Object C: "All of the molecules involved in a swimming gala that happen
> to be 17.36 meters from the last drop of water that splashed from the
> pool".
>
> You have supplied no mechanism that is able to do that, but that
> mechanism is 90% of the trouble, if learning is what you are about.

With prior unsupervised learning you are 100% correct. However, none of
the examples you gave involve temporal simultaneity. I will discuss B
above because it is close enough to be interesting. If indeed someone just
began to notice something interesting about Caesar's pinkie toe *as* they
just began to notice the taste of a sour grape, then yes, that probably
would be learned via the mechanisms I am talking about. However, if one
was "present perfect tense" while the other was just beginning, then it
wouldn't be learned with my approach, though it would be with prior
unsupervised learning methods. For example, if Caesar's pinkie toe had
already been noticed and examined, and then before the condition passed
they tasted a sour grape, the temporal simultaneity of the dp/dt edges
wouldn't exist to learn from.
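A toy example may make the simultaneity point concrete (the patterns,
onset times, and names below are mine, not from the paper): when one
object appears while another is already present, a static snapshot mixes
their features together, but the dp/dt at the moment of onset is exactly
the new object's feature pattern.

```python
import numpy as np

# Feature patterns of two "objects" across 6 input channels.
p1 = np.array([1., 0., 1., 1., 0., 0.])
p2 = np.array([0., 1., 1., 0., 1., 0.])

# Step activations: object 2 is already present when object 1 turns on.
T = 100
a1 = (np.arange(T) >= 50).astype(float)
a2 = (np.arange(T) >= 20).astype(float)

x = np.outer(a1, p1) + np.outer(a2, p2)   # superimposed inputs

# A static snapshot mixes the two objects' features together:
print(x[60])            # p1 + p2 -> [1. 1. 2. 1. 1. 0.]

# But at object 1's onset, dp/dt is exactly p1; object 2's (unchanging)
# contribution cancels out of the difference:
dx = np.diff(x, axis=0)
print(dx[49])           # == p1 -> [1. 0. 1. 1. 0. 0.]
```

This also illustrates the objection about edge timing: if the two patterns
had switched on at different moments, no single dx row would bind their
features together.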
Of course, in both cases, the transforms would work identically given
identical prior learning/programming.

> Instead, you waved your hands and said "fast unsupervised learning then
> becomes completely trivial" .... this statement is a declaration that a
> good mechanism is available.
>
> You then also talk about "like" objects. But the whole concept of "like"
> is extraordinarily troublesome. Are Julius Caesar and Brutus "like" each
> other? Seen from our distance, maybe yes, but from the point of view of
> Julius C., probably not so much. Is a G-type star "like" a mirror? I
> don't know any stellar astrophysicists who would say so, but then again
> OF COURSE they are, because they are almost indistinguishable: if you
> hold a mirror up in the right way it can reflect the sun, and the two
> visual images can be identical.
>
> These questions can be resolved, sure enough, but it is the whole
> business of resolving these questions (rather than waving a hand over
> them and declaring them to be trivial) that is the point.

I think that pretty much everyone who has "dented their pick" on
unsupervised learning (this includes myself; does anyone else here have
these same scars?) has developed methods that would work on "completely
obvious" test cases but failed miserably on real-world input. My point
here is that looking at things from a dp/dt point of view, real-world
situations become about as simple as "completely obvious" test cases. I
would quote some good source to make this point, but I don't think anyone
has gone here yet.

To continue this effort (as I plan to do) requires optimally solving the
PCA problem, though I do NOT think that this is necessary to build good
and useful NN/AGI systems. I suspect another "trap" in the concept of PCA.
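For reference, the pure mathematical form of PCA is only a few lines of
linear algebra. A hedged sketch with my own function and variable names
(whether running it on differenced rows actually yields better components
is precisely the open question being raised here):

```python
import numpy as np

def pca(X, k):
    """Return the top-k principal directions and the projected data."""
    Xc = X - X.mean(axis=0)                 # center each variable
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k], Xc @ Vt[:k].T            # components, scores

rng = np.random.default_rng(2)
latent = rng.normal(size=(500, 2))          # 2 underlying factors
mixing = rng.normal(size=(2, 8))
X = latent @ mixing + 0.05 * rng.normal(size=(500, 8))

comps, scores = pca(X, 2)
var = scores.var(axis=0)
# The first component captures the largest share of the variance:
print(var[0] >= var[1])    # True

# The dp/dt proposal amounts to running the same procedure on differenced
# rows rather than on the raw measurements:
comps_d, _ = pca(np.diff(X, axis=0), 2)
```

Nothing here settles which input representation is right; it only pins
down the procedure under discussion.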
Consider the following from my unposted paper:

*principal component analysis*: A mathematical procedure that transforms a
number of variables into a smaller number of less correlated variables
called *principal components*. The first principal component accounts for
as much of the variability in the data as possible, and each succeeding
component accounts for as much of the remaining variability as possible.

The pure mathematical form of this produces a minimal number of
uncorrelated variables without regard to real-world significance, while a
more useful form produces output variables that have real-world
correspondence. Apparently, real-world PCAs typically combine features in
a Huffman-like coding that can easily be split back apart with simple
combinatorial Bayesian logic. This could most simply be implemented as an
OR of the ANDs of the features needed for each of the components, which in
d(ln p)/dt space appears to be exactly what dendritic trees accomplish.

Is this an unavoidable step? Is this a desirable step? Can the features be
directly identified in an un-combined way? The answer to all of these
questions may be YES, for if a leg of a dendritic tree extracts a feature,
then it is obviously possible (and perhaps even necessary) to extract
features separately from one another. Arbitrarily combining them to
produce principal components may lose nothing, because downstream neurons
can always separate features from components as needed.

My present challenge is being a "mouse in a maze" of matrix notation,
trying to see the forest for the trees. Clearly, my predecessors were also
challenged in this way, so I am trying to go around the really hard
problems (that evaded the best mathematicians for a century) to see what
is REALLY needed before starting work on all the other areas.

>> But, what would Bayesian equations or NN neuron functionality look like
>> in dp/dt space? NO DIFFERENCE (math upon request).
>> You could trivially differentiate the inputs to a vast and complex
>> existing AGI or NN, integrate the outputs, and it would perform
>> _identically_ (except for some "little" details discussed below). Of
>> course, while the transforms would be identical, unsupervised learning
>> would be quite a different matter, as now the nearly-impossible becomes
>> trivially simple.
>>
>> For some things (like short-term memory) you NEED an integrated
>> object-oriented result. Very simple - just integrate the signal. How
>> about muscle movements? Note that muscle actuation typically causes
>> acceleration, which doubly integrates the driving signal. This would
>> require yet another differentiation of the already-differentiated
>> signal which, when doubly integrated by the mechanical system, produces
>> movement to the desired location.
>>
>> Note that once input values are stored in a matrix for processing, the
>> baby has already been thrown out with the bathwater. You must START
>> with differentiated input values and NOT static measured values. THIS
>> is what the PCA folks have been missing in their century-long quest for
>> an efficient algorithm to identify principal components, as their
>> arrays had already discarded exactly what they needed. Of course you
>> could simply subtract successive samples from one another - at some
>> considerable risk, since you are now sampling at only half the
>> Nyquist-required speed to make your AGI/NN run at its intended speed.
>> In short, if inputs are not being electronically differentiated, then
>> sampling must proceed at least twice as fast as the NN/AGI cycles.
>>
>> But - how about the countless lost constants of integration? They "all
>> come out in the wash" - except for where actual integration at the
>> outputs is needed. Then clippers and leaky integrators, techniques
>> common to electrical engineering, will work fine and produce many of
>> the same artifacts (like visual extinction) seen in natural systems.
>> It all sounds SO simple, but I couldn't find any prior work in this
>> direction using Google. However, the collective memory of this group is
>> pretty good, so perhaps someone here knows of some prior effort that
>> did something like this. I would sure like to put SOMETHING in the
>> "References" section of my paper.
>>
>> Loosemore: THIS is what I was talking about when I explained that there
>> is absolutely NO WAY to understand a complex system through direct
>> observation, except by its useless anomalies. By shifting an entire AGI
>> or NN to operate on derivatives instead of object values, it works
>> *almost* (the operative word in this statement) exactly the same as one
>> working in object-oriented space, only learning is transformed from the
>> nearly-impossible to the trivially simple. Do YOU see any
>> observation-based way to tell how we are operating behind our eyeballs,
>> object-oriented or dp/dt? While there are certainly other explanations
>> for visual extinction, this is the only one that I know of that is
>> absolutely impossible to engineer around. No one has (yet) proposed any
>> value to visual extinction, and it is a real problem for hunters, so if
>> it were avoidable, then I suspect that ~200 million years of evolution
>> would have eliminated it long ago.
>
> Read David Marr's book "Vision",

THANKS for the reference.

> or any other text that discusses the low level work done by the visual
> system. There are indeed differentiation functions in there (IIRC, Marr
> came up with the Difference of Gaussians (DOG) idea because the
> difference of Gaussians was a way to do the equivalent of dp/dt).
> BUT... this is all in the first few wires coming out of the retina!

YES - exactly where it would be needed to make the ENTIRE system work in
dp/dt space. The following comments reflect a poor choice of paradigm. In
short, your comments are not so much incorrect as failing to lead to a
useful conclusion.

> It is not interesting.
Whadaya mean, not interesting?! It converts a significant part of the
brain to operate in dp/dt space. That it accomplishes this SO simply is
VERY interesting. The BIG issue is that the lab guys are no sort of
mathematicians. They don't understand how simple functions at the
exteriors of a computational process can completely change the internal
representation throughout the process, with vast ramifications INCLUDING
the need to completely rethink what neuronal activity really means.

> Visual extinction (of the sort you are talking about) is all over and
> done with in the first few cells of the visual pathway, whereas you are
> talking here about the millions of other processes that occur higher up.

All such conclusions are wrong UNLESS they allow for dp/dt operation,
which they clearly haven't. You would have to integrate a neuron's output
and see extinction in the integral to make such a conclusion. My point,
made in a previously unposted part of my article that addresses NOT
periodically restoring signals to "object" form, is:

"The most perfect 'correction' is avoiding the problem that requires a
correction. With no integration, there is no need for any mechanism to
estimate the constant of integration. However, the information is still
missing from the rate-of-change input, so the process would necessarily
introduce whatever artifacts exist in a 'perfect' correction as outlined
above. In short, QED, there will be extinguishment, instant recovery, and
any other artifacts of methods that perfect engineers might discover in
coming centuries. There is no need to show exactly how this happens,
because we know that it absolutely must happen: obviated processes are by
their nature as perfect as perfect can be. Hence, for now this remains an
interesting but needless exercise for some future mathematician."

> As for your comment about complex systems, it looks like a non
> sequitur. Just does not follow, as far as I can see.
Just take our present discussion, where failing to see that things may be
operating entirely in dp/dt space leads to virtually ignoring the
essential pieces (differentiation at the input) and then simply dismissing
visual extinction as just the way that the system works.

Turing indirectly pointed out that there are a limitless number of ways of
building ANY system with a given functionality, so why should any (sane)
person think that you can see how a hidden system works by observing its
operation? Someone is wrong here - either you or Turing. The whole idea of
understanding any black box by observing its external functionality is a
fool's errand UNLESS you have some really major clues (like a window on
its operation). Unfortunately, we just aren't there yet with the brain. My
theory is that there is a threshold of mathematical understanding, from
which the remainder can be inferred. The sad states of NN and AGI show me
that we are NOT yet there. Is dp/dt representation, plus a few other
things, sufficient? Only time will tell.

BTW, it is really d(ln p)/dt, but that is another story. Let's first get
past dp/dt.

Steve Richfield

-------------------------------------------
agi
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244&id_secret=123753653-47f84b
Powered by Listbox: http://www.listbox.com
