Steve, > There has been plenty of speculation regarding just WHAT is buried in those > principal components. > Do they generally comprise simple combinations of identifiable features, or > some sort of smushing > that virtually encrypts the features? I have heard arguments on both sides > of this issue. > Can anyone here shine some light on this?
It seems like this gets back to the ill-defined problem again. There is no way of answering without more information about what the output of PCA is to be used for! The only immediate criterion we have is how good a probabilistic model an algorithm finds. > There are, of course, many models to explain any finite set of data. > Some like PCA may do so more concisely, while others may do so in ways > that better lend themselves to subsequent computations, > presumably to direct future actions. Conciseness will tend to be better for computation, simply because it is computationally easier to manipulate less data... of course, this is no guarantee, since the data may need to be fully uncompressed to extract a different feature, and if the compression is lossy then the feature may no longer be available. But if we know what we want to compute, we should be using supervised learning methods. Good predictive models are likely to help us regardless of our goal. > Unrestrained, PCA could change a neuron's functionality based on new data, > and very likely wreck > a functioning NN's future operation by doing so. Learning can still eventually converge. Also, I want to note that this is in some respects a quirk of NN methods that goes away if you think of things symbolically. --Abram On Sun, Dec 28, 2008 at 2:36 AM, Steve Richfield <[email protected]> wrote: > Abram, > > On 12/27/08, Abram Demski <[email protected]> wrote: >> >> Steve, >> >> My thinking in the "significant figures" issue is that the purpose of >> unsupervised learning is to find a probabilistic model of the data > > > There are, of course, many models to explain any finite set of data. > Some like PCA may do so more concisely, while others may do so in ways > that better lend themselves to subsequent computations, > presumably to direct future actions. > >> >> (whereas the purpose of supervised learning is to find a probabilistic >> model of *one* variable *conditioned on* all the others). 
When you >> talk about the insufficiency of standard PCA, do you think the >> problems you refer to relate to >> >> (1) PCA finding a suboptimal model, or > > > There has been plenty of speculation regarding just WHAT is buried in those > principal components. > Do they generally comprise simple combinations of identifiable features, or > some sort of smushing > that virtually encrypts the features? I have heard arguments on both sides > of this issue. > Can anyone here shine some light on this? > > > > If the features can be extracted from combinations of components, then PCA > is arguably optimal. > If not, then PCA is probably not what is needed. > > Genuine PCA has some other unrelated problems, in that it is VERY > computationally intensive, > and there isn't (yet) any good "incremental PCA" algorithm that learns > somewhat like you would > expect a neuron to learn. I suspect that I may also have to crack this nut > before dp/dt > becomes truly useful. > >> >> (2) the optimal model being not quite what you are after? > > > I, like everyone else, want to use an optimal model. However, my idea of > optimality may be different > than other people's idea of optimality, as we seek to optimize different > things. > Unrestrained, PCA could change a neuron's functionality based on new data, > and very likely wreck > a functioning NN's future operation by doing so. > > > > I suspect that some additional cleverness is needed, e.g. neurons initially > being in a "discovery mode" > that produces no output until a principal component (or something like a > principal component) > is discovered. Then, when downstream neurons use that principal component, > subsequent alteration > would be constrained to refining that component, with no possibility of > completely abandoning it > for a completely different component that might better represent the input. > > Any thoughts? 
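[For reference on the "incremental PCA" question: the closest thing the literature offers to a neuron-like incremental PCA update is probably Oja's rule, which adjusts a single weight vector one sample at a time and converges toward the first principal component. Whether it counts as "good" in the sense wanted here is exactly the open question. A minimal sketch, with synthetic data and parameter choices that are purely illustrative:]

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-D data whose first principal component lies along [1, 1]/sqrt(2).
base = rng.normal(size=(5000, 1))
data = base @ np.array([[1.0, 1.0]]) + 0.1 * rng.normal(size=(5000, 2))

w = rng.normal(size=2)              # the "neuron's" weight vector
eta = 0.01                          # learning rate
for x in data:
    y = w @ x                       # neuron output for this one sample
    w += eta * y * (x - y * w)      # Oja's rule: Hebbian growth with decay

w /= np.linalg.norm(w)
print(w)                            # close to +/-[0.707, 0.707]
```

[Note that this is online in exactly the neuron-like sense discussed above: no batch of statistics is collected; the weight vector is nudged by each sample as it arrives.]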
> > Steve Richfield > ======================================= > On Sat, Dec 27, 2008 at 3:05 AM, Steve Richfield > <[email protected]> wrote: >> Abram, >> >> On 12/26/08, Abram Demski <[email protected]> wrote: >>> >>> Steve, >>> >>> When I made the statement about Fourier I was thinking of JPEG >>> encoding. A little digging found this book, which presents a unified >>> approach to (low-level) computer vision based on the Fourier >>> transform: >>> >>> >>> >>> http://books.google.com/books?id=1wJuTMbNT0MC&dq=fourier+vision&printsec=frontcover&source=bl&ots=3ogSJ2i5uW&sig=ZdvvWvu82q8UX1c5Abq6hWvgZCY&hl=en&sa=X&oi=book_result&resnum=2&ct=result#PPA4,M >> >> >> Interesting, but seems far removed from wet neuronal functionality, >> unsupervised learning, etc. >>> >>> >> But that is beside the present point. :) >>> > >>> > >>> > Probably so. I noticed that you recently graduated, so I thought that I >>> > would drop that thought to make (or unmake) your day. >>> >>> :) I should really update that. It's been a while now. >>> >>> >> generally, any transform that makes the data more sparse, or simpler, >>> >> seems good >>> > >>> > >>> > Certainly if it results in extracting some useful figure of merit. >>> > >>> >> >>> >> -- which is of course what PCA does, >>> > >>> > >>> > Sometimes yes, and sometimes no. I am looking at incremental PCA >>> > approaches >>> > that reliably extract separate figures of merit rather than >>> > smushed-together >>> > figures of merit as PCA often does. >>> >>> How do you define "figures of merit"? Sounds like an ill-defined >>> problem to me. We don't know which features we *really* want to >>> extract from an image until we know the utility function of the >>> environment, and so know what information will help us achieve our >>> goals. >> >> >> There are several views of this, e.g. >> 1. Pick something to recognize and see if back-propagation says that it >> is >> useful. 
>> In practice this has problems, because once a downstream neuron makes >> any tentative use, >> then changing an upstream neuron's functionality scrambles the >> downstream neuron's output. >> 2. Pick one of the most consistent, easily-recognizable, most >> information-containing things to recognize a la PCA, >> and expect downstream neurons to combine inputs to extract whatever >> they need through Bayesian logic. >> This may suffer from too many prospective things to recognize, most >> of >> which are not needed. >> >> These two methods look like they could fix each other's shortcomings, >> because good initial choices >> of figures of merit should then result in neurons either keeping their >> functionality or abandoning >> it, and thereby avoid the problems of changing functionality scrambling >> downstream neurons. >> This way, back-propagation could be used to select which upstream neurons >> need to find something >> else to do, but would have little/no impact on incremental "learning" as >> reward/punishment systems now do. >> >> My BIG challenge is that without something like dp/dt, unsupervised >> learning >> doesn't work. >> Now, with dp/dt it is a whole new game, and I have no idea where the >> threshold of real-world functionality lies. >> Hence, I seem to be forced into "pessimization" - making things as good as >> possible, >> even though I may be well past that threshold. >> >> Eddie's NN platform is able to tie into other applications, like flight >> simulator, web cams, etc. >> Hence, there is the whole Internet full of cameras to learn with, and it >> might be interesting to see if >> such a NN would be able to figure out how to fly a plane, maybe like >> Skinner's pigeons. >> >> Thanks for your continuing thoughts. 
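[To make the dp/dt idea under discussion concrete: it amounts to feeding the network finite differences of the input probabilities rather than the probabilities themselves. A toy illustration (the two-object scenario and array shapes are invented for the example) of why differencing separates superimposed objects:]

```python
import numpy as np

t = np.arange(10)
p_a = (t >= 2).astype(float)   # object A present from t=2 onward
p_b = (t >= 6).astype(float)   # object B present from t=6 onward
mixed = p_a + p_b              # one input line seeing both objects at once

# In raw probability space the two objects are superimposed (the level 1.0
# could be either object alone).  In dp/dt space, each object shows up only
# at its own onset, cleanly separated in time.
dmixed = np.diff(mixed)
print(dmixed)                  # -> [0. 1. 0. 0. 0. 1. 0. 0. 0.]
```

[Each spike marks the arrival of exactly one object, which is the "simultaneous like-patterned changes WITHOUT the overlapping effects of other objects" claim made later in the thread.]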
>> >> Steve Richfield >> ==================== >>> >>> On Sat, Dec 27, 2008 at 12:01 AM, Steve Richfield >>> <[email protected]> wrote: >>> > Abram, >>> > >>> > On 12/26/08, Abram Demski <[email protected]> wrote: >>> >> >>> >> Steve, >>> >> >>> >> It is strange to claim that prior PhDs will be worthless when what you >>> >> are suggesting is that we apply the standard methods to a different >>> >> representation. >>> > >>> > >>> > Much of AI and pretty much all of AGI is built on the proposition that >>> > we >>> > humans must code knowledge because the stupid machines can't >>> > efficiently >>> > learn it on their own, in short, that UNsupervised learning is >>> > difficult. >>> > Note that in nature, UNsupervised learning handily outperforms >>> > supervised >>> > learning. What good is supervised NN technology when UNsupervised NNs >>> > will >>> > perform MUCH better? What good are a few hand-coded AGI rules and the >>> > engine >>> > that runs them, when an UNsupervised AGI can learn them orders of >>> > magnitude >>> > faster than cities full of programmers? Note my prior post where I >>> > explain >>> > that AGIs must either abandon UNsupervised learning, or switch to >>> > a >>> > NN-like implementation. In short, easy UNsupervised learning will >>> > change >>> > things about as much as the switch from horse and buggy to automobiles, >>> > leaving present PhDs in the position of blacksmiths and historians. >>> > Sure >>> > blacksmiths had transferable skills, but they weren't worth much and >>> > they >>> > weren't respected at all. >>> > >>> > In the 1980s, countless top computer people (including myself) had to >>> > expunge all references to mainframe computers from our resumes in order >>> > to >>> > find work in a microcomputer-dominated field. I expect to see rounds of >>> > the >>> > same sort of insanity when UNsupervised learning emerges. >>> > >>> >> >>> >> But that is beside the present point. :) >>> > >>> > >>> > Probably so. 
I noticed that you recently graduated, so I thought that I >>> > would drop that thought to make (or unmake) your day. >>> > >>> >> >>> >> Taking the derivative, or just finite differences, is a useful step in >>> >> more ways than one. You are talking about taking differences over >>> >> time, but differences over space can be used for edge detection, >>> >> frequently thought of as the first step in visual processing. >>> > >>> > >>> > Correct. My paper goes into using any dimension that is differentiable. >>> > Note >>> > that continuous eye movement converts a physical dimension to time >>> > domain. >>> > >>> >> >>> >> More >>> >> generally, any transform that makes the data more sparse, or simpler, >>> >> seems good >>> > >>> > >>> > Certainly if it results in extracting some useful figure of merit. >>> > >>> >> >>> >> -- which is of course what PCA does, >>> > >>> > >>> > Sometimes yes, and sometimes no. I am looking at incremental PCA >>> > approaches >>> > that reliably extract separate figures of merit rather than >>> > smushed-together >>> > figures of merit as PCA often does. Another problem with classical PCA >>> > is >>> > that it can't provide real-time learning, but instead, works via a sort >>> > of >>> > "batch processing" of statistics collected in the array that is being >>> > transformed. >>> > >>> >> >>> >> and derivatives in >>> >> time/space, and also the Fourier transform I think. The usefulness of >>> >> these transforms springs from underlying regularities in the data. >>> > >>> > >>> > Hmmm, I don't see where a Fourier transform would enter the cognitive >>> > process. Perhaps you see something that I have missed? >>> > >>> >> >>> >> That's not to say that I don't think some representations are >>> >> fundamentally more useful than others-- for example, I know that some >>> >> proofs are astronomically larger in 1st-order logic as compared to >>> >> 2nd-order logic, even in domains where 1st-order logic is >>> >> representationally sufficient. 
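[The point about differences over space can be made concrete: the same first-difference operator that yields dp/dt in time yields edges in space. A toy 1-D "image" row (a real system would use a 2-D gradient, e.g. a Sobel filter, but the principle is identical):]

```python
import numpy as np

row = np.array([5, 5, 5, 9, 9, 9, 2, 2], dtype=float)  # two flat regions
edges = np.diff(row)    # spatial first difference
print(edges)            # nonzero only at the two region boundaries
```

[Flat regions vanish and only boundaries survive, which is exactly the sparsity-increasing property being praised here.]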
>>> >> >>> >> The statement about time correction reminds me of a system called >>> >> PURR-PUSS. >>> > >>> > >>> > >>> > However, as I understand it, the Purposeful Unprimed Real-world Robot >>> > with >>> > Predictors Using Short Segments still relied on rewards and punishments >>> > for >>> > learning. >>> >> >>> >> It is Turing-complete in some sense, essentially by >>> >> compounding time-delays, but I do not know exactly what sense (ie, a >>> >> Turing-complete *learner* is very different than a Turing-complete >>> >> *programmable computer*... PURR-PUSS uses something in between called >>> >> "soft teaching" if I recall correctly.) >>> > >>> > >>> > The old DEC LINC and LINC-8 computers operated the instruction >>> > sequencing >>> > with a pile of time delay modules, and someone had to go in and >>> > recalibrate >>> > every few months. >>> > >>> > Steve Richfield >>> > ================== >>> > >>> >> >>> >> On Fri, Dec 26, 2008 at 3:26 PM, Steve Richfield >>> >> <[email protected]> wrote: >>> >> > Abram, >>> >> > >>> >> > On 12/26/08, Abram Demski <[email protected]> wrote: >>> >> >> >>> >> >> Steve, >>> >> >> >>> >> >> Richard is right when he says temporal simultaneity is not a >>> >> >> sufficient principle. >>> >> > >>> >> > >>> >> > ... and I fully agree. However, we must unfold this thing one piece >>> >> > at a >>> >> > time. >>> >> > >>> >> > Without the dp/dt "trick", there doesn't seem to be any way to make >>> >> > unsupervised learning work, and I appear to be the first to stumble >>> >> > onto >>> >> > dp/dt. This is a whole new and unexplored world, where the things >>> >> > that >>> >> > stymied past unsupervised efforts fall out effortlessly, but some >>> >> > new >>> >> > challenges present themselves. >>> >> > >>> >> >> >>> >> >> Suppose you present your system with the >>> >> >> following sequences (letters could be substituted for sounds, >>> >> >> colors, >>> >> >> objects, whatever): >>> >> >> >>> >> >> ABCABCABCABC... 
>>> >> >> >>> >> >> AAABBBAAABBB... >>> >> >> >>> >> >> ABBAAABBBBAAAAABBBBBB... >>> >> >> >>> >> >> ABBCCCDDDDEEEEEFFFFFF... >>> >> >> >>> >> >> ABACABADABACABAEABACABADABACABA... >>> >> >> >>> >> >> All of these sequences have "concepts" behind them. All of these >>> >> >> concepts are immune to temporal-simultaneity-learning (although the >>> >> >> first could be learned by temporal adjacency, and the second by >>> >> >> temporal adjacency with a delay of 3). >>> >> > >>> >> > >>> >> > The way that wet neurons are built, this is unavoidable! Here is >>> >> > another >>> >> > snippet from my paper... >>> >> > >>> >> > >>> >> > Time Correction >>> >> > >>> >> > Electronics designers routinely use differentiation and integration >>> >> > to >>> >> > advance and retard timing. Phase-linear low-pass filters are often >>> >> > used >>> >> > to >>> >> > make short delays in a signal, and "peaking" capacitors were used in >>> >> > RTL >>> >> > (Resistor Transistor Logic) to differentiate inputs for quicker >>> >> > output. >>> >> > Further, wet neurons introduce their own propagation delays from >>> >> > input >>> >> > synapse to output synapse. If not somehow corrected, the net effect >>> >> > of >>> >> > this >>> >> > is a scrambling of the time that a given signal/node/term >>> >> > represents, >>> >> > which >>> >> > if left uncorrected, would result in relating signals together that >>> >> > are >>> >> > arbitrarily shifted in time. There seem to be three schools >>> >> > of thought >>> >> > regarding this: >>> >> > >>> >> > No problem. This simply results in considering various things >>> >> > shifted >>> >> > arbitrarily in time. When wet neurons learn what works, this will >>> >> > result >>> >> > in >>> >> > recognizing time-sequenced phenomena. Arbitrary delays might also do >>> >> > a >>> >> > lot >>> >> > for artificial neurons. >>> >> > >>> >> > Time correction could be instituted, e.g. 
through Taylor series >>> >> > signal >>> >> > extrapolation to in effect remove a neuron's delay, at the cost of >>> >> > introducing considerable noise into the result. My own simulations >>> >> > of >>> >> > Taylor >>> >> > series extrapolation functions showed that the first derivative may >>> >> > indeed >>> >> > help for small corrections, but beyond that, subtle changes in the >>> >> > shape >>> >> > of >>> >> > a transition cause wild changes in the extrapolated result, >>> >> > sometimes >>> >> > going >>> >> > so far as to produce short bursts of oscillation. Downstream neurons >>> >> > may >>> >> > then amplify these problems to produce havoc at the output of the >>> >> > artificial >>> >> > neural network. >>> >> > >>> >> > The method utilized in CRAY computers might be in use, where all >>> >> > delays >>> >> > were >>> >> > a precise multiple (of their clock rate) long. This was achieved by >>> >> > using >>> >> > interconnecting wires cut to certain specific lengths, even though >>> >> > the >>> >> > length may be much longer than actually physically needed to >>> >> > interconnect >>> >> > two components. Perhaps wet neurons only come in certain very >>> >> > specific >>> >> > delays. There is some laboratory evidence for this, as each section >>> >> > of >>> >> > our >>> >> > brains has neurons with similar geometry within the group. This has >>> >> > been >>> >> > presumed to be an artifact of evolution and limited DNA space, but >>> >> > may >>> >> > in >>> >> > fact be necessary for proper time correction. >>> >> > >>> >> > No one now knows which of these are in use in wet neurons. However, >>> >> > regardless of wet-neuron functionality, artificial neural network >>> >> > researchers should be attentive to time correction >>> >> > >>> >> > Note that #1 above unavoidably solves the time-sequencing puzzle. 
>>> >> > Introduce >>> >> > some integration, and the sequencing can be arbitrarily shifted in >>> >> > time >>> >> > - >>> >> > within reasonable limits (seconds, maybe a minute or two). >>> >> > >>> >> >> >>> >> >> The transition to sequence learning is (at least, in my eyes) a >>> >> >> transition to relational learning, as opposed to the "flat" >>> >> >> learning >>> >> >> that PCA is designed for. >>> >> > >>> >> > >>> >> > I suspect that PCA-like methods are at work within neurons, and that >>> >> > sequence learning and the like fall out from inter-neuronal >>> >> > connections >>> >> > and >>> >> > the associated delays, integration, etc. >>> >> >> >>> >> >> In other words, completely new methods are >>> >> >> required. You already begin that transition by invoking dp/dt, >>> >> >> which >>> >> >> assumes a temporal aspect to the data... >>> >> >> >>> >> >> See this blog post for a more full account of my view on the >>> >> >> current >>> >> >> state of affairs. (It started out as a post about a new algorithm >>> >> >> I'd >>> >> >> been thinking about, but turned into an essay on the difference >>> >> >> between relational methods and "flat" (propositional) methods, and >>> >> >> how >>> >> >> to bridge the gap. If you're wondering about the title, see the >>> >> >> previous post.) >>> >> >> >>> >> >> >>> >> >> >>> >> >> >>> >> >> http://dragonlogic-ai.blogspot.com/2008/12/back-to-ai-ok-here-we-go.html >>> >> > >>> >> > >>> >> > This blog and this email reflect a common problem with AI-thought. >>> >> > There >>> >> > are >>> >> > LOTS of things that people are VERY bad at doing, and these >>> >> > generally >>> >> > make >>> >> > at the same time horrible examples to test human cognition theories >>> >> > on, >>> >> > and >>> >> > wonderful potential AI applications. >>> >> > >>> >> > A perfect example is health and disease, where the human cognition >>> >> > process >>> >> > tends to run in unproductive directions. 
Any given set of symptoms >>> >> > typically >>> >> > has ~12 different common underlying causal mechanisms, each of which >>> >> > has >>> >> > several cause-and-effect chain links that are typically arranged in >>> >> > a >>> >> > figure >>> >> > "6" configuration with a self-sustaining loop at the end. Given >>> >> > limitless >>> >> > understanding, it typically takes two seemingly unrelated actions to >>> >> > actually cure anything, one to stop the lead-in, and the other to >>> >> > momentarily interrupt the self-sustaining loop. >>> >> > >>> >> > It is my present suspicion that unsupervised learning is SO simple >>> >> > that >>> >> > it >>> >> > just falls out of a system using the right representation. Even the >>> >> > simplest >>> >> > of creatures do quite well at it. However, without that >>> >> > representation, >>> >> > it >>> >> > is horrifically hard/impossible. This means that NN and AGI guys >>> >> > should >>> >> > all >>> >> > STOP whatever they are doing and find the right representation, >>> >> > which >>> >> > is >>> >> > the >>> >> > path that I have gone on. >>> >> > >>> >> > Note that if I am successful, that prior PhDs in AI/NN won't be >>> >> > worth >>> >> > spit >>> >> > because they will be built on false premises. Good for history, but >>> >> > bad >>> >> > for >>> >> > science. >>> >> > >>> >> > Thanks for your thoughts. Any more? 
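[The Taylor-series time-correction behavior reported a few messages up - the first derivative helping for small corrections but producing wild results beyond that - is easy to reproduce numerically. A minimal sketch; the test signal, noise level, and look-ahead steps are invented for illustration:]

```python
import numpy as np

rng = np.random.default_rng(1)
dt = 0.01
t = np.arange(0.0, 2.0, dt)
clean = np.sin(2 * np.pi * t)
noisy = clean + 0.02 * rng.normal(size=t.size)   # small measurement noise

def extrapolate(signal, lead):
    """First-order Taylor extrapolation: x(t + lead) ~ x(t) + lead * x'(t)."""
    deriv = np.gradient(signal, dt)               # finite-difference derivative
    return signal + lead * deriv

errs = []
for lead in (0.01, 0.05, 0.2):
    pred = extrapolate(noisy, lead)
    target = np.sin(2 * np.pi * (t + lead))       # true future value
    err = float(np.sqrt(np.mean((pred - target) ** 2)))
    errs.append(err)
    print(f"lead={lead}: rms error {err:.3f}")
```

[The error grows quickly with the look-ahead: the finite-difference derivative amplifies the noise by roughly lead/dt, matching the observation that downstream neurons would then amplify these problems further.]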
>>> >> > >>> >> > Steve Richfield >>> >> > ======================== >>> >> >> >>> >> >> On Fri, Dec 26, 2008 at 2:31 AM, Steve Richfield >>> >> >> <[email protected]> wrote: >>> >> >> > Richard, >>> >> >> > >>> >> >> > On 12/25/08, Richard Loosemore <[email protected]> wrote: >>> >> >> >> >>> >> >> >> Steve Richfield wrote: >>> >> >> >>> >>> >> >> >>> There are doubtless exceptions to my broad statement, but >>> >> >> >>> generally, >>> >> >> >>> neuron functionality is WIDE open to be pretty much ANYTHING >>> >> >> >>> you >>> >> >> >>> choose, >>> >> >> >>> including that of an AGI engine's functionality on its >>> >> >> >>> equations. >>> >> >> >>> In the reverse, any NN could be expressed in a shorthand form >>> >> >> >>> that >>> >> >> >>> contains structure, synapse functions, etc., and an AGI engine >>> >> >> >>> could >>> >> >> >>> be >>> >> >> >>> built/modified to function according to that shorthand. >>> >> >> >>> In short, mapping between NN and AGI forms presumes >>> >> >> >>> flexibility >>> >> >> >>> in >>> >> >> >>> the >>> >> >> >>> functionality of the target form. Where that flexibility is NOT >>> >> >> >>> present, >>> >> >> >>> e.g. because of orthogonal structure, etc., then you must ask >>> >> >> >>> whether >>> >> >> >>> something is being gained or lost by the difference. Clearly, >>> >> >> >>> any >>> >> >> >>> transition >>> >> >> >>> that involves a loss should be carefully examined to see if the >>> >> >> >>> entire >>> >> >> >>> effort is headed in the wrong direction, which I think was your >>> >> >> >>> original >>> >> >> >>> point here. >>> >> >> >> >>> >> >> >> >>> >> >> >> There is a problem here. 
>>> >> >> >> >>> >> >> >> When someone says "X and Y can easily be mapped from one form to >>> >> >> >> the >>> >> >> >> other" there is an implication that they are NOT suggesting that >>> >> >> >> we >>> >> >> >> go >>> >> >> >> right >>> >> >> >> down to the basic constituents of both X and Y in order to >>> >> >> >> effect >>> >> >> >> the >>> >> >> >> mapping. >>> >> >> >> >>> >> >> >> Thus: "Chalk and Cheese can easily be mapped from one to the >>> >> >> >> other" >>> >> >> >> .... >>> >> >> >> trivially true if we are prepared to go down to the common >>> >> >> >> denominator >>> >> >> >> of >>> >> >> >> electrons, protons and neutrons. But if we stay at a sensible >>> >> >> >> level >>> >> >> >> then, >>> >> >> >> no, these do not map onto one another. >>> >> >> > >>> >> >> > >>> >> >> > The problem here is that you were thinking present existing NN >>> >> >> > and >>> >> >> > AGI >>> >> >> > systems, neither of which work (yet) in any really useful way, >>> >> >> > that >>> >> >> > it >>> >> >> > was >>> >> >> > obviously impossible to directly convert from one system with its >>> >> >> > set >>> >> >> > of >>> >> >> > bad >>> >> >> > assumptions to another system with a completely different set of >>> >> >> > bad >>> >> >> > assumptions. I completely agree, but I assert that the answer to >>> >> >> > that >>> >> >> > particular question is of no practical interest to anyone. >>> >> >> > >>> >> >> > On the other hand, converting between NN and AGI systems built on >>> >> >> > the >>> >> >> > SAME >>> >> >> > set of assumptions would be simple. This situation doesn't yet >>> >> >> > exist. >>> >> >> > Until >>> >> >> > then, converting a program from one dysfunctional platform to >>> >> >> > another >>> >> >> > is >>> >> >> > uninteresting. 
When the assumptions get ironed out, then all >>> >> >> > systems >>> >> >> > will be >>> >> >> > built on the same assumptions, and there will be few problems >>> >> >> > going >>> >> >> > between >>> >> >> > them, EXCEPT: >>> >> >> > >>> >> >> > Things need to be arranged in arrays for automated learning, >>> >> >> > which >>> >> >> > much >>> >> >> > more >>> >> >> > fits the present NN paradigm than the present AGI paradigm. >>> >> >> >> >>> >> >> >> Similarly, if you claim that NN and regular AGI map onto one >>> >> >> >> another, I >>> >> >> >> assume that you are saying something more substantial than that >>> >> >> >> these >>> >> >> >> two >>> >> >> >> can both be broken down into their primitive computational >>> >> >> >> parts, >>> >> >> >> and >>> >> >> >> that >>> >> >> >> when this is done they seem equivalent. >>> >> >> > >>> >> >> > >>> >> >> > Even this breakdown isn't required if both systems are built on >>> >> >> > the >>> >> >> > same >>> >> >> > correct assumptions. HOWEVER, I see no way to transfer fast >>> >> >> > learning >>> >> >> > from an >>> >> >> > NN-like construction to an AGI-like construction. Do you? If >>> >> >> > there >>> >> >> > is >>> >> >> > no >>> >> >> > answer to this question, then this unanswerable question would >>> >> >> > seem >>> >> >> > to >>> >> >> > redirect AGI efforts to NN-like constructions if they are ever to >>> >> >> > learn >>> >> >> > like >>> >> >> > we do. >>> >> >> >> >>> >> >> >> NN and regular AGI, they way they are understood by people who >>> >> >> >> understand >>> >> >> >> them, have very different styles of constructing intelligent >>> >> >> >> systems. >>> >> >> > >>> >> >> > >>> >> >> > Neither of which work (yet). Of course, we are both trying to >>> >> >> > fill >>> >> >> > in >>> >> >> > the >>> >> >> > gaps. 
>>> >> >> >> >>> >> >> >> Sure, you can code both in C, or Lisp, or Cobol, but that is to >>> >> >> >> trash >>> >> >> >> the >>> >> >> >> real meaning of "are easily mapped onto one another". >>> >> >> > >>> >> >> > >>> >> >> > One of my favorite consulting projects involved coding an AI >>> >> >> > program >>> >> >> > to >>> >> >> > solve complex problems that were roughly equivalent to solving >>> >> >> > algebraic >>> >> >> > equations. This composed the Yellow pages for 28 different large >>> >> >> > phone >>> >> >> > directories. The project was for a major phone company and had to >>> >> >> > be >>> >> >> > written >>> >> >> > entirely in COBOL. Further, it had to run at n log n speed and >>> >> >> > NOT >>> >> >> > n^2 >>> >> >> > speed, which I did by using successive sorts instead of list >>> >> >> > processing >>> >> >> > methods. It would have been rather difficult to achieve the >>> >> >> > needed >>> >> >> > performance in C or Lisp, even though COBOL would seem to be >>> >> >> > everyone's >>> >> >> > first choice as the last choice on the list of prospective >>> >> >> > platforms. >>> >> >> >>> >>> >> >> >>> ), instead of operating on "objects" (in an >>> >> >> >>> >>> >> >> >>> object-oriented sense) >>> >> >> >>> >>> >> >> >>> >>> >> >> >>> Neither NN nor AGI has any intrinsic relationship to OO. >>> >> >> >>> >>> >> >> >>> Clearly I need a better term here. Both NNs and AGIs tend to >>> >> >> >>> have >>> >> >> >>> neurons or equations that reflect the presence (or absence) of >>> >> >> >>> various >>> >> >> >>> objects, conditions, actions, etc. My fundamental assertion is >>> >> >> >>> that >>> >> >> >>> if >>> >> >> >>> you >>> >> >> >>> differentiate the inputs so that everything in the entire >>> >> >> >>> network >>> >> >> >>> reflects >>> >> >> >>> dp/dt instead of straight probabilities, then the network works >>> >> >> >>> identically, >>> >> >> >>> but learning is GREATLY simplified. 
>>> >> >> >> >>> >> >> >> Seems like a simple misunderstanding: you were not aware that >>> >> >> >> "object >>> >> >> >> oriented" does not mean the same as saying that there are >>> >> >> >> fundamental >>> >> >> >> atomic >>> >> >> >> constituents of a representation. >>> >> >> > >>> >> >> > >>> >> >> > A typical semantic overloading problem. "Atomic constituent >>> >> >> > orientation" >>> >> >> > doesn't really work either, because in later stages, individual >>> >> >> > terms/neurons can represent entire concepts, strategies, etc. I >>> >> >> > am >>> >> >> > still >>> >> >> > looking for a good term here. >>> >> >> >>> >>> >> >> >>> >>> >> >> >>> , instead, operates on the rate-of-changes in the >>> >> >> >>> >>> >> >> >>> probabilities of "objects", or dp/dt. Presuming >>> >> >> >>> sufficient >>> >> >> >>> bandwidth to generally avoid superstitious coincidences, >>> >> >> >>> fast >>> >> >> >>> unsupervised learning then becomes completely trivial, >>> >> >> >>> as >>> >> >> >>> like >>> >> >> >>> objects cause simultaneous like-patterned changes in the >>> >> >> >>> inputs >>> >> >> >>> WITHOUT the overlapping effects of the many other >>> >> >> >>> objects >>> >> >> >>> typically present in the input (with numerous minor >>> >> >> >>> exceptions). >>> >> >> >>> >>> >> >> >>> >>> >> >> >>> You have already presumed that something supplies the system >>> >> >> >>> with >>> >> >> >>> "objects" that are meaningful. 
Even before your first >>> >> >> >>> mention >>> >> >> >>> of >>> >> >> >>> dp/dt, there has to be a mechanism that is so good that it >>> >> >> >>> never >>> >> >> >>> invents objects such as: >>> >> >> >>> >>> >> >> >>> Object A: "A person who once watched all of Tuesday Weld's >>> >> >> >>> movies >>> >> >> >>> in >>> >> >> >>> the space of one week" or >>> >> >> >>> >>> >> >> >>> Object B: "Something that is a combination of Julius >>> >> >> >>> Caesar's >>> >> >> >>> pinky >>> >> >> >>> toe and a sour grape that Brutus just spat out" or >>> >> >> >>> >>> >> >> >>> Object C: "All of the molecules involved in a swimming gala >>> >> >> >>> that >>> >> >> >>> happen to be 17.36 meters from the last drop of water that >>> >> >> >>> splashed >>> >> >> >>> from the pool". >>> >> >> >>> >>> >> >> >>> You have supplied no mechanism that is able to do that, but >>> >> >> >>> that >>> >> >> >>> mechanism is 90% of the trouble, if learning is what you are >>> >> >> >>> about. >>> >> >> >>> >>> >> >> >>> With prior unsupervised learning you are 100% correct. However >>> >> >> >>> none >>> >> >> >>> of >>> >> >> >>> the examples you gave involved temporal simultaneity. I will >>> >> >> >>> discuss B >>> >> >> >>> above >>> >> >> >>> because it is close enough to be interesting. >>> >> >> >>> If indeed someone just began to notice something interesting >>> >> >> >>> about >>> >> >> >>> Caesar's pinkie toe *_as_* they just began to notice the taste >>> >> >> >>> of a >>> >> >> >>> sour >>> >> >> >>> grape, then yes, that probably would be learned via the >>> >> >> >>> mechanisms I >>> >> >> >>> am >>> >> >> >>> talking about. However, if one was "present perfect tense" >>> >> >> >>> while >>> >> >> >>> the >>> >> >> >>> other >>> >> >> >>> was just beginning, then it wouldn't with my approach but would >>> >> >> >>> with >>> >> >> >>> prior >>> >> >> >>> unsupervised learning methods. 
For example, if Caesar's pinkie toe >>> >> >> >>> had >>> >> >> >>> been >>> >> >> >>> noticed and examined, then before the condition passed they >>> >> >> >>> tasted >>> >> >> >>> a >>> >> >> >>> sour >>> >> >> >>> grape, then temporal simultaneity of the dp/dt edges wouldn't >>> >> >> >>> exist >>> >> >> >>> to >>> >> >> >>> learn >>> >> >> >>> from. Of course, in both cases, the transforms would work >>> >> >> >>> identically >>> >> >> >>> given >>> >> >> >>> identical prior learning/programming. >>> >> >> >> >>> >> >> >> >>> >> >> >> You have not understood the sense in which I made the point, I >>> >> >> >> fear. >>> >> >> > >>> >> >> > >>> >> >> > I think the reverse is true. Consider... >>> >> >> >> >>> >> >> >> I was describing obviously useless concepts. Ones where there >>> >> >> >> is >>> >> >> >> no >>> >> >> >> temporal simultaneity. >>> >> >> > >>> >> >> > >>> >> >> > dp/dt is unable to even notice things that lack temporal >>> >> >> > simultaneity, >>> >> >> > so >>> >> >> > the examples you gave, though typical challenges to past >>> >> >> > unsupervised >>> >> >> > learning, are complete non-issues in dp/dt space. >>> >> >> >> >>> >> >> >> Concepts thrown together out of completely useless components. >>> >> >> > >>> >> >> > >>> >> >> > ... that require SOME force/reason/bug/error/etc to get thrown >>> >> >> > together. >>> >> >> > I >>> >> >> > think we both understand how this was a typical challenge to past >>> >> >> > unsupervised learning efforts. I am asserting that in dp/dt >>> >> >> > systems, >>> >> >> > there >>> >> >> > is NO force/reason/error/etc to ever throw such things together, >>> >> >> > and >>> >> >> > hence, >>> >> >> > no reason for vastly complex matrix transforms to then try to >>> >> >> > pull >>> >> >> > them >>> >> >> > back >>> >> >> > apart. >>> >> >> >> >>> >> >> >> The question is: how to build a mechanism that does NOT fall >>> >> >> >> into >>> >> >> >> the >>> >> >> >> trap of creating such nonsense-concepts. 
>>> >> >> >> If you just say "assume that we have such a concept builder" you beg a million questions.
>>> >> >> >
>>> >> >> >> Your reply, above, took one of my examples and tried to talk about what could happen if it was not, after all, a nonsense-concept.
>>> >> >> >
>>> >> >> > Note that GMAIL got sick here, so I'll mark your text with >. Also, some replies are deeply indented, so I have bolded some of them.
>>> >> >> >
>>> >> >> > I was just "playing neuron" without any mindreading abilities.
>>> >> >> >
>>> >> >> >> Alas, that is neither here nor there, because (sure enough) *everyone* agrees that temporal simultaneity is a good basic ground for trying to construct new concepts (it is the Reason Number One for creating a new concept!). But we also know that just common or garden variety Temporal Simultaneity doesn't get you very far .... that is the easiest of all mechanisms, and we need a hundred more concept-building mechanisms that are better than that before we have a real concept-generating engine.
>>> >> >> >
>>> >> >> > Now, we can start "picking through" the approaches. I suspect that looking for the principal components of temporally simultaneous inputs goes a LONG way toward what is sought, but I have no proof (yet). Do YOU have some idea as to where the threshold of usefulness is?
>>> >> >> >
>>> >> >> >> And (here is where my point comes back into the picture) if anyone stands up and says "Hey everyone!
>>> >> >> >> I have discovered a hundred concept building mechanisms that I think will do the trick!", the first question that the crowd will ask is: "Do your mechanisms work together to build real, sensible concepts, or do they fill the system with bazillions of really dumb, useless concepts (like my nonsense list above)?"
>>> >> >> >
>>> >> >> > Clearly, PCA on simultaneous inputs will NOT do that, because they must show common things in order not to end up at the wrong end of the Huffman code.
>>> >> >> >
>>> >> >> >> Anyone who says that they know of a way to get unsupervised learning to occur is saying, implicitly, that they have those 100 concept building mechanisms ready to go (or one super mechanism as good as all of them). Hence my original point: you cannot simply imply that your system is working with bona-fide, coherent concepts unless you can show that it really does come up with concepts (or objects) that are sensible.
>>> >> >> >
>>> >> >> > Perhaps you could exhibit some examples where learning based on temporal simultaneity with a preference for identifying common patterns (as PCA requires) fails.
>>> >> >> > Clearly, if I think that a relatively simple approach (like PCA on dp/dt inputs) should work, but you are convinced that it will fall into an abyss of superstitious learning, then you will have a MUCH easier time exhibiting a couple of example failures than I will have somehow proving that it always works (which is probably beyond the mathematical state of the art).
>>> >> >> >
>>> >> >> > I'm not saying you are wrong here, only that you may not have heard me (probably my fault for not saying things clearly enough), and you haven't made your point by exhibiting something on which my approach would fail.
>>> >> >> >
>>> >> >> >> FWIW, I would level the same criticism against quite a few other people, so you don't stand alone here.
>>> >> >> >
>>> >> >> > My ego is quite indestructible and I understand that your body temperature is low, so you have nothing to worry about here.
>>> >> >> >
>>> >> >> >> (Just briefly: if I move on to look at your actual reply above, I see also mention of rates of change (dp/dt), but no explanation of how rates of change of anything would help a system build a concept that is a combination (NOT an association, please!) of [Julius Caesar's pinky toe and a sour grape that Brutus just spat out]. The rates of change seem irrelevant here).
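[Interjecting a concrete sketch here, since "PCA prefers common things" keeps coming up. This is my own toy illustration with invented numbers, not anyone's proposed architecture: two "temporally simultaneous" input channels that are both driven by one shared cause, plus small independent wiggles. Power iteration on the 2x2 covariance matrix finds the first principal component, which loads roughly equally on both channels, i.e., it picks out the shared cause.]

```python
# Toy PCA-as-unsupervised-learning sketch. Data is invented: channels
# x and y both track one underlying cause, with small independent noise.
data = [(1.0, 0.9), (2.0, 2.1), (3.0, 2.9), (4.0, 4.2), (5.0, 4.8)]

n = len(data)
mx = sum(x for x, _ in data) / n
my = sum(y for _, y in data) / n

# 2x2 covariance matrix of the centered channels
cxx = sum((x - mx) ** 2 for x, _ in data) / n
cyy = sum((y - my) ** 2 for _, y in data) / n
cxy = sum((x - mx) * (y - my) for x, y in data) / n

# Power iteration: repeatedly apply the covariance matrix to a guess
# and renormalize; this converges to the top eigenvector (first PC).
v = (1.0, 0.0)
for _ in range(100):
    w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
    norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
    v = (w[0] / norm, w[1] / norm)

print(v)  # roughly (0.71, 0.71): both channels load on the shared cause
```

[The point of the sketch: the first component is "common structure", not a nonsense conjunction, because anything uncorrelated across the channels contributes nothing to the off-diagonal covariance.]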
>>> >> >> > If you take a neuron or Bayesian formula programmed to do something static and throw dp/dt inputs at it, its output will be the dp/dt of the result from static operation. You could then simply integrate it to produce exactly the same output. Hence, the ONLY reason to operate in dp/dt space is for the learning, as the transformation itself is unaffected.
>>> >> >> >
>>> >> >> > Now, if you look for an association in dp/dt space and decide to recognize it, that same neuron will then operate to recognize a combination, once its output has been integrated. Of course, not integrating but simply using its output in subsequent neurons, the entire system will operate as though it recognized the combination, even though, if you attached an oscilloscope to the output, you would see positive and negative spikes around what would be a steady-state output in "object" mode.
>>> >> >> >
>>> >> >> > In short, it programs based on associations, but functions based on represented combinations, that representation being the dp/dt of the combination.
>>> >> >> >
>>> >> >> >>> Instead, you waved your hands and said "fast unsupervised learning then becomes completely trivial" .... this statement is a declaration that a good mechanism is available.
>>> >> >> >>>
>>> >> >> >>> You then also talk about "like" objects. But the whole concept of "like" is extraordinarily troublesome.
>>> >> >> >>> Are Julius Caesar and Brutus "like" each other? Seen from our distance, maybe yes, but from the point of view of Julius C., probably not so much. Is a G-type star "like" a mirror? I don't know any stellar astrophysicists who would say so, but then again OF COURSE they are, because they are almost indistinguishable: if you hold a mirror up in the right way it can reflect the sun and the two visual images can be identical.
>>> >> >> >>>
>>> >> >> >>> These questions can be resolved, sure enough, but it is the whole business of resolving these questions (rather than waving a hand over them and declaring them to be trivial) that is the point.
>>> >> >> >>>
>>> >> >> >>> I think that pretty much everyone who has "dented their pick" on unsupervised learning (this includes myself. Does anyone else here have these same scars?) has developed methods that would work on "completely obvious" test cases but failed miserably on real-world input. My point here is that looking at things from a dp/dt point of view, real-world situations become about as simple as "completely obvious" test cases. I would quote some good source to make this point, but I don't think anyone has gone here yet.
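[One more interjection, on the earlier claim that a static transform fed dp/dt inputs emits the dp/dt of its static output, so that integrating recovers the static result. For a LINEAR neuron this is exact and easy to check; the weights and signal below are invented purely for illustration, and the sketch says nothing about the nonlinear case.]

```python
# Sketch: for a linear "neuron" f(p) = w . p, summing (integrating) its
# responses to the finite differences of p reproduces its response to
# the final static input, because f is additive. This is the sense in
# which the transform is unaffected and only the learning changes.

def neuron(w, p):
    """Static linear neuron: weighted sum of its inputs."""
    return sum(wi * pi for wi, pi in zip(w, p))

w = [0.5, -1.2, 2.0]                                    # fixed, already-learned weights
signal = [[0, 0, 0], [1, 2, 0], [1, 3, 1], [4, 3, 1]]   # p(t) samples over time

# dp/dt approximated as first differences between successive samples
dp = [[b - a for a, b in zip(p0, p1)] for p0, p1 in zip(signal, signal[1:])]

# Integrate the neuron's response to the dp/dt stream...
integrated = neuron(w, signal[0]) + sum(neuron(w, d) for d in dp)

# ...and it matches the static response to the final input
# (up to floating-point rounding).
static = neuron(w, signal[-1])
print(integrated, static)
```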
>>> >> >> >>> If you don't have a clear demonstration that this dp/dt idea does deliver the goods, why are you claiming that it does? Surely it is one or the other?
>>> >> >> >>>
>>> >> >> >>> This month I am wearing my mathematician hat. My son Eddie is the NN hacker of the family, and he is waiting impatiently for me to declare a tentative completion so he can run with it.
>>> >> >> >>>
>>> >> >> >>> For now, my goal is to come up with a sufficiently good theory that even you can't poke any significant holes in it. Once I become the first person in history to ever receive the Loosemore Seal of No Objection, I will probably wrap this thing up and turn it over to Eddie.
>>> >> >> >>>>
>>> >> >> >>>> But Steve, if YOU claim that "looking at things from a dp/dt point of view" does in fact yield a dramatic breakthrough that allows unsupervised learning to work on real world cases (something nobody else can do right now),
>>> >> >> >>>>
>>> >> >> >>>> Not entirely true, as PCA does what could be considered to be unsupervised learning, though granted, it is WAY too inefficient for NN/AGI use without dp/dt.
>>> >> >> >>>>>
>>> >> >> >>>>> then YOU are expected to be the one who has gone there, done it, and come back with evidence that your idea does in fact do that.
>>> >> >> >>
>>> >> >> >> First comes the theory, then comes the demo.
>>> >> >> >> Neither contains any sort of proof, but it is a LOT cheaper to shoot something down BEFORE it is built than after. Hence, I find this exercise VERY valuable. THANKS. Please keep up the good work.
>>> >> >> >>
>>> >> >> >> Steve Richfield
--
Abram Demski
Public address: [email protected]
Public archive: http://groups.google.com/group/abram-demski
Private address: [email protected]

-------------------------------------------
agi
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244&id_secret=123753653-47f84b
Powered by Listbox: http://www.listbox.com
