Steve,

When I made the statement about Fourier I was thinking of JPEG encoding (which actually uses the DCT, a close relative of the Fourier transform). A little digging found this book, which presents a unified approach to (low-level) computer vision based on the Fourier transform:
http://books.google.com/books?id=1wJuTMbNT0MC&dq=fourier+vision&printsec=frontcover&source=bl&ots=3ogSJ2i5uW&sig=ZdvvWvu82q8UX1c5Abq6hWvgZCY&hl=en&sa=X&oi=book_result&resnum=2&ct=result#PPA4,M1

>> But that is beside the present point. :)

> Probably so. I noticed that you recently graduated, so I thought that I would drop that thought to make (or unmake) your day. :)

I should really update that. It's been a while now.

>> generally, any transform that makes the data more sparse, or simpler, seems good

> Certainly if it results in extracting some useful figures of merit.

>> -- which is of course what PCA does,

> Sometimes yes, and sometimes no. I am looking at incremental PCA approaches that reliably extract separate figures of merit rather than smushed-together figures of merit as PCA often does.

How do you define "figures of merit"? Sounds like an ill-defined problem to me. We don't know which features we *really* want to extract from an image until we know the utility function of the environment, and so know what information will help us achieve our goals.

--Abram

On Sat, Dec 27, 2008 at 12:01 AM, Steve Richfield <[email protected]> wrote:

> Abram,

> On 12/26/08, Abram Demski <[email protected]> wrote:

>> Steve,

>> It is strange to claim that prior PhDs will be worthless when what you are suggesting is that we apply the standard methods to a different representation.

> Much of AI and pretty much all of AGI is built on the proposition that we humans must code knowledge because the stupid machines can't efficiently learn it on their own -- in short, that UNsupervised learning is difficult. Note that in nature, UNsupervised learning handily outperforms supervised learning. What good is supervised NN technology when UNsupervised NNs will perform MUCH better? What good are a few hand-coded AGI rules and the engine that runs them, when an UNsupervised AGI can learn them orders of magnitude faster than cities full of programmers? Note my prior post where I explain that AGIs must either abandon UNsupervised learning or switch to a NN-like implementation. In short, easy UNsupervised learning will change things about as much as the switch from horse and buggy to automobiles, leaving present PhDs in the position of blacksmiths and historians. Sure, blacksmiths had transferable skills, but they weren't worth much and they weren't respected at all.

> In the 1980s, countless top computer people (including myself) had to expunge all references to mainframe computers from our resumes in order to find work in a microcomputer-dominated field. I expect to see rounds of the same sort of insanity when UNsupervised learning emerges.

>> But that is beside the present point. :)

> Probably so. I noticed that you recently graduated, so I thought that I would drop that thought to make (or unmake) your day.

>> Taking the derivative, or just finite differences, is a useful step in more ways than one. You are talking about taking differences over time, but differences over space can be used for edge detection, frequently thought of as the first step in visual processing.

> Correct. My paper goes into using any dimension that is differentiable. Note that continuous eye movement converts a physical dimension to the time domain.

>> More generally, any transform that makes the data more sparse, or simpler, seems good

> Certainly if it results in extracting some useful figures of merit.

>> -- which is of course what PCA does,

> Sometimes yes, and sometimes no. I am looking at incremental PCA approaches that reliably extract separate figures of merit rather than smushed-together figures of merit as PCA often does. Another problem with classical PCA is that it can't provide real-time learning, but instead works via a sort of "batch processing" of statistics collected in the array that is being transformed.
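> To make "incremental" concrete, here is a minimal sketch of the flavor I have in mind, using a plain Oja-rule update (my own toy illustration, not anything from a real implementation; the data and learning rate are placeholders):

> import numpy as np
>
> def oja_update(w, x, lr=0.01):
>     # One sample in, one update out: no covariance matrix, no batch statistics.
>     y = np.dot(w, x)
>     w = w + lr * y * (x - y * w)
>     return w / np.linalg.norm(w)
>
> rng = np.random.default_rng(0)
> data = rng.normal(size=(1000, 5)) * np.array([3.0, 1.0, 0.5, 0.2, 0.1])  # most variance on axis 0
> w = rng.normal(size=5)
> for x in data:
>     w = oja_update(w, x)
> # w now points (approximately) along the dominant principal direction,
> # learned one sample at a time rather than by batch processing.

> Whether something like this can be coaxed into delivering separate figures of merit, rather than smushed-together ones, is exactly the open question.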
>> and derivatives in time/space, and also the Fourier transform I think. The usefulness of these transforms springs from underlying regularities in the data.

> Hmmm, I don't see where a Fourier transform would enter the cognitive process. Perhaps you see something that I have missed?

>> That's not to say that I don't think some representations are fundamentally more useful than others -- for example, I know that some proofs are astronomically larger in 1st-order logic as compared to 2nd-order logic, even in domains where 1st-order logic is representationally sufficient.

>> The statement about time correction reminds me of a system called PURR-PUSS.

> However, as I understand it, the Purposeful Unprimed Real-world Robot with Predictors Using Short Segments still relied on rewards and punishments for learning.

>> It is Turing-complete in some sense, essentially by compounding time-delays, but I do not know exactly what sense (i.e., a Turing-complete *learner* is very different from a Turing-complete *programmable computer*... PURR-PUSS uses something in between called "soft teaching" if I recall correctly.)

> The old DEC LINC and LINC-8 computers operated the instruction sequencing with a pile of time delay modules, and someone had to go in and recalibrate every few months.

> Steve Richfield
> ==================

>> On Fri, Dec 26, 2008 at 3:26 PM, Steve Richfield <[email protected]> wrote:

>> > Abram,

>> > On 12/26/08, Abram Demski <[email protected]> wrote:

>> >> Steve,

>> >> Richard is right when he says temporal simultaneity is not a sufficient principle.

>> > ... and I fully agree. However, we must unfold this thing one piece at a time.

>> > Without the dp/dt "trick", there doesn't seem to be any way to make unsupervised learning work, and I appear to be the first to stumble onto dp/dt. This is a whole new and unexplored world, where the things that stymied past unsupervised efforts fall out effortlessly, but some new challenges present themselves.

>> >> Suppose you present your system with the following sequences (letters could be substituted for sounds, colors, objects, whatever):

>> >> ABCABCABCABC...
>> >> AAABBBAAABBB...
>> >> ABBAAABBBBAAAAABBBBBB...
>> >> ABBCCCDDDDEEEEEFFFFFF...
>> >> ABACABADABACABAEABACABADABACABA...

>> >> All of these sequences have "concepts" behind them. All of these concepts are immune to temporal-simultaneity-learning (although the first could be learned by temporal adjacency, and the second by temporal adjacency with a delay of 3).
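>> >> (To be concrete about "temporal adjacency with a delay": a throwaway check like the following, which just counts which symbol follows which at a fixed lag, captures the first two sequences and nothing deeper. A toy sketch only.)

>> >> from collections import Counter
>> >>
>> >> def lag_pairs(seq, lag):
>> >>     # Count which symbol follows which at a fixed delay.
>> >>     return Counter(zip(seq[:-lag], seq[lag:]))
>> >>
>> >> print(lag_pairs("ABCABCABCABC", 1))            # deterministic structure at lag 1
>> >> print(lag_pairs("AAABBBAAABBB", 3))            # deterministic structure at lag 3
>> >> print(lag_pairs("ABBAAABBBBAAAAABBBBBB", 3))   # no single lag explains this one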
>> > The way that wet neurons are built, this is unavoidable! Here is another snippet from my paper...

>> > Time Correction

>> > Electronics designers routinely use differentiation and integration to advance and retard timing. Phase-linear low-pass filters are often used to make short delays in a signal, and "peaking" capacitors were used in RTL (Resistor Transistor Logic) to differentiate inputs for quicker output. Further, wet neurons introduce their own propagation delays from input synapse to output synapse. If not somehow corrected, the net effect is a scrambling of the time that a given signal/node/term represents, which would result in relating signals together that are arbitrarily shifted in time. There seem to be three schools of thought regarding this:

>> > 1. No problem. This simply results in considering various things shifted arbitrarily in time. When wet neurons learn what works, this will result in recognizing time-sequenced phenomena. Arbitrary delays might also do a lot for artificial neurons.

>> > 2. Time correction could be instituted, e.g. through Taylor series signal extrapolation to in effect remove a neuron's delay, at the cost of introducing considerable noise into the result. My own simulations of Taylor series extrapolation functions showed that the first derivative may indeed help for small corrections, but beyond that, subtle changes in the shape of a transition cause wild changes in the extrapolated result, sometimes going so far as to produce short bursts of oscillation. Downstream neurons may then amplify these problems to produce havoc at the output of the artificial neural network. (A toy version of this simulation appears below.)

>> > 3. The method utilized in CRAY computers might be in use, where all delays were a precise multiple (of their clock rate) long. This was achieved by using interconnecting wires cut to certain specific lengths, even though the length may be much longer than actually physically needed to interconnect two components. Perhaps wet neurons only come in certain very specific delays. There is some laboratory evidence for this, as each section of our brains has neurons with similar geometry within the group. This has been presumed to be an artifact of evolution and limited DNA space, but may in fact be necessary for proper time correction.

>> > No one now knows which of these are in use in wet neurons. However, regardless of wet-neuron functionality, artificial neural network researchers should be attentive to time correction.

>> > Note that #1 above unavoidably solves the time-sequencing puzzle. Introduce some integration, and the sequencing can be arbitrarily shifted in time - within reasonable limits (seconds, maybe a minute or two).
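>> > Going back to item 2 above, here is a toy version of the sort of simulation I described: first-derivative (Taylor) extrapolation of a noisy step-like transition. The waveform, noise level and advance are made up purely for illustration.

>> > import numpy as np
>> >
>> > rng = np.random.default_rng(1)
>> > t = np.linspace(-1.0, 1.0, 200)
>> > clean = 1.0 / (1.0 + np.exp(-20 * t))            # a smooth step-like transition
>> > noisy = clean + rng.normal(scale=0.01, size=t.size)
>> >
>> > dt = t[1] - t[0]
>> > advance = 5 * dt                                  # try to "undo" a 5-sample delay
>> > slope = np.gradient(noisy, dt)                    # first-derivative estimate
>> > extrapolated = noisy + advance * slope            # first-order Taylor extrapolation
>> >
>> > true_future = 1.0 / (1.0 + np.exp(-20 * (t + advance)))
>> > print("RMS error, no correction:   ", np.sqrt(np.mean((noisy - true_future) ** 2)))
>> > print("RMS error, Taylor-corrected:", np.sqrt(np.mean((extrapolated - true_future) ** 2)))
>> > # For a small advance the extrapolation should help somewhat; increase the
>> > # advance (or the noise) and the differentiated noise plus the breakdown of
>> > # the first-order term quickly erase the benefit, as described above.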
>> >> The transition to sequence learning is (at least, in my eyes) a transition to relational learning, as opposed to the "flat" learning that PCA is designed for.

>> > I suspect that PCA-like methods are at work within neurons, and that sequence learning and the like fall out from inter-neuronal connections and the associated delays, integration, etc.

>> >> In other words, completely new methods are required. You already begin that transition by invoking dp/dt, which assumes a temporal aspect to the data...

>> >> See this blog post for a more full account of my view on the current state of affairs. (It started out as a post about a new algorithm I'd been thinking about, but turned into an essay on the difference between relational methods and "flat" (propositional) methods, and how to bridge the gap. If you're wondering about the title, see the previous post.)

>> >> http://dragonlogic-ai.blogspot.com/2008/12/back-to-ai-ok-here-we-go.html

>> > This blog and this email reflect a common problem with AI-thought. There are LOTS of things that people are VERY bad at doing, and these generally make, at the same time, horrible examples to test human cognition theories on and wonderful potential AI applications.

>> > A perfect example is health and disease, where the human cognition process tends to run in unproductive directions. Any given set of symptoms typically has ~12 different common underlying causal mechanisms, each of which has several cause-and-effect chain links that are typically arranged in a figure "6" configuration with a self-sustaining loop at the end. Given limitless understanding, it typically takes two seemingly unrelated actions to actually cure anything: one to stop the lead-in, and the other to momentarily interrupt the self-sustaining loop.

>> > It is my present suspicion that unsupervised learning is SO simple that it just falls out of a system using the right representation. Even the simplest of creatures do quite well at it. However, without that representation, it is horrifically hard/impossible. This means that NN and AGI guys should all STOP whatever they are doing and find the right representation, which is the path that I have gone down.

>> > Note that if I am successful, prior PhDs in AI/NN won't be worth spit because they will be built on false premises. Good for history, but bad for science.

>> > Thanks for your thoughts. Any more?

>> > Steve Richfield
>> > ========================

>> >> On Fri, Dec 26, 2008 at 2:31 AM, Steve Richfield <[email protected]> wrote:

>> >> > Richard,

>> >> > On 12/25/08, Richard Loosemore <[email protected]> wrote:

>> >> >> Steve Richfield wrote:

>> >> >>> There are doubtless exceptions to my broad statement, but generally, neuron functionality is WIDE open to be pretty much ANYTHING you choose, including that of an AGI engine's functionality on its equations. In the reverse, any NN could be expressed in a shorthand form that contains structure, synapse functions, etc., and an AGI engine could be built/modified to function according to that shorthand. In short, mapping between NN and AGI forms presumes flexibility in the functionality of the target form. Where that flexibility is NOT present, e.g. because of orthogonal structure, etc., then you must ask whether something is being gained or lost by the difference. Clearly, any transition that involves a loss should be carefully examined to see if the entire effort is headed in the wrong direction, which I think was your original point here.

>> >> >> There is a problem here.
>> >> >> When someone says "X and Y can easily be mapped from one form to the other" there is an implication that they are NOT suggesting that we go right down to the basic constituents of both X and Y in order to effect the mapping.

>> >> >> Thus: "Chalk and Cheese can easily be mapped from one to the other" .... trivially true if we are prepared to go down to the common denominator of electrons, protons and neutrons. But if we stay at a sensible level then, no, these do not map onto one another.

>> >> > The problem here is that you were thinking of presently existing NN and AGI systems, neither of which work (yet) in any really useful way, so it was obviously impossible to directly convert from one system with its set of bad assumptions to another system with a completely different set of bad assumptions. I completely agree, but I assert that the answer to that particular question is of no practical interest to anyone.

>> >> > On the other hand, converting between NN and AGI systems built on the SAME set of assumptions would be simple. This situation doesn't yet exist. Until then, converting a program from one dysfunctional platform to another is uninteresting. When the assumptions get ironed out, then all systems will be built on the same assumptions, and there will be few problems going between them, EXCEPT:

>> >> > Things need to be arranged in arrays for automated learning, which fits the present NN paradigm much better than the present AGI paradigm.

>> >> >> Similarly, if you claim that NN and regular AGI map onto one another, I assume that you are saying something more substantial than that these two can both be broken down into their primitive computational parts, and that when this is done they seem equivalent.

>> >> > Even this breakdown isn't required if both systems are built on the same correct assumptions. HOWEVER, I see no way to transfer fast learning from an NN-like construction to an AGI-like construction. Do you? If there is no answer to this question, then this unanswerable question would seem to redirect AGI efforts to NN-like constructions if they are ever to learn like we do.

>> >> >> NN and regular AGI, the way they are understood by people who understand them, have very different styles of constructing intelligent systems.

>> >> > Neither of which work (yet). Of course, we are both trying to fill in the gaps.

>> >> >> Sure, you can code both in C, or Lisp, or Cobol, but that is to trash the real meaning of "are easily mapped onto one another".

>> >> > One of my favorite consulting projects involved coding an AI program to solve complex problems that were roughly equivalent to solving algebraic equations. This program composed the Yellow Pages for 28 different large phone directories. The project was for a major phone company and had to be written entirely in COBOL. Further, it had to run at n log n speed and NOT n^2 speed, which I did by using successive sorts instead of list processing methods. It would have been rather difficult to achieve the needed performance in C or Lisp, even though COBOL would seem to be everyone's first choice as the last choice on the list of prospective platforms.
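>> >> > For flavor only -- obviously not the original COBOL, and the records here are invented -- the "successive sorts" trick amounts to sort-and-merge instead of a nested scan:

>> >> > # Match two record lists on a shared key in O(n log n) by sorting both
>> >> > # and walking them together, instead of the O(n^2) nested scan that
>> >> > # list-processing approaches tend to produce. Assumes unique keys.
>> >> > def sorted_match(left, right, key=lambda rec: rec[0]):
>> >> >     left, right = sorted(left, key=key), sorted(right, key=key)
>> >> >     i = j = 0
>> >> >     matches = []
>> >> >     while i < len(left) and j < len(right):
>> >> >         if key(left[i]) == key(right[j]):
>> >> >             matches.append((left[i], right[j]))
>> >> >             i += 1
>> >> >             j += 1
>> >> >         elif key(left[i]) < key(right[j]):
>> >> >             i += 1
>> >> >         else:
>> >> >             j += 1
>> >> >     return matches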
>> >> >>> ), instead of operating on "objects" (in an object-oriented sense)

>> >> >>> Neither NN nor AGI has any intrinsic relationship to OO.

>> >> >>> Clearly I need a better term here. Both NNs and AGIs tend to have neurons or equations that reflect the presence (or absence) of various objects, conditions, actions, etc. My fundamental assertion is that if you differentiate the inputs so that everything in the entire network reflects dp/dt instead of straight probabilities, then the network works identically, but learning is GREATLY simplified.

>> >> >> Seems like a simple misunderstanding: you were not aware that "object oriented" does not mean the same as saying that there are fundamental atomic constituents of a representation.

>> >> > A typical semantic overloading problem. "Atomic constituent orientation" doesn't really work either, because in later stages, individual terms/neurons can represent entire concepts, strategies, etc. I am still looking for a good term here.

>> >> >>> , instead, operates on the rate-of-changes in the probabilities of "objects", or dp/dt. Presuming sufficient bandwidth to generally avoid superstitious coincidences, fast unsupervised learning then becomes completely trivial, as like objects cause simultaneous like-patterned changes in the inputs WITHOUT the overlapping effects of the many other objects typically present in the input (with numerous minor exceptions).

>> >> >>> You have already presumed that something supplies the system with "objects" that are meaningful. Even before your first mention of dp/dt, there has to be a mechanism that is so good that it never invents objects such as:

>> >> >>> Object A: "A person who once watched all of Tuesday Weld's movies in the space of one week" or

>> >> >>> Object B: "Something that is a combination of Julius Caesar's pinky toe and a sour grape that Brutus just spat out" or

>> >> >>> Object C: "All of the molecules involved in a swimming gala that happen to be 17.36 meters from the last drop of water that splashed from the pool".

>> >> >>> You have supplied no mechanism that is able to do that, but that mechanism is 90% of the trouble, if learning is what you are about.

>> >> >>> With prior unsupervised learning you are 100% correct. However none of the examples you gave involved temporal simultaneity. I will discuss B above because it is close enough to be interesting.
>> >> >>> If indeed someone just began to notice something interesting about Caesar's pinkie toe *_as_* they just began to notice the taste of a sour grape, then yes, that probably would be learned via the mechanisms I am talking about. However, if one was "present perfect tense" while the other was just beginning, then it wouldn't be learned with my approach but would be with prior unsupervised learning methods. For example, if Caesar's pinkie toe had been noticed and examined, and then before the condition passed they tasted a sour grape, then temporal simultaneity of the dp/dt edges wouldn't exist to learn from. Of course, in both cases, the transforms would work identically given identical prior learning/programming.

>> >> >> You have not understood the sense in which I made the point, I fear.

>> >> > I think the reverse is true. Consider...

>> >> >> I was describing obviously useless concepts. Ones where there is no temporal simultaneity.

>> >> > dp/dt is unable to even notice things that lack temporal simultaneity, so the examples you gave, though typical challenges to past unsupervised learning, are complete non-issues in dp/dt space.

>> >> >> Concepts thrown together out of completely useless components.

>> >> > ... that require SOME force/reason/bug/error/etc. to get thrown together. I think we both understand how this was a typical challenge to past unsupervised learning efforts. I am asserting that in dp/dt systems, there is NO force/reason/error/etc. to ever throw such things together, and hence, no reason for vastly complex matrix transforms to then try to pull them back apart.

>> >> >> The question is: how to build a mechanism that does NOT fall into the trap of creating such nonsense-concepts. If you just say "assume that we have such a concept builder" you beg a million questions.

>> >> >> Your reply, above, took one of my examples and tried to talk about what could happen if it was not, after all, a nonsense-concept.

>> >> > Note that GMAIL got sick here, so I'll mark your text with >. Also, some replies are deeply indented, so I have bolded some of them.

>> >> > I was just "playing neuron" without any mindreading abilities.

>> >> > >Alas, that is neither here nor there, because (sure enough) *everyone* agrees that temporal simultaneity is a good basic ground for trying to construct new concepts (it is the Reason Number One for creating a new concept!). But we also know that just common or garden variety Temporal Simultaneity doesn't get you very far .... that is the easiest of all mechanisms, and we need a hundred more concept-building mechanisms that are better than that before we have a real concept-generating engine.

>> >> > Now, we can start "picking through" the approaches.
>> >> > I suspect that looking for the principal components of temporally simultaneous inputs goes a LONG way toward what is sought, but I have no proof (yet). Do YOU have some idea as to where the threshold of usefulness is?

>> >> >> And (here is where my point comes back into the picture) if anyone stands up and says "Hey everyone! I have discovered a hundred concept building mechanisms that I think will do the trick!", the first question that the crowd will ask is: "Do your mechanisms work together to build real, sensible concepts, or do they fill the system with bazillions of really dumb, useless concepts (like my nonsense list above)?"

>> >> > Clearly, PCA on simultaneous inputs will NOT do that, because they must show common things in order not to end up at the wrong end of the Huffman code.

>> >> >> Anyone who says that they know of a way to get unsupervised learning to occur is saying, implicitly, that they have those 100 concept building mechanisms ready to go (or one super mechanism as good as all of them). Hence my original point: you cannot simply imply that your system is working with bona-fide, coherent concepts unless you can show that it really does come up with concepts (or objects) that are sensible.

>> >> > Perhaps you could exhibit some examples where learning based on temporal simultaneity with a preference for identifying common patterns (as PCA requires) fails. Clearly, if I think that a relatively simple approach (like PCA on dp/dt inputs) should work, but you are convinced that it will fall into an abyss of superstitious learning, then you will have a MUCH easier time exhibiting a couple of example failures than I will have somehow proving that it always works (which is probably beyond the mathematical state of the art).

>> >> > I'm not saying you are wrong here, only that you may not have heard me (probably my fault for not saying things clearly enough), and you haven't made your point by exhibiting something on which my approach would fail.

>> >> >> FWIW, I would level the same criticism against quite a few other people, so you don't stand alone here.

>> >> > My ego is quite indestructible and I understand that your body temperature is low, so you have nothing to worry about here.

>> >> >> (Just briefly: if I move on to look at your actual reply above, I see also mention of rates of change (dp/dt), but no explanation of how rates of change of anything would help a system build a concept that is a combination (NOT an association, please!) of [Julius Caesar's pinky toe and a sour grape that Brutus just spat out]. The rates of change seem irrelevant here).

>> >> > If you take a neuron or Bayesian formula programmed to do something static and throw dp/dt inputs at it, its output will be the dp/dt of the result from static operation. You could then simply integrate it to produce exactly the same output.
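>> >> > A linear toy case makes this concrete (my illustration only, not a model of any particular neuron): difference the inputs, push them through the same fixed transform, integrate the output, and the static result comes back.

>> >> > import numpy as np
>> >> >
>> >> > rng = np.random.default_rng(2)
>> >> > W = rng.normal(size=(3, 8))            # a fixed, already-"programmed" linear transform
>> >> > p = rng.random(size=(100, 8))          # input probabilities over time
>> >> >
>> >> > static_out = p @ W.T                   # operate on the probabilities directly
>> >> >
>> >> > dp = np.diff(p, axis=0)                # dp/dt inputs (finite differences)
>> >> > d_out = dp @ W.T                       # the same transform applied to dp/dt
>> >> > rebuilt = static_out[0] + np.cumsum(d_out, axis=0)   # integrate the output
>> >> >
>> >> > print(np.allclose(rebuilt, static_out[1:]))   # True: the static output is recovered

>> >> > (Exact for a linear transform; for nonlinear units the agreement is only approximate, which is worth keeping in mind.)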
>> >> > Hence, the ONLY reason to operate in dp/dt space is for the learning, as the transformation itself is unaffected.

>> >> > Now, if you look for an association in dp/dt space and decide to recognize it, that same neuron will then operate to recognize a combination, once its output has been integrated. Of course, if instead of integrating you simply let subsequent neurons use its output, the entire system will operate as though it recognized the combination, even though, if you attached an oscilloscope to the output, you would see positive and negative spikes around what would be a steady-state output in "object" mode.

>> >> > In short, it programs based on associations, but functions based on represented combinations, that representation being the dp/dt of the combination.

>> >> >>> Instead, you waved your hands and said "fast unsupervised learning then becomes completely trivial" .... this statement is a declaration that a good mechanism is available.

>> >> >>> You then also talk about "like" objects. But the whole concept of "like" is extraordinarily troublesome. Are Julius Caesar and Brutus "like" each other? Seen from our distance, maybe yes, but from the point of view of Julius C., probably not so much. Is a G-type star "like" a mirror? I don't know any stellar astrophysicists who would say so, but then again OF COURSE they are, because they are almost indistinguishable, because if you hold a mirror up in the right way it can reflect the sun and the two visual images can be identical.

>> >> >>> These questions can be resolved, sure enough, but it is the whole business of resolving these questions (rather than waving a hand over them and declaring them to be trivial) that is the point.

>> >> >>> I think that pretty much everyone who has "dented their pick" on unsupervised learning (this includes myself. Does anyone else here have these same scars?) has developed methods that would work on "completely obvious" test cases but failed miserably on real-world input. My point here is that looking at things from a dp/dt point of view, real-world situations become about as simple as "completely obvious" test cases.

>> >> >>> I would quote some good source to make this point, but I don't think anyone has gone here yet.

>> >> >>> If you don't have a clear demonstration that this dp/dt idea does deliver the goods, why are you claiming that it does? Surely it is one or the other?

>> >> >>> This month I am wearing my mathematician hat. My son Eddie is the NN hacker of the family, and he is waiting impatiently for me to declare a tentative completion so he can run with it.

>> >> >>> For now, my goal is to come up with a sufficiently good theory that even you can't poke any significant holes in it.
>> >> >>> Once I become the first person in history to ever receive the Loosemore Seal of No Objection, I will probably wrap this thing up and turn it over to Eddie.

>> >> >>>> But Steve, if YOU claim that "looking at things from a dp/dt point of view" does in fact yield a dramatic breakthrough that allows unsupervised learning to work on real world cases (something nobody else can do right now),

>> >> >>>> Not entirely true, as PCA does what could be considered to be unsupervised learning, though granted, it is WAY too inefficient for NN/AGI use without dp/dt.

>> >> >>>> then YOU are expected to be the one who has gone there, done it, and come back with evidence that your idea does in fact do that.

>> >> First comes the theory, then comes the demo. Neither contains any sort of proof, but it is a LOT cheaper to shoot something down BEFORE it is built than after. Hence, I find this exercise VERY valuable. THANKS. Please keep up the good work.

>> >> Steve Richfield

--
Abram Demski
Public address: [email protected]
Public archive: http://groups.google.com/group/abram-demski
Private address: [email protected]
