Steve, My thinking on the "significant figures" issue is that the purpose of unsupervised learning is to find a probabilistic model of the data (whereas the purpose of supervised learning is to find a probabilistic model of *one* variable *conditioned on* all the others). When you talk about the insufficiency of standard PCA, do you think the problems you refer to relate to
(1) PCA finding a suboptimal model, or (2) the optimal model being not quite what you are after? --Abram On Sat, Dec 27, 2008 at 3:05 AM, Steve Richfield <[email protected]> wrote: > Abram, > > On 12/26/08, Abram Demski <[email protected]> wrote: >> >> Steve, >> >> When I made the statement about Fourier I was thinking of JPEG >> encoding. A little digging found this book, which presents a unified >> approach to (low-level) computer vision based on the Fourier >> transform: >> >> >> http://books.google.com/books?id=1wJuTMbNT0MC&dq=fourier+vision&printsec=frontcover&source=bl&ots=3ogSJ2i5uW&sig=ZdvvWvu82q8UX1c5Abq6hWvgZCY&hl=en&sa=X&oi=book_result&resnum=2&ct=result#PPA4,M > > > Interesting, but seems far removed from wet neuronal functionality, > unsupervised learning, etc. >> >> >> But that is beside the present point. :) >> > >> > >> > Probably so. I noticed that you recently graduated, so I thought that I >> > would drop that thought to make (or unmake) your day. >> >> :) I should really update that. It's been a while now. >> >> >> generally, any transform that makes the data more sparse, or simpler, >> >> seems good >> > >> > >> > Certainly if it results in extracting some useful figure of merit. >> > >> >> >> >> -- which is of course what PCA does, >> > >> > >> > Sometimes yes, and sometimes no. I am looking at incremental PCA >> > approaches >> > that reliably extract separate figures of merit rather than >> > smushed-together >> > figures of merit as PCA often does. >> >> How do you define "figures of merit"? Sounds like an ill-defined >> problem to me. We don't know which features we *really* want to >> extract from an image until we know the utility function of the >> environment, and so know what information will help us achieve our >> goals. > > > There are several views of this, e.g. > 1. Pick something to recognize and see if back-propagation says that it is > useful. 
> In practice this has problems, because once a downstream neuron makes > any tentative use, > then changing an upstream neuron's functionality scrambles the > downstream neuron's output. > 2. Pick one of the most consistent, easily-recognizable, most > information-containing things to recognize a la PCA, > and expect downstream neurons to combine inputs to extract whatever > they need through Bayesian logic. > This may suffer from too many prospective things to recognize, most of > which are not needed. > > These two methods look like they could fix each other's shortcomings, > because good initial choices > of figures of merit should then result in neurons either keeping their > functionality or abandoning > it, and thereby avoid the problems of changing functionality scrambling > downstream neurons. > This way, back-propagation could be used to select which upstream neurons > need to find something > else to do, but would have little/no impact on incremental "learning" as > reward/punishment systems now do. > > My BIG challenge is that without something like dp/dt, unsupervised learning > doesn't work. > Now, with dp/dt it is a whole new game, and I have no idea where the > threshold of real-world functionality lies. > Hence, I seem to be forced into "pessimization" - making things as good as > possible, > even though I may be well past that threshold. > > Eddie's NN platform is able to tie into other applications, like flight > simulator, web cams, etc. > Hence, there is the whole Internet full of cameras to learn with, and it > might be interesting to see if > such a NN would be able to figure out how to fly a plane, maybe like > Skinner's pigeons. > > Thanks for your continuing thoughts. 
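The dp/dt claim above can be illustrated with a toy numpy sketch. All names and numbers here are invented for illustration: two "objects" each drive a fixed pattern on a few input lines, and their raw probability signals overlap, but in difference (dp/dt) space each object's pattern appears in isolation at the moment that object appears, because everything else is constant at that instant.

```python
import numpy as np

# Toy illustration (not from the thread): 6 input lines carrying
# probabilities over 10 time steps, with two overlapping "objects".
T = 10
p = np.zeros((T, 6))

obj_a = np.array([1, 1, 1, 0, 0, 0], float)   # object A's input pattern
obj_b = np.array([0, 0, 1, 1, 1, 0], float)   # object B's pattern (overlaps A on line 2)

p[2:] += obj_a               # A appears at t=2 and stays present
p[5:] += obj_b               # B appears at t=5 while A is still present

dp = np.diff(p, axis=0)      # the dp/dt signal (finite differences)

# At t=5 the raw input is the smeared sum of both objects...
assert np.array_equal(p[5], obj_a + obj_b)
# ...but the difference at B's onset is B's pattern alone, and
# the difference at A's onset is A's pattern alone.
assert np.array_equal(dp[4], obj_b)
assert np.array_equal(dp[1], obj_a)
```

In raw probability space a learner must untangle the superimposed patterns; in dp/dt space each onset hands it one clean pattern at a time, which is the sense in which learning is simplified.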
> > Steve Richfield > ==================== >> >> On Sat, Dec 27, 2008 at 12:01 AM, Steve Richfield >> <[email protected]> wrote: >> > Abram, >> > >> > On 12/26/08, Abram Demski <[email protected]> wrote: >> >> >> >> Steve, >> >> >> >> It is strange to claim that prior PhDs will be worthless when what you >> >> are suggesting is that we apply the standard methods to a different >> >> representation. >> > >> > >> > Much of AI and pretty much all of AGI is built on the proposition that >> > we >> > humans must code knowledge because the stupid machines can't efficiently >> > learn it on their own, in short, that UNsupervised learning is >> > difficult. >> > Note that in nature, UNsupervised learning handily outperforms >> > supervised >> > learning. What good is supervised NN technology when UNsupervised NNs >> > will >> > perform MUCH better? What good are a few hand-coded AGI rules and the >> > engine >> > that runs them, when an UNsupervised AGI can learn them orders of >> > magnitude >> > faster than cities full of programmers? Note my prior post where I >> > explain >> > that AGIs must either abandon UNsupervised learning, or switch to >> > a >> > NN-like implementation. In short, easy UNsupervised learning will change >> > things about as much as the switch from horse and buggy to automobiles, >> > leaving present PhDs in the position of blacksmiths and historians. Sure >> > blacksmiths had transferable skills, but they weren't worth much and >> > they >> > weren't respected at all. >> > >> > In the 1980s, countless top computer people (including myself) had to >> > expunge all references to mainframe computers from our resumes in order >> > to >> > find work in a microcomputer-dominated field. I expect to see rounds of >> > the >> > same sort of insanity when UNsupervised learning emerges. >> > >> >> >> >> But that is beside the present point. :) >> > >> > >> > Probably so. 
I noticed that you recently graduated, so I thought that I >> > would drop that thought to make (or unmake) your day. >> > >> >> >> >> Taking the derivative, or just finite differences, is a useful step in >> >> more ways than one. You are talking about taking differences over >> >> time, but differences over space can be used for edge detection, >> >> frequently thought of as the first step in visual processing. >> > >> > >> > Correct. My paper goes into using any dimension that is differentiable. >> > Note >> > that continuous eye movement converts a physical dimension to time >> > domain. >> > >> >> >> >> More >> >> generally, any transform that makes the data more sparse, or simpler, >> >> seems good >> > >> > >> > Certainly if it results in extracting some useful figure of merit. >> > >> >> >> >> -- which is of course what PCA does, >> > >> > >> > Sometimes yes, and sometimes no. I am looking at incremental PCA >> > approaches >> > that reliably extract separate figures of merit rather than >> > smushed-together >> > figures of merit as PCA often does. Another problem with classical PCA >> > is >> > that it can't provide real-time learning, but instead, works via a sort >> > of >> > "batch processing" of statistics collected in the array that is being >> > transformed. >> > >> >> >> >> and derivatives in >> >> time/space, and also the Fourier transform I think. The usefulness of >> >> these transforms springs from underlying regularities in the data. >> > >> > >> > Hmmm, I don't see where a Fourier transform would enter the cognitive >> > process. Perhaps you see something that I have missed? >> > >> >> >> >> That's not to say that I don't think some representations are >> >> fundamentally more useful than others-- for example, I know that some >> >> proofs are astronomically larger in 1st-order logic as compared to >> >> 2nd-order logic, even in domains where 1st-order logic is >> >> representationally sufficient. 
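The "differences over space" point above is essentially a one-line edge detector: a finite difference across a 1-D intensity profile is zero on flat regions and responds only at the step. A toy sketch (the array values are invented):

```python
import numpy as np

# Spatial finite differences as edge detection (toy 1-D "image" row):
# flat regions difference to zero; the dark-to-bright step stands out.
row = np.array([0, 0, 0, 0, 10, 10, 10, 10], float)
edges = np.diff(row)                 # spatial derivative of the row

assert np.count_nonzero(edges) == 1  # response only at the edge
assert edges[3] == 10                # the step sits between pixels 3 and 4
```

This is the spatial twin of the temporal dp/dt idea: in both cases differencing makes the representation sparse, leaving only the places (or moments) where something changes.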
>> >> >> >> The statement about time correction reminds me of a system called >> >> PURR-PUSS. >> > >> > >> > >> > However, as I understand it, the Purposeful Unprimed Real-world Robot >> > with >> > Predictors Using Short Segments still relied on rewards and punishments >> > for >> > learning. >> >> >> >> It is turing-complete in some sense, essentially by >> >> compounding time-delays, but I do not know exactly what sense (ie, a >> >> turing-complete *learner* is very different from a turing-complete >> >> *programmable computer*... PURR-PUSS uses something in between called >> >> "soft teaching" if I recall correctly.) >> > >> > >> > The old DEC LINC and LINC-8 computers operated the instruction >> > sequencing >> > with a pile of time delay modules, and someone had to go in and >> > recalibrate >> > every few months. >> > >> > Steve Richfield >> > ================== >> > >> >> >> >> On Fri, Dec 26, 2008 at 3:26 PM, Steve Richfield >> >> <[email protected]> wrote: >> >> > Abram, >> >> > >> >> > On 12/26/08, Abram Demski <[email protected]> wrote: >> >> >> >> >> >> Steve, >> >> >> >> >> >> Richard is right when he says temporal simultaneity is not a >> >> >> sufficient principle. >> >> > >> >> > >> >> > ... and I fully agree. However, we must unfold this thing one piece >> >> > at a >> >> > time. >> >> > >> >> > Without the dp/dt "trick", there doesn't seem to be any way to make >> >> > unsupervised learning work, and I appear to be the first to stumble >> >> > onto >> >> > dp/dt. This is a whole new and unexplored world, where the things >> >> > that >> >> > stymied past unsupervised efforts fall out effortlessly, but some new >> >> > challenges present themselves. >> >> > >> >> >> >> >> >> Suppose you present your system with the >> >> >> following sequences (letters could be substituted for sounds, >> >> >> colors, >> >> >> objects, whatever): >> >> >> >> >> >> ABCABCABCABC... >> >> >> >> >> >> AAABBBAAABBB... >> >> >> >> >> >> ABBAAABBBBAAAAABBBBBB... 
>> >> >> >> >> >> ABBCCCDDDDEEEEEFFFFFF... >> >> >> >> >> >> ABACABADABACABAEABACABADABACABA... >> >> >> >> >> >> All of these sequences have "concepts" behind them. All of these >> >> >> concepts are immune to temporal-simultaneity-learning (although the >> >> >> first could be learned by temporal adjacency, and the second by >> >> >> temporal adjacency with a delay of 3). >> >> > >> >> > >> >> > The way that wet neurons are built, this is unavoidable! Here is >> >> > another >> >> > snippet from my paper... >> >> > >> >> > >> >> > Time Correction >> >> > >> >> > Electronics designers routinely use differentiation and integration >> >> > to >> >> > advance and retard timing. Phase-linear low-pass filters are often >> >> > used >> >> > to >> >> > make short delays in a signal, and "peaking" capacitors were used in >> >> > RTL >> >> > (Resistor Transistor Logic) to differentiate inputs for quicker >> >> > output. >> >> > Further, wet neurons introduce their own propagation delays from >> >> > input >> >> > synapse to output synapse. If not somehow corrected, the net effect >> >> > of >> >> > this >> >> > is a scrambling of the time that a given signal/node/term represents, >> >> > which >> >> > would result in relating signals together that >> >> > are >> >> > arbitrarily shifted in time. There seem to be three schools >> >> > of >> >> > thought >> >> > regarding this: >> >> > >> >> > No problem. This simply results in considering various things shifted >> >> > arbitrarily in time. When wet neurons learn what works, this will >> >> > result >> >> > in >> >> > recognizing time-sequenced phenomena. Arbitrary delays might also do >> >> > a >> >> > lot >> >> > for artificial neurons. >> >> > >> >> > Time correction could be instituted, e.g. through Taylor series >> >> > signal >> >> > extrapolation to in effect remove a neuron's delay, at the cost of >> >> > introducing considerable noise into the result. 
My own simulations of >> >> > Taylor >> >> > series extrapolation functions showed that the first derivative may >> >> > indeed >> >> > help for small corrections, but beyond that, subtle changes in the >> >> > shape >> >> > of >> >> > a transition cause wild changes in the extrapolated result, sometimes >> >> > going >> >> > so far as to produce short bursts of oscillation. Downstream neurons >> >> > may >> >> > then amplify these problems to produce havoc at the output of the >> >> > artificial >> >> > neural network. >> >> > >> >> > The method utilized in CRAY computers might be in use, where all >> >> > delays >> >> > were >> >> > a precise multiple (of their clock rate) long. This was achieved by >> >> > using >> >> > interconnecting wires cut to certain specific lengths, even though >> >> > the >> >> > length may be much longer than actually physically needed to >> >> > interconnect >> >> > two components. Perhaps wet neurons only come in certain very >> >> > specific >> >> > delays. There is some laboratory evidence for this, as each section >> >> > of >> >> > our >> >> > brains has neurons with similar geometry within the group. This has >> >> > been >> >> > presumed to be an artifact of evolution and limited DNA space, but >> >> > may >> >> > in >> >> > fact be necessary for proper time correction. >> >> > >> >> > No one now knows which of these are in use in wet neurons. However, >> >> > regardless of wet-neuron functionality, artificial neural network >> >> > researchers should be attentive to time correction >> >> > >> >> > Note that #1 above unavoidably solves the time-sequencing puzzle. >> >> > Introduce >> >> > some integration, and the sequencing can be arbitrarily shifted in >> >> > time >> >> > - >> >> > within reasonable limits (seconds, maybe a minute or two). 
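The first-order Taylor time correction described above, x(t+d) ≈ x(t) + d·x'(t) with x' estimated by a finite difference, can be sketched numerically. The signal, step sizes, and glitch magnitude below are invented for illustration; the point is that the correction is exact on a clean ramp, while a single noisy sample gets multiplied by d/h in the extrapolated output, which is the noise amplification the simulations reported:

```python
import numpy as np

h, d = 0.01, 0.05                    # sample step, delay to correct
t = np.arange(0, 1, h)
x = 2.0 * t                          # clean signal: a ramp

deriv = np.empty_like(x)
deriv[1:] = (x[1:] - x[:-1]) / h     # backward-difference derivative estimate
deriv[0] = deriv[1]
x_adv = x + d * deriv                # extrapolate d seconds ahead

# On the clean ramp, first-order extrapolation is exact.
assert np.allclose(x_adv[1:], 2.0 * (t[1:] + d))

glitch = x.copy()
glitch[50] += 0.1                    # one noisy sample
g_deriv = np.empty_like(glitch)
g_deriv[1:] = (glitch[1:] - glitch[:-1]) / h
g_deriv[0] = g_deriv[1]
g_adv = glitch + d * g_deriv

# The 0.1 glitch shows up amplified by d/h = 5 at the next sample.
assert np.isclose(g_adv[51] - x_adv[51], -0.1 * d / h)
```

The d/h factor is why pushing the correction "beyond small corrections" goes wrong so quickly: the further ahead you extrapolate relative to your sampling interval, the more any transient irregularity in the signal shape is magnified, and downstream stages can then amplify it further.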
>> >> > >> >> >> >> >> >> The transition to sequence learning is (at least, in my eyes) a >> >> >> transition to relational learning, as opposed to the "flat" learning >> >> >> that PCA is designed for. >> >> > >> >> > >> >> > I suspect that PCA-like methods are at work within neurons, and that >> >> > sequence learning and the like fall out from inter-neuronal >> >> > connections >> >> > and >> >> > the associated delays, integration, etc. >> >> >> >> >> >> In other words, completely new methods are >> >> >> required. You already begin that transition by invoking dp/dt, which >> >> >> assumes a temporal aspect to the data... >> >> >> >> >> >> See this blog post for a more full account of my view on the current >> >> >> state of affairs. (It started out as a post about a new algorithm >> >> >> I'd >> >> >> been thinking about, but turned into an essay on the difference >> >> >> between relational methods and "flat" (propositional) methods, and >> >> >> how >> >> >> to bridge the gap. If you're wondering about the title, see the >> >> >> previous post.) >> >> >> >> >> >> >> >> >> >> >> >> http://dragonlogic-ai.blogspot.com/2008/12/back-to-ai-ok-here-we-go.html >> >> > >> >> > >> >> > This blog and this email reflect a common problem with AI-thought. >> >> > There >> >> > are >> >> > LOTS of things that people are VERY bad at doing, and these generally >> >> > make >> >> > horrible examples for testing human cognition theories >> >> > and, >> >> > at the same time, >> >> > wonderful potential AI applications. >> >> > >> >> > A perfect example is health and disease, where the human cognition >> >> > process >> >> > tends to run in unproductive directions. Any given set of symptoms >> >> > typically >> >> > has ~12 different common underlying causal mechanisms, each of which >> >> > has >> >> > several cause-and-effect chain links that are typically arranged in a >> >> > figure >> >> > "6" configuration with a self-sustaining loop at the end. 
Given >> >> > limitless >> >> > understanding, it typically takes two seemingly unrelated actions to >> >> > actually cure anything, one to stop the lead-in, and the other to >> >> > momentarily interrupt the self-sustaining loop. >> >> > >> >> > It is my present suspicion that unsupervised learning is SO simple >> >> > that >> >> > it >> >> > just falls out of a system using the right representation. Even the >> >> > simplest >> >> > of creatures do quite well at it. However, without that >> >> > representation, >> >> > it >> >> > is horrifically hard/impossible. This means that NN and AGI guys >> >> > should >> >> > all >> >> > STOP whatever they are doing and find the right representation, which >> >> > is >> >> > the >> >> > path that I have gone on. >> >> > >> >> > Note that if I am successful, prior PhDs in AI/NN won't be worth >> >> > spit >> >> > because they will be built on false premises. Good for history, but >> >> > bad >> >> > for >> >> > science. >> >> > >> >> > Thanks for your thoughts. Any more? >> >> > >> >> > Steve Richfield >> >> > ======================== >> >> >> >> >> >> On Fri, Dec 26, 2008 at 2:31 AM, Steve Richfield >> >> >> <[email protected]> wrote: >> >> >> > Richard, >> >> >> > >> >> >> > On 12/25/08, Richard Loosemore <[email protected]> wrote: >> >> >> >> >> >> >> >> Steve Richfield wrote: >> >> >> >>> >> >> >> >>> There are doubtless exceptions to my broad statement, but >> >> >> >>> generally, >> >> >> >>> neuron functionality is WIDE open to be pretty much ANYTHING you >> >> >> >>> choose, >> >> >> >>> including that of an AGI engine's functionality on its >> >> >> >>> equations. >> >> >> >>> In the reverse, any NN could be expressed in a shorthand form >> >> >> >>> that >> >> >> >>> contains structure, synapse functions, etc., and an AGI engine >> >> >> >>> could >> >> >> >>> be >> >> >> >>> built/modified to function according to that shorthand. 
>> >> >> >>> In short, mapping between NN and AGI forms presumes flexibility >> >> >> >>> in >> >> >> >>> the >> >> >> >>> functionality of the target form. Where that flexibility is NOT >> >> >> >>> present, >> >> >> >>> e.g. because of orthogonal structure, etc., then you must ask >> >> >> >>> whether >> >> >> >>> something is being gained or lost by the difference. Clearly, >> >> >> >>> any >> >> >> >>> transition >> >> >> >>> that involves a loss should be carefully examined to see if the >> >> >> >>> entire >> >> >> >>> effort is headed in the wrong direction, which I think was your >> >> >> >>> original >> >> >> >>> point here. >> >> >> >> >> >> >> >> >> >> >> >> There is a problem here. >> >> >> >> >> >> >> >> When someone says "X and Y can easily be mapped from one form to >> >> >> >> the >> >> >> >> other" there is an implication that they are NOT suggesting that >> >> >> >> we >> >> >> >> go >> >> >> >> right >> >> >> >> down to the basic constituents of both X and Y in order to effect >> >> >> >> the >> >> >> >> mapping. >> >> >> >> >> >> >> >> Thus: "Chalk and Cheese can easily be mapped from one to the >> >> >> >> other" >> >> >> >> .... >> >> >> >> trivially true if we are prepared to go down to the common >> >> >> >> denominator >> >> >> >> of >> >> >> >> electrons, protons and neutrons. But if we stay at a sensible >> >> >> >> level >> >> >> >> then, >> >> >> >> no, these do not map onto one another. >> >> >> > >> >> >> > >> >> >> > The problem here is that you were thinking of presently existing NN and >> >> >> > AGI >> >> >> > systems, neither of which work (yet) in any really useful way, >> >> >> > so that >> >> >> > it >> >> >> > was >> >> >> > obviously impossible to directly convert from one system with its >> >> >> > set >> >> >> > of >> >> >> > bad >> >> >> > assumptions to another system with a completely different set of >> >> >> > bad >> >> >> > assumptions. 
I completely agree, but I assert that the answer to >> >> >> > that >> >> >> > particular question is of no practical interest to anyone. >> >> >> > >> >> >> > On the other hand, converting between NN and AGI systems built on >> >> >> > the >> >> >> > SAME >> >> >> > set of assumptions would be simple. This situation doesn't yet >> >> >> > exist. >> >> >> > Until >> >> >> > then, converting a program from one dysfunctional platform to >> >> >> > another >> >> >> > is >> >> >> > uninteresting. When the assumptions get ironed out, then all >> >> >> > systems >> >> >> > will be >> >> >> > built on the same assumptions, and there will be few problems >> >> >> > going >> >> >> > between >> >> >> > them, EXCEPT: >> >> >> > >> >> >> > Things need to be arranged in arrays for automated learning, which >> >> >> > much >> >> >> > more >> >> >> > fits the present NN paradigm than the present AGI paradigm. >> >> >> >> >> >> >> >> Similarly, if you claim that NN and regular AGI map onto one >> >> >> >> another, I >> >> >> >> assume that you are saying something more substantial than that >> >> >> >> these >> >> >> >> two >> >> >> >> can both be broken down into their primitive computational parts, >> >> >> >> and >> >> >> >> that >> >> >> >> when this is done they seem equivalent. >> >> >> > >> >> >> > >> >> >> > Even this breakdown isn't required if both systems are built on >> >> >> > the >> >> >> > same >> >> >> > correct assumptions. HOWEVER, I see no way to transfer fast >> >> >> > learning >> >> >> > from an >> >> >> > NN-like construction to an AGI-like construction. Do you? If there >> >> >> > is >> >> >> > no >> >> >> > answer to this question, then this unanswerable question would >> >> >> > seem >> >> >> > to >> >> >> > redirect AGI efforts to NN-like constructions if they are ever to >> >> >> > learn >> >> >> > like >> >> >> > we do. 
>> >> >> >> >> >> >> >> NN and regular AGI, the way they are understood by people who >> >> >> >> understand >> >> >> >> them, have very different styles of constructing intelligent >> >> >> >> systems. >> >> >> > >> >> >> > >> >> >> > Neither of which work (yet). Of course, we are both trying to fill >> >> >> > in >> >> >> > the >> >> >> > gaps. >> >> >> >> >> >> >> >> Sure, you can code both in C, or Lisp, or Cobol, but that is to >> >> >> >> trash >> >> >> >> the >> >> >> >> real meaning of "are easily mapped onto one another". >> >> >> > >> >> >> > >> >> >> > One of my favorite consulting projects involved coding an AI >> >> >> > program >> >> >> > to >> >> >> > solve complex problems that were roughly equivalent to solving >> >> >> > algebraic >> >> >> > equations. This composed the Yellow pages for 28 different large >> >> >> > phone >> >> >> > directories. The project was for a major phone company and had to >> >> >> > be >> >> >> > written >> >> >> > entirely in COBOL. Further, it had to run at n log n speed and NOT >> >> >> > n^2 >> >> >> > speed, which I did by using successive sorts instead of list >> >> >> > processing >> >> >> > methods. It would have been rather difficult to achieve the needed >> >> >> > performance in C or Lisp, even though COBOL would seem to be >> >> >> > everyone's >> >> >> > first choice as the last choice on the list of prospective >> >> >> > platforms. >> >> >> >>> >> >> >> >>> ), instead of operating on "objects" (in an >> >> >> >>> >> >> >> >>> object-oriented sense) >> >> >> >>> >> >> >> >>> >> >> >> >>> Neither NN nor AGI has any intrinsic relationship to OO. >> >> >> >>> >> >> >> >>> Clearly I need a better term here. Both NNs and AGIs tend to >> >> >> >>> have >> >> >> >>> neurons or equations that reflect the presence (or absence) of >> >> >> >>> various >> >> >> >>> objects, conditions, actions, etc. 
My fundamental assertion is >> >> >> >>> that >> >> >> >>> if >> >> >> >>> you >> >> >> >>> differentiate the inputs so that everything in the entire >> >> >> >>> network >> >> >> >>> reflects >> >> >> >>> dp/dt instead of straight probabilities, then the network works >> >> >> >>> identically, >> >> >> >>> but learning is GREATLY simplified. >> >> >> >> >> >> >> >> Seems like a simple misunderstanding: you were not aware that >> >> >> >> "object >> >> >> >> oriented" does not mean the same as saying that there are >> >> >> >> fundamental >> >> >> >> atomic >> >> >> >> constituents of a representation. >> >> >> > >> >> >> > >> >> >> > A typical semantic overloading problem. "Atomic constituent >> >> >> > orientation" >> >> >> > doesn't really work either, because in later stages, individual >> >> >> > terms/neurons can represent entire concepts, strategies, etc. I am >> >> >> > still >> >> >> > looking for a good term here. >> >> >> >>> >> >> >> >>> >> >> >> >>> , instead, operates on the rates of change in the >> >> >> >>> >> >> >> >>> probabilities of "objects", or dp/dt. Presuming >> >> >> >>> sufficient >> >> >> >>> bandwidth to generally avoid superstitious coincidences, >> >> >> >>> fast >> >> >> >>> unsupervised learning then becomes completely trivial, as >> >> >> >>> like >> >> >> >>> objects cause simultaneous like-patterned changes in the >> >> >> >>> inputs >> >> >> >>> WITHOUT the overlapping effects of the many other objects >> >> >> >>> typically present in the input (with numerous minor >> >> >> >>> exceptions). >> >> >> >>> >> >> >> >>> >> >> >> >>> You have already presumed that something supplies the system >> >> >> >>> with >> >> >> >>> "objects" that are meaningful. 
Even before your first >> >> >> >>> mention >> >> >> >>> of >> >> >> >>> dp/dt, there has to be a mechanism that is so good that it >> >> >> >>> never >> >> >> >>> invents objects such as: >> >> >> >>> >> >> >> >>> Object A: "A person who once watched all of Tuesday Weld's >> >> >> >>> movies >> >> >> >>> in >> >> >> >>> the space of one week" or >> >> >> >>> >> >> >> >>> Object B: "Something that is a combination of Julius >> >> >> >>> Caesar's >> >> >> >>> pinky >> >> >> >>> toe and a sour grape that Brutus just spat out" or >> >> >> >>> >> >> >> >>> Object C: "All of the molecules involved in a swimming gala >> >> >> >>> that >> >> >> >>> happen to be 17.36 meters from the last drop of water that >> >> >> >>> splashed >> >> >> >>> from the pool". >> >> >> >>> >> >> >> >>> You have supplied no mechanism that is able to do that, but >> >> >> >>> that >> >> >> >>> mechanism is 90% of the trouble, if learning is what you are >> >> >> >>> about. >> >> >> >>> >> >> >> >>> With prior unsupervised learning you are 100% correct. However >> >> >> >>> none >> >> >> >>> of >> >> >> >>> the examples you gave involved temporal simultaneity. I will >> >> >> >>> discuss B >> >> >> >>> above >> >> >> >>> because it is close enough to be interesting. >> >> >> >>> If indeed someone just began to notice something interesting >> >> >> >>> about >> >> >> >>> Caesar's pinkie toe *_as_* they just began to notice the taste >> >> >> >>> of a >> >> >> >>> sour >> >> >> >>> grape, then yes, that probably would be learned via the >> >> >> >>> mechanisms I >> >> >> >>> am >> >> >> >>> talking about. However, if one was "present perfect tense" while >> >> >> >>> the >> >> >> >>> other >> >> >> >>> was just beginning, then it wouldn't with my approach but would >> >> >> >>> with >> >> >> >>> prior >> >> >> >>> unsupervised learning methods. 
For example, Caesar's pinkie toe >> >> >> >>> had >> >> >> >>> been >> >> >> >>> noticed and examined, then before the condition passed they >> >> >> >>> tasted >> >> >> >>> a >> >> >> >>> sour >> >> >> >>> grape, then temporal simultaneity of the dp/dt edges wouldn't >> >> >> >>> exist >> >> >> >>> to >> >> >> >>> learn >> >> >> >>> from. Of course, in both cases, the transforms would work >> >> >> >>> identically >> >> >> >>> given >> >> >> >>> identical prior learning/programming. >> >> >> >> >> >> >> >> >> >> >> >> You have not understood the sense in which I made the point, I >> >> >> >> fear. >> >> >> > >> >> >> > >> >> >> > I think the reverse is true. Consider... >> >> >> >> >> >> >> >> I was describing obviously useless concepts. Ones where there is >> >> >> >> no >> >> >> >> temporal simultaneity. >> >> >> > >> >> >> > >> >> >> > dp/dt is unable to even notice things that lack temporal >> >> >> > simultaneity, >> >> >> > so >> >> >> > the examples you gave, though typical challenges to past >> >> >> > unsupervised >> >> >> > learning, are complete non-issues in dp/dt space. >> >> >> >> >> >> >> >> Concepts thrown together out of completely useless components. >> >> >> > >> >> >> > >> >> >> > ... that require SOME force/reason/bug/error/etc to get thrown >> >> >> > together. >> >> >> > I >> >> >> > think we both understand how this was a typical challenge to past >> >> >> > unsupervised learning efforts. I am asserting that in dp/dt >> >> >> > systems, >> >> >> > there >> >> >> > is NO force/reason/error/etc to ever throw such things together, >> >> >> > and >> >> >> > hence, >> >> >> > no reason for vastly complex matrix transforms to then try to pull >> >> >> > them >> >> >> > back >> >> >> > apart. >> >> >> >> >> >> >> >> The question is: how to build a mechanism that does NOT fall >> >> >> >> into >> >> >> >> the >> >> >> >> trap of creating such nonsense-concepts. 
If you just say "assume >> >> >> >> that >> >> >> >> we >> >> >> >> have such a concept builder" you beg a million questions. >> >> >> > >> >> >> > >> >> >> >> Your reply, above, took one of my examples and tried to talk >> >> >> >> about >> >> >> >> what >> >> >> >> could happen if it was not, after all, a nonsense-concept. >> >> >> > >> >> >> > Note that GMAIL got sick here, so I'll mark your text with >. >> >> >> > Also, >> >> >> > some >> >> >> > replies are deeply indented, so I have bolded some of them. >> >> >> > >> >> >> > I was just "playing neuron" without any mindreading abilities. >> >> >> >>Alas, that is neither here nor there, because (sure enough) >> >> >> >> *everyone* >> >> >> >> agrees that temporal simultaneity is a good basic ground for >> >> >> >> trying >> >> >> >> to >> >> >> >> construct new concepts (it is the Reason Number One for creating >> >> >> >> a >> >> >> >> new >> >> >> >> concept!). But we also know that just common or garden variety >> >> >> >> Temporal >> >> >> >> Simultaneity doesn't get you very far .... that is the easiest of >> >> >> >> all >> >> >> >> mechanisms, and we need a hundred more concept-building >> >> >> >> mechanisms >> >> >> >> that >> >> >> >> are >> >> >> >> better than that before we have a real concept-generating engine. >> >> >> > >> >> >> > Now, we can start "picking through" the approaches. I suspect that >> >> >> > looking >> >> >> > for the principal components of temporally simultaneous inputs >> >> >> > goes a >> >> >> > LONG >> >> >> > way toward what is sought, but I have no proof (yet). Do YOU have >> >> >> > some >> >> >> > idea as >> >> >> > to where the threshold of usefulness is? >> >> >> > >> >> >> >> And (here is where my point comes back into the picture) if >> >> >> >> anyone >> >> >> >> stands >> >> >> >> up and says "Hey everyone! 
I have discovered a hundred concept >> >> >> >> building >> >> >> >> mechanisms that I think will do the trick!", the first question >> >> >> >> that >> >> >> >> the >> >> >> >> crowd will ask is: "Do your mechanisms work together to build >> >> >> >> real, >> >> >> >> sensible concepts, or do they fill the system with bazillions of >> >> >> >> really >> >> >> >> dumb, useless concepts (like my nonsense list above)?" >> >> >> > >> >> >> > Clearly, PCA on simultaneous inputs will NOT do that, because they >> >> >> > must >> >> >> > show >> >> >> > common things in order not to end up at the wrong end of the >> >> >> > Huffman >> >> >> > code. >> >> >> > >> >> >> >> Anyone who says that they know of a way to get unsupervised >> >> >> >> learning >> >> >> >> to >> >> >> >> occur is saying, implicitly, that they have those 100 concept >> >> >> >> building >> >> >> >> mechanisms ready to go (or one super mechanism as good as all of >> >> >> >> them). >> >> >> >> Hence my original point: you cannot simply imply that your >> >> >> >> system >> >> >> >> is >> >> >> >> working with bona-fide, coherent concepts unless you can show >> >> >> >> that >> >> >> >> it >> >> >> >> really >> >> >> >> does come up with concepts (or objects) that are sensible. >> >> >> > >> >> >> > Perhaps you could exhibit some examples where learning based on >> >> >> > temporal >> >> >> > simultaneity with a preference for identifying common patterns (as >> >> >> > PCA >> >> >> > requires) fails. Clearly, if I think that a relatively simple >> >> >> > approach >> >> >> > (like >> >> >> > PCA on dp/dt inputs) should work, but you are convinced that it >> >> >> > will >> >> >> > fall >> >> >> > into an abyss of superstitious learning, then you will have a MUCH >> >> >> > easier >> >> >> > time exhibiting a couple of example failures than I will have >> >> >> > somehow >> >> >> > proving that it always works (which is probably beyond the >> >> >> > mathematical >> >> >> > state of the art). 
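The "PCA on dp/dt inputs" proposal being debated above can at least be demonstrated in the easy case. Everything below is a toy construction (the patterns, schedules, and background are invented): two independent "objects" toggle on fixed schedules over a static background, so every frame-to-frame difference vector is plus or minus one object's pattern, and the principal components of the difference data span exactly the two object patterns rather than smushing them together with the background.

```python
import numpy as np

obj_a = np.array([1., 1., 0., 0., 0.])          # object A's input pattern
obj_b = np.array([0., 0., 0., 1., 1.])          # object B's input pattern
background = np.array([3., 1., 4., 1., 5.])     # always present, never changes

T = 40
a_on = (np.arange(T) // 4) % 2 == 0             # A toggles every 4 frames
b_on = (np.arange(T) // 6) % 2 == 0             # B toggles every 6 frames
frames = background + np.outer(a_on, obj_a) + np.outer(b_on, obj_b)

dp = np.diff(frames, axis=0)                    # dp/dt space: background vanishes
_, s, vt = np.linalg.svd(dp - dp.mean(axis=0), full_matrices=False)

top2 = vt[:2]                                   # top two principal directions
proj = lambda v: top2.T @ (top2 @ v)            # projection onto their span

assert np.allclose(proj(obj_a), obj_a)          # A's pattern fully recovered
assert np.allclose(proj(obj_b), obj_b)          # B's pattern fully recovered
assert s[2] < 1e-8                              # nothing else left in the data
```

This does not settle the argument over nonsense-concepts: it only shows that when the generative process really is "independent objects switching on and off," PCA in dp/dt space separates them cleanly. Whether realistic inputs behave enough like this is exactly the open question in the exchange.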
> I'm not saying you are wrong here, only that you may not have heard me (probably my fault for not saying things clearly enough), and you haven't made your point by exhibiting something on which my approach would fail.
>
>> FWIW, I would level the same criticism against quite a few other people, so you don't stand alone here.
>
> My ego is quite indestructible and I understand that your body temperature is low, so you have nothing to worry about here.
>
>> (Just briefly: if I move on to look at your actual reply above, I see also mention of rates of change (dp/dt), but no explanation of how rates of change of anything would help a system build a concept that is a combination (NOT an association, please!) of [Julius Caesar's pinky toe and a sour grape that Brutus just spat out]. The rates of change seem irrelevant here).
>
> If you take a neuron or Bayesian formula programmed to do something static and throw dp/dt inputs at it, its output will be the dp/dt of the result from static operation. You could then simply integrate it to produce exactly the same output. Hence, the ONLY reason to operate in dp/dt space is for the learning, as the transformation itself is unaffected.
>
> Now, if you look for an association in dp/dt space and decide to recognize it, that same neuron will then operate to recognize a combination, once its output has been integrated.
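> The integrate-and-recover claim above is exact for a linear unit (for a nonlinear neuron or Bayesian formula it is only an approximation). A minimal NumPy check, with the sizes and names invented purely for illustration:

```python
import numpy as np

# Toy check: for a LINEAR unit, feeding it dp/dt inputs yields the dp/dt of
# its static output, so integrating recovers the static result exactly.
rng = np.random.default_rng(0)
w = rng.normal(size=4)                 # fixed "static" neuron weights (linear unit)
x = rng.normal(size=(50, 4))           # 50 time steps of 4 inputs

static_out = x @ w                     # ordinary "object mode" operation

dx = np.diff(x, axis=0)                # dp/dt inputs (finite differences)
dpdt_out = dx @ w                      # the same neuron fed dp/dt inputs
rebuilt = static_out[0] + np.cumsum(dpdt_out)   # integrate, seeded with the initial value

assert np.allclose(rebuilt, static_out[1:])     # identical to static operation
```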
> Of course, if subsequent neurons simply use its output without integrating it, the entire system will operate as though it recognized the combination, even though, if you attached an oscilloscope to the output, you would see positive and negative spikes around what would be a steady-state output in "object" mode.
>
> In short, it programs based on associations, but functions based on represented combinations, that representation being the dp/dt of the combination.
>
>> Instead, you waved your hands and said "fast unsupervised learning then becomes completely trivial" .... this statement is a declaration that a good mechanism is available.
>
>> You then also talk about "like" objects. But the whole concept of "like" is extraordinarily troublesome. Are Julius Caesar and Brutus "like" each other? Seen from our distance, maybe yes, but from the point of view of Julius C., probably not so much. Is a G-type star "like" a mirror? I don't know any stellar astrophysicists who would say so, but then again OF COURSE they are, because they are almost indistinguishable, because if you hold a mirror up in the right way it can reflect the sun and the two visual images can be identical.
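> Returning to the oscilloscope picture a few paragraphs up, a toy sketch (NumPy assumed, all values invented) of how a steady "object mode" output looks in dp/dt space, and how integration restores it:

```python
import numpy as np

# "Object mode" output: steady 1.0 while the object is present (steps 5..14).
presence = np.zeros(20)
presence[5:15] = 1.0

# dp/dt view -- what the oscilloscope would show: a positive spike at onset,
# a negative spike at offset, and zero elsewhere.
spikes = np.diff(presence, prepend=0.0)
assert spikes[5] == 1.0 and spikes[15] == -1.0
assert np.count_nonzero(spikes) == 2

# Downstream integration restores the steady "object mode" signal exactly.
recovered = np.cumsum(spikes)
assert np.array_equal(recovered, presence)
```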
>> These questions can be resolved, sure enough, but it is the whole business of resolving these questions (rather than waving a hand over them and declaring them to be trivial) that is the point.
>
> I think that pretty much everyone who has "dented their pick" on unsupervised learning (this includes myself. Does anyone else here have these same scars?) has developed methods that would work on "completely obvious" test cases but failed miserably on real-world input. My point here is that, looking at things from a dp/dt point of view, real-world situations become about as simple as "completely obvious" test cases. I would quote some good source to make this point, but I don't think anyone has gone here yet.
>
>> If you don't have a clear demonstration that this dp/dt idea does deliver the goods, why are you claiming that it does? Surely it is one or the other?
>
> This month I am wearing my mathematician hat. My son Eddie is the NN hacker of the family, and he is waiting impatiently for me to declare a tentative completion so he can run with it.
>
> For now, my goal is to come up with sufficiently good theory that even you can't poke any significant holes in it.
> Once I become the first person in history to ever receive the Loosemore Seal of No Objection, I will probably wrap this thing up and turn it over to Eddie.
>
>> But Steve, if YOU claim that "looking at things from a dp/dt point of view" does in fact yield a dramatic breakthrough that allows unsupervised learning to work on real world cases (something nobody else can do right now),
>
> Not entirely true, as PCA does what could be considered to be unsupervised learning, though granted, it is WAY too inefficient for NN/AGI use without dp/dt.
>
>> then YOU are expected to be the one who has gone there, done it, and come back with evidence that your idea does in fact do that.
>
> First comes the theory, then comes the demo. Neither contains any sort of proof, but it is a LOT cheaper to shoot something down BEFORE it is built than after. Hence, I find this exercise VERY valuable. THANKS. Please keep up the good work.
> Steve Richfield

--
Abram Demski
Public address: [email protected]
Public archive: http://groups.google.com/group/abram-demski
Private address: [email protected]

-------------------------------------------
agi
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244&id_secret=123753653-47f84b
Powered by Listbox: http://www.listbox.com
