Steve,

When I made the statement about Fourier I was thinking of JPEG
encoding. A little digging found this book, which presents a unified
approach to (low-level) computer vision based on the Fourier
transform:

http://books.google.com/books?id=1wJuTMbNT0MC&dq=fourier+vision&printsec=frontcover&source=bl&ots=3ogSJ2i5uW&sig=ZdvvWvu82q8UX1c5Abq6hWvgZCY&hl=en&sa=X&oi=book_result&resnum=2&ct=result#PPA4,M1
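For concreteness, here is a tiny sketch of the sparsifying effect I had in mind (my own toy example in Python/numpy, using the FFT where JPEG proper uses a blockwise DCT): a smooth signal's energy piles up in a handful of coefficients.

```python
# Toy demo of Fourier energy compaction -- the effect JPEG exploits.
import numpy as np

n = 256
t = np.arange(n) / n
# A smooth signal: two sinusoids.
signal = np.sin(2 * np.pi * 3 * t) + 0.5 * np.cos(2 * np.pi * 7 * t)

energy = np.abs(np.fft.rfft(signal)) ** 2
# Fraction of total energy in the 8 largest coefficients.
top8 = np.sort(energy)[-8:].sum() / energy.sum()
print(f"energy captured by 8 of {energy.size} coefficients: {top8:.4f}")
```

Nearly all of the energy lands in the two bins matching the sinusoid frequencies, which is why quantizing or dropping the small coefficients compresses so well.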

>> But that is beside the present point. :)
>
>
> Probably so. I noticed that you recently graduated, so I thought that I
> would drop that thought to make (or unmake) your day.

:) I should really update that. It's been a while now.

>> generally, any transform that makes the data more sparse, or simpler,
>> seems good
>
>
> Certainly if it results in extracting some useful figure of merit.
>
>>
>> -- which is of course what PCA does,
>
>
> Sometimes yes, and sometimes no. I am looking at incremental PCA approaches
> that reliably extract separate figures of merit rather than smushed-together
> figures of merit as PCA often does.

How do you define "figures of merit"? Sounds like an ill-defined
problem to me. We don't know which features we *really* want to
extract from an image until we know the utility function of the
environment, and so know what information will help us achieve our
goals.
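One way to make the "smushed-together" worry concrete (my own illustration, with made-up numbers): PCA returns mutually orthogonal components, so when the underlying factors are not orthogonal in the data, at least one principal component must blend them.

```python
# Toy demo: PCA components are orthogonal by construction, so
# non-orthogonal underlying factors get blended ("smushed together").
import numpy as np

rng = np.random.default_rng(0)
sources = rng.uniform(-1, 1, size=(5000, 2)) * [1.0, 3.0]  # independent factors
mix = np.array([[1.0, 0.4],   # columns = factor directions in the data,
                [0.2, 1.0]])  # deliberately NOT orthogonal (~57 degrees apart)
x = sources @ mix.T

# Classical batch PCA: eigenvectors of the covariance matrix.
eigvals, eigvecs = np.linalg.eigh(np.cov(x, rowvar=False))
pc1, pc2 = eigvecs[:, -1], eigvecs[:, -2]

cos_factors = mix[:, 0] @ mix[:, 1] / (
    np.linalg.norm(mix[:, 0]) * np.linalg.norm(mix[:, 1]))
print(f"factor directions cos = {cos_factors:.2f}, PCs cos = {pc1 @ pc2:.2f}")
```

The factor directions are about 57 degrees apart while the principal components are forced to 90, so no choice of PCs can line up with both factors at once.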

--Abram

On Sat, Dec 27, 2008 at 12:01 AM, Steve Richfield
<[email protected]> wrote:
> Abram,
>
> On 12/26/08, Abram Demski <[email protected]> wrote:
>>
>> Steve,
>>
>> It is strange to claim that prior PhDs will be worthless when what you
>> are suggesting is that we apply the standard methods to a different
>> representation.
>
>
> Much of AI and pretty much all of AGI is built on the proposition that we
> humans must code knowledge because the stupid machines can't efficiently
> learn it on their own, in short, that UNsupervised learning is difficult.
> Note that in nature, UNsupervised learning handily outperforms supervised
> learning. What good is supervised NN technology when UNsupervised NNs will
> perform MUCH better? What good are a few hand-coded AGI rules and the engine
> that runs them, when an UNsupervised AGI can learn them orders of magnitude
> faster than cities full of programmers? Note my prior post where I explain
> that AGIs must either abandon UNsupervised learning or switch to a
> NN-like implementation. In short, easy UNsupervised learning will change
> things about as much as the switch from horse and buggy to automobiles,
> leaving present PhDs in the position of blacksmiths and historians. Sure
> blacksmiths had transferable skills, but they weren't worth much and they
> weren't respected at all.
>
> In the 1980s, countless top computer people (including myself) had to
> expunge all references to mainframe computers from our resumes in order to
> find work in a microcomputer-dominated field. I expect to see rounds of the
> same sort of insanity when UNsupervised learning emerges.
>
>>
>> But that is beside the present point. :)
>
>
> Probably so. I noticed that you recently graduated, so I thought that I
> would drop that thought to make (or unmake) your day.
>
>>
>> Taking the derivative, or just finite differences, is a useful step in
>> more ways than one. You are talking about taking differences over
>> time, but differences over space can be used for edge detection,
>> frequently thought of as the first step in visual processing.
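As a concrete illustration of that spatial-difference point (my own toy 1-D "image"): flat regions difference away to zero, and only the brightness edge survives.

```python
# A 1-D "image" row: dark region then bright region.
import numpy as np

row = np.array([0, 0, 0, 0, 10, 10, 10, 10], dtype=float)
diff = np.diff(row)          # spatial analogue of the temporal dp/dt
edges = np.nonzero(diff)[0]  # indices where the difference is nonzero
print(diff, edges)
```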
>
>
> Correct. My paper goes into using any dimension that is differentiable. Note
> that continuous eye movement converts a physical dimension to the time domain.
>
>>
>> More
>> generally, any transform that makes the data more sparse, or simpler,
>> seems good
>
>
> Certainly if it results in extracting some useful figure of merit.
>
>>
>> -- which is of course what PCA does,
>
>
> Sometimes yes, and sometimes no. I am looking at incremental PCA approaches
> that reliably extract separate figures of merit rather than smushed-together
> figures of merit as PCA often does. Another problem with classical PCA is
> that it can't provide real-time learning, but instead, works via a sort of
> "batch processing" of statistics collected in the array that is being
> transformed.
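For what it's worth, one standard incremental scheme along those lines is Oja's rule, which nudges a weight vector toward the leading principal direction one sample at a time instead of batch-processing collected statistics. A minimal sketch (my own toy data and learning rate):

```python
# Oja's rule: online estimate of the first principal component.
import numpy as np

rng = np.random.default_rng(1)
true_dir = np.array([0.6, 0.8])                    # dominant direction
data = rng.normal(size=(20000, 2)) * [3.0, 0.3]    # elongated cloud
data = data @ np.array([[0.6, 0.8], [-0.8, 0.6]])  # rotate so PC1 = true_dir

w = np.array([1.0, 0.0])       # initial guess at the component
lr = 0.001
for x in data:
    y = w @ x                  # projection onto the current estimate
    w += lr * y * (x - y * w)  # Oja's update; the -y*w term self-normalizes

alignment = abs(w @ true_dir) / np.linalg.norm(w)
print(f"alignment with true first PC: {alignment:.3f}")
```

Whether such updates can be made to "reliably extract separate figures of merit" is exactly the open question; plain Oja only finds the variance-dominant direction.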
>
>>
>> and derivatives in
>> time/space, and also the Fourier transform, I think. The usefulness of
>> these transforms springs from underlying regularities in the data.
>
>
> Hmmm, I don't see where a Fourier transform would enter the cognitive
> process. Perhaps you see something that I have missed?
>
>>
>> That's not to say that I don't think some representations are
>> fundamentally more useful than others-- for example, I know that some
>> proofs are astronomically larger in 1st-order logic as compared to
>> 2nd-order logic, even in domains where 1st-order logic is
>> representationally sufficient.
>>
>> The statement about time correction reminds me of a system called
>> PURR-PUSS.
>
>
>
> However, as I understand it, the Purposeful Unprimed Real-world Robot with
> Predictors Using Short Segments still relied on rewards and punishments for
> learning.
>>
>> It is Turing-complete in some sense, essentially by
>> compounding time-delays, but I do not know exactly what sense (ie, a
>> Turing-complete *learner* is very different than a Turing-complete
>> *programmable computer*... PURR-PUSS uses something in between called
>> "soft teaching" if I recall correctly.)
>
>
> The old DEC LINC and LINC-8 computers operated the instruction sequencing
> with a pile of time delay modules, and someone had to go in and recalibrate
> every few months.
>
> Steve Richfield
> ==================
>
>>
>> On Fri, Dec 26, 2008 at 3:26 PM, Steve Richfield
>> <[email protected]> wrote:
>> > Abram,
>> >
>> > On 12/26/08, Abram Demski <[email protected]> wrote:
>> >>
>> >> Steve,
>> >>
>> >> Richard is right when he says temporal simultaneity is not a
>> >> sufficient principle.
>> >
>> >
>> > ... and I fully agree. However, we must unfold this thing one piece at a
>> > time.
>> >
>> > Without the dp/dt "trick", there doesn't seem to be any way to make
>> > unsupervised learning work, and I appear to be the first to stumble onto
>> > dp/dt. This is a whole new and unexplored world, where the things that
>> > stymied past unsupervised efforts fall out effortlessly, but some new
>> > challenges present themselves.
>> >
>> >>
>> >> Suppose you present your system with the
>> >> following sequences (letters could be substituted for sounds, colors,
>> >> objects, whatever):
>> >>
>> >> ABCABCABCABC...
>> >>
>> >> AAABBBAAABBB...
>> >>
>> >> ABBAAABBBBAAAAABBBBBB...
>> >>
>> >> ABBCCCDDDDEEEEEFFFFFF...
>> >>
>> >> ABACABADABACABAEABACABADABACABA...
>> >>
>> >> All of these sequences have "concepts" behind them. All of these
>> >> concepts are immune to temporal-simultaneity-learning (although the
>> >> first could be learned by temporal adjacency, and the second by
>> >> temporal adjacency with a delay of 3).
>> >
>> >
>> > The way that wet neurons are built, this is unavoidable! Here is another
>> > snippet from my paper...
>> >
>> >
>> > Time Correction
>> >
>> > Electronics designers routinely use differentiation and integration to
>> > advance and retard timing. Phase-linear low-pass filters are often used
>> > to
>> > make short delays in a signal, and "peaking" capacitors were used in RTL
>> > (Resistor Transistor Logic) to differentiate inputs for quicker output.
>> > Further, wet neurons introduce their own propagation delays from input
>> > synapse to output synapse. If not somehow corrected, the net effect of
>> > this
>> > is a scrambling of the time that a given signal/node/term represents,
>> > which
>> > if left uncorrected, would result in relating signals together that are
>> > arbitrarily shifted in time. There seem to be three schools of thought
>> > regarding this:
>> >
>> > 1. No problem. This simply results in considering various things shifted
>> > arbitrarily in time. When wet neurons learn what works, this will result
>> > in
>> > recognizing time-sequenced phenomena. Arbitrary delays might also do a
>> > lot
>> > for artificial neurons.
>> >
>> > 2. Time correction could be instituted, e.g. through Taylor series signal
>> > extrapolation to in effect remove a neuron's delay, at the cost of
>> > introducing considerable noise into the result. My own simulations of
>> > Taylor
>> > series extrapolation functions showed that the first derivative may
>> > indeed
>> > help for small corrections, but beyond that, subtle changes in the shape
>> > of
>> > a transition cause wild changes in the extrapolated result, sometimes
>> > going
>> > so far as to produce short bursts of oscillation. Downstream neurons may
>> > then amplify these problems to produce havoc at the output of the
>> > artificial
>> > neural network.
>> >
>> > 3. The method utilized in CRAY computers might be in use, where all delays
>> > were
>> > a precise multiple (of their clock rate) long. This was achieved by
>> > using
>> > interconnecting wires cut to certain specific lengths, even though the
>> > length may be much longer than actually physically needed to
>> > interconnect
>> > two components. Perhaps wet neurons only come in certain very specific
>> > delays. There is some laboratory evidence for this, as each section of
>> > our
>> > brains has neurons with similar geometry within the group. This has been
>> > presumed to be an artifact of evolution and limited DNA space, but may
>> > in
>> > fact be necessary for proper time correction.
>> >
>> > No one now knows which of these are in use in wet neurons. However,
>> > regardless of wet-neuron functionality, artificial neural network
>> > researchers should be attentive to time correction.
>> >
>> > Note that #1 above unavoidably solves the time-sequencing puzzle.
>> > Introduce
>> > some integration, and the sequencing can be arbitrarily shifted in time
>> > -
>> > within reasonable limits (seconds, maybe a minute or two).
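To put a number on the noise problem in option 2 (my own toy simulation, not from Steve's paper): a first-order Taylor correction x(t + d) ≈ x(t) + d·x'(t), with x' taken as a finite difference, stays accurate for a one-step correction, but the difference term amplifies input noise badly as the correction grows.

```python
# First-order Taylor "time correction" on a noisy sine wave.
import numpy as np

rng = np.random.default_rng(2)
dt = 0.01
t = np.arange(0, 2, dt)
clean = np.sin(2 * np.pi * t)
noisy = clean + rng.normal(scale=0.01, size=t.size)

def taylor_predict(x, steps):
    # x(t + steps*dt) ~= x(t) + steps * (x(t + dt) - x(t))
    return x[:-steps] + steps * (x[1:len(x) - steps + 1] - x[:-steps])

def rms_error(steps):
    return np.sqrt(np.mean((taylor_predict(noisy, steps) - clean[steps:]) ** 2))

small, large = rms_error(1), rms_error(20)
print(f"1-step RMS error {small:.4f}, 20-step RMS error {large:.4f}")
```

The 20-step correction comes out more than an order of magnitude worse, matching the "subtle changes ... cause wild changes" observation.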
>> >
>> >>
>> >> The transition to sequence learning is (at least, in my eyes) a
>> >> transition to relational learning, as opposed to the "flat" learning
>> >> that PCA is designed for.
>> >
>> >
>> > I suspect that PCA-like methods are at work within neurons, and that
>> > sequence learning and the like fall out from inter-neuronal connections
>> > and
>> > the associated delays, integration, etc.
>> >>
>> >> In other words, completely new methods are
>> >> required. You already begin that transition by invoking dp/dt, which
>> >> assumes a temporal aspect to the data...
>> >>
>> >> See this blog post for a more full account of my view on the current
>> >> state of affairs. (It started out as a post about a new algorithm I'd
>> >> been thinking about, but turned into an essay on the difference
>> >> between relational methods and "flat" (propositional) methods, and how
>> >> to bridge the gap. If you're wondering about the title, see the
>> >> previous post.)
>> >>
>> >>
>> >> http://dragonlogic-ai.blogspot.com/2008/12/back-to-ai-ok-here-we-go.html
>> >
>> >
>> > This blog and this email reflect a common problem with AI thought. There
>> > are LOTS of things that people are VERY bad at doing, and these generally
>> > make horrible examples on which to test theories of human cognition, and
>> > at the same time wonderful potential AI applications.
>> >
>> > A perfect example is health and disease, where the human cognition
>> > process
>> > tends to run in unproductive directions. Any given set of symptoms
>> > typically
>> > has ~12 different common underlying causal mechanisms, each of which has
>> > several cause-and-effect chain links that are typically arranged in a
>> > figure
>> > "6" configuration with a self-sustaining loop at the end. Given
>> > limitless
>> > understanding, it typically takes two seemingly unrelated actions to
>> > actually cure anything, one to stop the lead-in, and the other to
>> > momentarily interrupt the self-sustaining loop.
>> >
>> > It is my present suspicion that unsupervised learning is SO simple that
>> > it
>> > just falls out of a system using the right representation. Even the
>> > simplest
>> > of creatures do quite well at it. However, without that representation,
>> > it
>> > is horrifically hard/impossible. This means that NN and AGI guys should
>> > all
>> > STOP whatever they are doing and find the right representation, which is
>> > the
>> > path that I have gone on.
>> >
>> > Note that if I am successful, prior PhDs in AI/NN won't be worth spit
>> > because they will be built on false premises. Good for history, but bad
>> > for science.
>> >
>> > Thanks for your thoughts. Any more?
>> >
>> > Steve Richfield
>> > ========================
>> >>
>> >> On Fri, Dec 26, 2008 at 2:31 AM, Steve Richfield
>> >> <[email protected]> wrote:
>> >> > Richard,
>> >> >
>> >> > On 12/25/08, Richard Loosemore <[email protected]> wrote:
>> >> >>
>> >> >> Steve Richfield wrote:
>> >> >>>
>> >> >>>  There are doubtless exceptions to my broad statement, but
>> >> >>> generally,
>> >> >>> neuron functionality is WIDE open to be pretty much ANYTHING you
>> >> >>> choose,
>> >> >>> including that of an AGI engine's functionality on its equations.
>> >> >>>  In the reverse, any NN could be expressed in a shorthand form that
>> >> >>> contains structure, synapse functions, etc., and an AGI engine
>> >> >>> could
>> >> >>> be
>> >> >>> built/modified to function according to that shorthand.
>> >> >>>  In short, mapping between NN and AGI forms presumes flexibility in
>> >> >>> the
>> >> >>> functionality of the target form. Where that flexibility is NOT
>> >> >>> present,
>> >> >>> e.g. because of orthogonal structure, etc., then you must ask
>> >> >>> whether
>> >> >>> something is being gained or lost by the difference. Clearly, any
>> >> >>> transition
>> >> >>> that involves a loss should be carefully examined to see if the
>> >> >>> entire
>> >> >>> effort is headed in the wrong direction, which I think was your
>> >> >>> original
>> >> >>> point here.
>> >> >>
>> >> >>
>> >> >> There is a problem here.
>> >> >>
>> >> >> When someone says "X and Y can easily be mapped from one form to the
>> >> >> other" there is an implication that they are NOT suggesting that we
>> >> >> go
>> >> >> right
>> >> >> down to the basic constituents of both X and Y in order to effect
>> >> >> the
>> >> >> mapping.
>> >> >>
>> >> >> Thus:  "Chalk and Cheese can easily be mapped from one to the other"
>> >> >> ....
>> >> >> trivially true if we are prepared to go down to the common
>> >> >> denominator
>> >> >> of
>> >> >> electrons, protons and neutrons.  But if we stay at a sensible level
>> >> >> then,
>> >> >> no, these do not map onto one another.
>> >> >
>> >> >
>> >> > The problem here is that you were thinking of present existing NN and
>> >> > AGI systems, neither of which work (yet) in any really useful way, so it
>> >> > was obviously impossible to directly convert from one system with its
>> >> > set of bad assumptions to another system with a completely different set
>> >> > of bad assumptions. I completely agree, but I assert that the answer to
>> >> > that particular question is of no practical interest to anyone.
>> >> >
>> >> > On the other hand, converting between NN and AGI systems built on the
>> >> > SAME
>> >> > set of assumptions would be simple. This situation doesn't yet exist.
>> >> > Until
>> >> > then, converting a program from one dysfunctional platform to another
>> >> > is
>> >> > uninteresting. When the assumptions get ironed out, then all systems
>> >> > will be
>> >> > built on the same assumptions, and there will be few problems going
>> >> > between
>> >> > them, EXCEPT:
>> >> >
>> >> > Things need to be arranged in arrays for automated learning, which
>> >> > fits the present NN paradigm much more than the present AGI paradigm.
>> >> >>
>> >> >> Similarly, if you claim that NN and regular AGI map onto one
>> >> >> another, I
>> >> >> assume that you are saying something more substantial than that
>> >> >> these
>> >> >> two
>> >> >> can both be broken down into their primitive computational parts,
>> >> >> and
>> >> >> that
>> >> >> when this is done they seem equivalent.
>> >> >
>> >> >
>> >> > Even this breakdown isn't required if both systems are built on the
>> >> > same
>> >> > correct assumptions. HOWEVER, I see no way to transfer fast learning
>> >> > from an
>> >> > NN-like construction to an AGI-like construction. Do you? If there is
>> >> > no
>> >> > answer to this question, then this unanswerable question would seem
>> >> > to
>> >> > redirect AGI efforts to NN-like constructions if they are ever to
>> >> > learn
>> >> > like
>> >> > we do.
>> >> >>
>> >> >> NN and regular AGI, they way they are understood by people who
>> >> >> understand
>> >> >> them, have very different styles of constructing intelligent
>> >> >> systems.
>> >> >
>> >> >
>> >> > Neither of which work (yet). Of course, we are both trying to fill in
>> >> > the
>> >> > gaps.
>> >> >>
>> >> >> Sure, you can code both in C, or Lisp, or Cobol, but that is to
>> >> >> trash
>> >> >> the
>> >> >> real meaning of "are easily mapped onto one another".
>> >> >
>> >> >
>> >> > One of my favorite consulting projects involved coding an AI program
>> >> > to
>> >> > solve complex problems that were roughly equivalent to solving
>> >> > algebraic
>> >> > equations. This composed the Yellow pages for 28 different large
>> >> > phone
>> >> > directories. The project was for a major phone company and had to be
>> >> > written
>> >> > entirely in COBOL. Further, it had to run at n log n speed and NOT
>> >> > n^2
>> >> > speed, which I did by using successive sorts instead of list
>> >> > processing
>> >> > methods. It would have been rather difficult to achieve the needed
>> >> > performance in C or Lisp, even though COBOL would seem to be
>> >> > everyone's
>> >> > first choice as the last choice on the list of prospective platforms.
>> >> >>>
>> >> >>>    ), instead of operating on "objects" (in an
>> >> >>>
>> >> >>>        object-oriented sense)
>> >> >>>
>> >> >>>
>> >> >>>    Neither NN nor AGI has any intrinsic relationship to OO.
>> >> >>>
>> >> >>>  Clearly I need a better term here. Both NNs and AGIs tend to have
>> >> >>> neurons or equations that reflect the presence (or absence) of
>> >> >>> various
>> >> >>> objects, conditions, actions, etc. My fundamental assertion is that
>> >> >>> if
>> >> >>> you
>> >> >>> differentiate the inputs so that everything in the entire network
>> >> >>> reflects
>> >> >>> dp/dt instead of straight probabilities, then the network works
>> >> >>> identically,
>> >> >>> but learning is GREATLY simplified.
>> >> >>
>> >> >> Seems like a simple misunderstanding:  you were not aware that
>> >> >> "object
>> >> >> oriented" does not mean the same as saying that there are
>> >> >> fundamental
>> >> >> atomic
>> >> >> constituents of a representation.
>> >> >
>> >> >
>> >> > A typical semantic overloading problem. "Atomic constituent
>> >> > orientation"
>> >> > doesn't really work either, because in later stages, individual
>> >> > terms/neurons can represent entire concepts, strategies, etc. I am
>> >> > still
>> >> > looking for a good term here.
>> >> >>>
>> >> >>>
>> >> >>>    , instead, operates on the rate-of-changes in the
>> >> >>>
>> >> >>>        probabilities of "objects", or dp/dt. Presuming sufficient
>> >> >>>        bandwidth to generally avoid superstitious coincidences,
>> >> >>> fast
>> >> >>>        unsupervised learning then becomes completely trivial, as
>> >> >>> like
>> >> >>>        objects cause simultaneous like-patterned changes in the
>> >> >>> inputs
>> >> >>>        WITHOUT the overlapping effects of the many other objects
>> >> >>>        typically present in the input (with numerous minor
>> >> >>> exceptions).
>> >> >>>
>> >> >>>
>> >> >>>    You have already presumed that something supplies the system
>> >> >>> with
>> >> >>>    "objects" that are meaningful.  Even before your first mention
>> >> >>> of
>> >> >>>    dp/dt, there has to be a mechanism that is so good that it never
>> >> >>>    invents objects such as:
>> >> >>>
>> >> >>>    Object A:  "A person who once watched all of Tuesday Weld's
>> >> >>> movies
>> >> >>> in
>> >> >>>    the space of one week" or
>> >> >>>
>> >> >>>    Object B:  "Something that is a combination of Julius Caesar's
>> >> >>> pinky
>> >> >>>    toe and a sour grape that Brutus just spat out" or
>> >> >>>
>> >> >>>    Object C:  "All of the molecules involved in a swimming gala that
>> >> >>>    happen to be 17.36 meters from the last drop of water that
>> >> >>> splashed
>> >> >>>    from the pool".
>> >> >>>
>> >> >>>    You have supplied no mechanism that is able to do that, but that
>> >> >>>    mechanism is 90% of the trouble, if learning is what you are
>> >> >>> about.
>> >> >>>
>> >> >>>  With prior unsupervised learning you are 100% correct. However
>> >> >>> none
>> >> >>> of
>> >> >>> the examples you gave involved temporal simultaneity. I will
>> >> >>> discuss B
>> >> >>> above
>> >> >>> because it is close enough to be interesting.
>> >> >>>  If indeed someone just began to notice something interesting about
>> >> >>> Caesar's pinkie toe *_as_* they just began to notice the taste of a
>> >> >>> sour
>> >> >>> grape, then yes, that probably would be learned via the mechanisms I
>> >> >>> am
>> >> >>> talking about. However, if one was "present perfect tense" while
>> >> >>> the
>> >> >>> other
>> >> >>> was just beginning, then it wouldn't with my approach but would
>> >> >>> with
>> >> >>> prior
>> >> >>> unsupervised learning methods. For example, if Caesar's pinkie toe had
>> >> >>> been
>> >> >>> noticed and examined, then before the condition passed they tasted
>> >> >>> a
>> >> >>> sour
>> >> >>> grape, then temporal simultaneity of the dp/dt edges wouldn't exist
>> >> >>> to
>> >> >>> learn
>> >> >>> from. Of course, in both cases, the transforms would work
>> >> >>> identically
>> >> >>> given
>> >> >>> identical prior learning/programming.
>> >> >>
>> >> >>
>> >> >> You have not understood the sense in which I made the point, I fear.
>> >> >
>> >> >
>> >> > I think the reverse is true. Consider...
>> >> >>
>> >> >> I was describing obviously useless concepts.  Ones where there is no
>> >> >> temporal simultaneity.
>> >> >
>> >> >
>> >> > dp/dt is unable to even notice things that lack temporal
>> >> > simultaneity,
>> >> > so
>> >> > the examples you gave, though typical challenges to past unsupervised
>> >> > learning, are complete non-issues in dp/dt space.
>> >> >>
>> >> >>   Concepts thrown together out of completely useless components.
>> >> >
>> >> >
>> >> > ... that require SOME force/reason/bug/error/etc to get thrown
>> >> > together.
>> >> > I
>> >> > think we both understand how this was a typical challenge to past
>> >> > unsupervised learning efforts. I am asserting that in dp/dt systems,
>> >> > there
>> >> > is NO force/reason/error/etc to ever throw such things together, and
>> >> > hence,
>> >> > no reason for vastly complex matrix transforms to then try to pull
>> >> > them
>> >> > back
>> >> > apart.
>> >> >>
>> >> >> The question is:  how to build a mechanism that does NOT fall into
>> >> >> the
>> >> >> trap of creating such nonsense-concepts.  If you just say "assume
>> >> >> that
>> >> >> we
>> >> >> have such a concept builder" you beg a million questions.
>> >> >
>> >> >
>> >> >> Your reply, above, took one of my examples and tried to talk about
>> >> >> what
>> >> >> could happen if it was not, after all, a nonsense-concept.
>> >> >
>> >> > Note that GMAIL got sick here, so I'll mark your text with >. Also,
>> >> > some
>> >> > replies are deeply indented, so I have bolded some of them.
>> >> >
>> >> > I was just "playing neuron" without any mindreading abilities.
>> >> >>Alas, that is neither here nor there, because (sure enough)
>> >> >> *everyone*
>> >> >> agrees that temporal simultaneity is a good basic ground for trying
>> >> >> to
>> >> >> construct new concepts (it is the Reason Number One for creating a
>> >> >> new
>> >> >> concept!).  But we also know that just common or garden variety
>> >> >> Temporal
>> >> >> Simultaneity doesn't get you very far .... that is the easiest of
>> >> >> all
>> >> >> mechanisms, and we need a hundred more concept-building mechanisms
>> >> >> that
>> >> >> are
>> >> >> better than that before we have a real concept-generating engine.
>> >> >
>> >> > Now, we can start "picking through" the approaches. I suspect that
>> >> > looking
>> >> > for the principal components of temporally simultaneous inputs goes a
>> >> > LONG
>> >> > way toward what is sought, but I have no proof (yet). Do YOU have some
>> >> > idea as to where the threshold of usefulness is?
>> >> >
>> >> >>  And (here is where my point comes back into the picture) if anyone
>> >> >> stands
>> >> >> up and says "Hey everyone!  I have discovered a hundred concept
>> >> >> building
>> >> >> mechanisms that I think will do the trick!", the first question that
>> >> >> the
>> >> >> crowd will ask is:  "Do your mechanisms work together to build real,
>> >> >> sensible concepts, or do they fill the system with bazillions of
>> >> >> really
>> >> >> dumb, useless concepts (like my nonsense list above)?"
>> >> >
>> >> > Clearly, PCA on simultaneous inputs will NOT do that, because they
>> >> > must
>> >> > show
>> >> > common things in order not to end up at the wrong end of the Huffman
>> >> > code.
>> >> >
>> >> >> Anyone who says that they know of a way to get unsupervised learning
>> >> >> to
>> >> >> occur is saying, implicitly, that they have those 100 concept
>> >> >> building
>> >> >> mechanisms ready to go (or one super mechanism as good as all of
>> >> >> them).
>> >> >>  Hence my original point:  you cannot simply imply that your system
>> >> >> is
>> >> >> working with bona-fide, coherent concepts unless you can show that
>> >> >> it
>> >> >> really
>> >> >> does come up with concepts (or objects) that are sensible.
>> >> >
>> >> > Perhaps you could exhibit some examples where learning based on
>> >> > temporal
>> >> > simultaneity with a preference for identifying common patterns (as
>> >> > PCA
>> >> > requires) fails. Clearly, if I think that a relatively simple
>> >> > approach
>> >> > (like
>> >> > PCA on dp/dt inputs) should work, but you are convinced that it will
>> >> > fall
>> >> > into an abyss of superstitious learning, then you will have a MUCH
>> >> > easier
>> >> > time exhibiting a couple of example failures than I will have somehow
>> >> > proving that it always works (which is probably beyond the
>> >> > mathematical
>> >> > state of the art).
>> >> >
>> >> > I'm not saying you are wrong here, only that you may not have heard
>> >> > me
>> >> > (probably my fault for not saying things clearly enough), and you
>> >> > haven't
>> >> > made your point by exhibiting something on which my approach would
>> >> > fail.
>> >> >
>> >> >> FWIW, I would level the same criticism against quite a few other
>> >> >> people,
>> >> >> so you don't stand alone here.
>> >> >
>> >> > My ego is quite indestructible and I understand that your body
>> >> > temperature
>> >> > is low, so you have nothing to worry about here.
>> >> >> (Just briefly:  if I move on to look at your actual reply above, I
>> >> >> see
>> >> >> also mention of rates of change (dp/dt), but no explanation of how
>> >> >> rates of
>> >> >> change of anything would help a system build a concept that is a
>> >> >> combination
>> >> >> (NOT an association, please!) of [Julius Caesar's pinky toe and a
>> >> >> sour
>> >> >> grape
>> >> >> that Brutus just spat out].  The rates of change seem irrelevant
>> >> >> here).
>> >> >
>> >> > If you take a neuron or Bayesian formula programmed to do something
>> >> > static
>> >> > and throw dp/dt inputs at it, its output will be the dp/dt of the
>> >> > result
>> >> > from static operation. You could then simply integrate it to produce
>> >> > exactly
>> >> > the same output. Hence, the ONLY reason to operate in dp/dt space is
>> >> > for
>> >> > the
>> >> > learning, as the transformation itself is unaffected.
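A quick sanity check on that claim for the linear case (my own construction; for a nonlinear unit the equivalence would only hold approximately, via the chain rule):

```python
# A fixed linear "neuron" fed dp/dt inputs emits d(output)/dt, so
# integrating its output recovers the static result exactly.
import numpy as np

rng = np.random.default_rng(3)
w = rng.normal(size=4)                        # fixed weights
p = rng.normal(size=(100, 4)).cumsum(axis=0)  # smooth-ish input trajectories

static_out = p @ w                            # operate on p directly

dp = np.diff(p, axis=0)                       # dp/dt via finite differences
d_out = dp @ w                                # the unit now emits d(out)/dt
recovered = static_out[0] + np.concatenate([[0.0], np.cumsum(d_out)])

err = float(np.max(np.abs(recovered - static_out)))
print(f"max reconstruction error: {err:.2e}")
```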
>> >> >
>> >> > Now, if you look for an association in dp/dt space and decide to
>> >> > recognize
>> >> > it, that same neuron will then operate to recognize a combination,
>> >> > once
>> >> > its
>> >> > output has been integrated. Of course, if subsequent neurons simply use
>> >> > its output without integrating, the entire system will operate as though
>> >> > it
>> >> > recognized the combination, even though, if you attached an
>> >> > oscilloscope
>> >> > to
>> >> > the output, you would see positive and negative spikes around what
>> >> > would
>> >> > be
>> >> > a steady-state output in "object" mode.
>> >> >
>> >> > In short, it programs based on associations, but functions based on
>> >> > represented combinations, that representation being the dp/dt of the
>> >> > combination.
>> >> >
>> >> >>>
>> >> >>>
>> >> >>>    Instead, you waved your hands and said "fast unsupervised
>> >> >>> learning
>> >> >>>     > then becomes completely trivial" .... this statement is a
>> >> >>>    declaration that a good mechanism is available.
>> >> >>>
>> >> >>>    You then also talk about "like" objects.  But the whole concept
>> >> >>> of
>> >> >>>    "like" is extraordinarily troublesome.  Are Julius Caesar and
>> >> >>> Brutus
>> >> >>>    "like" each other?  Seen from our distance, maybe yes, but from
>> >> >>> the
>> >> >>>    point of view of Julius C., probably not so much.  Is a G-type
>> >> >>> star
>> >> >>>    "like" a mirror?  I don't know any stellar astrophysicists who
>> >> >>> would
>> >> >>>    say so, but then again OF COURSE they are, because they are
>> >> >>> almost
>> >> >>>    indistinguishable, because if you hold a mirror up in the right
>> >> >>> way
>> >> >>>    it can reflect the sun and the two visual images can be
>> >> >>> identical.
>> >> >>>
>> >> >>>    These questions can be resolved, sure enough, but it is the
>> >> >>> whole
>> >> >>>    business of resolving these questions (rather than waving a hand
>> >> >>>    over them and declaring them to be trivial) that is the point.
>> >> >>>
>> >> >>>  I think that pretty much everyone who has "dented their
>> >> >>> pick"
>> >> >>> on unsupervised learning (this includes myself. Does anyone else
>> >> >>> here
>> >> >>> have
>> >> >>> these same scars?) has developed methods that would work on
>> >> >>> "completely
>> >> >>> obvious" test cases but failed miserably on real-world input. My
>> >> >>> point
>> >> >>> here
>> >> >>> is that looking at things from a dp/dt point of view, real-world
>> >> >>> situations
>> >> >>> become about as simple as "completely obvious" test cases.
>> >> >>>  I would quote some good source to make this point, but I don't
>> >> >>> think
>> >> >>> anyone has gone here yet.
>> >> >>
>> >> >>>
>> >> >>>
>> >> >>> If you don't have a clear demonstration that this dp/dt idea does
>> >> >>> deliver
>> >> >>> the goods, why are you claiming that it does?  Surely it is one or
>> >> >>> the
>> >> >>> other?
>> >> >>>
>> >> >>> This month I am wearing my mathematician hat. My son Eddie is the
>> >> >>> NN
>> >> >>> hacker of the family, and he is waiting impatiently for me to
>> >> >>> declare
>> >> >>> a
>> >> >>> tentative completion so he can run with it.
>> >> >>>
>> >> >>> For now, my goal is to come up with a sufficiently good theory that
>> >> >>> even
>> >> >>> you can't poke any significant holes in it. Once I become the first
>> >> >>> person
>> >> >>> in history to ever receive the Loosemore Seal of No Objection,
>> >> >>> I
>> >> >>> will
>> >> >>> probably wrap this thing up and turn it over to Eddie.
>> >> >>>>
>> >> >>>> But Steve, if YOU claim that "looking at things from a dp/dt point
>> >> >>>> of
>> >> >>>> view" does in fact yield a dramatic breakthrough that allows
>> >> >>>> unsupervised
>> >> >>>> learning to work on real world cases (something nobody else can do
>> >> >>>> right
>> >> >>>> now),
>> >> >>>>
>> >> >>>> Not entirely true, as PCA does what could be considered to be
>> >> >>>> unsupervised learning, though granted, it is WAY too inefficient
>> >> >>>> for
>> >> >>>> NN/AGI
>> >> >>>> use without dp/dt.
>> >> >>>>>
>> >> >>>>> then YOU are expected to be the one who has gone there, done it,
>> >> >>>>> and
>> >> >>>>> come back with evidence that your idea does in fact do that.
>> >> >>
>> >> >> First comes the theory, then comes the demo. Neither contains any
>> >> >> sort
>> >> >> of
>> >> >> proof, but it is a LOT cheaper to shoot something down BEFORE it is
>> >> >> built
>> >> >> than after. Hence, I find this exercise VERY valuable. THANKS.
>> >> >> Please
>> >> >> keep
>> >> >> up the good work.
>> >> >>
>> >> >> Steve Richfield
>> >> >
>> >
>>
>>
>>
>



-- 
Abram Demski
Public address: [email protected]
Public archive: http://groups.google.com/group/abram-demski
Private address: [email protected]


-------------------------------------------
agi
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: 
https://www.listbox.com/member/?member_id=8660244&id_secret=123753653-47f84b
Powered by Listbox: http://www.listbox.com
