Thanks for that explanation, Matt. I remember reading something like that a long time ago (way before 1991!). I could not figure out what you had originally said, and I wondered whether there was some research on the strength or estimation of the probability of reinforcement in a more complex sense.

Behaviorism floundered outside the edge of complexity because it was not able to produce reliable results except for very simple observable hypotheses. There are some who would disagree, but cognitive psychology was able to produce more insight where Behaviorism could not go, by using co-variants with more complex hypotheses. These findings had to be examined with some kinds of cross-hypothesis, but that is not a bad thing. Still, this idea that Behaviorism floundered on the outer edge of everyday complexity is interesting because it seems so analogous to AI/AGI.

Jim Bromer

> Date: Thu, 1 Aug 2013 14:03:31 -0400
> Subject: Re: [agi] A Very Simple AGI Project
> From: [email protected]
> To: [email protected]
>
> On Thu, Aug 1, 2013 at 4:41 AM, Jim Bromer <[email protected]> wrote:
> > Matt:
> > > Probability is a mathematical model
> > > of belief, not of reality. And in general, brains estimate
> > > probabilities using the 1/t rule. This was confirmed with many
> > > experiments in classical conditioning and reinforcement learning in
> > > animals.
> >
> > Do you remember how this was confirmed? How can the belief of an animal be
> > confirmed? And how can a reported belief value of a human being be relied
> > on?
>
> In animal experiments in classical conditioning, the time to learn the
> association (the inverse of the learning rate) is proportional to the
> time interval, t, from the conditioned stimulus to the unconditioned
> stimulus. In reinforcement learning, it is proportional to the
> interval between the action and the reinforcement signal. There is no
> learning when t is negative (when the order of events is reversed).
>
> Well, not exactly. The function 1/t has a discontinuity at t = 0, but
> real functions must be bounded and continuous. It actually peaks at
> around 0.1 or 0.2 seconds.
>
> This was from a psychology textbook. Schwartz, Barry, and Daniel
> Reisberg (1991), Learning and Memory, New York: W. W. Norton and
> Company.
>
> > Here is a cross-test that popped into my head. Suppose that a more
> > probable reward is associated with a very difficult task and the less
> > probable reward is associated with a simple task. Are you telling me that
> > an intelligent animal won't first try a simpler task that has produced a
> > reward less frequently? My guess is that he will try the simpler task a
> > number of times before he would go after the difficult task if the
> > difficulty was severe enough. At what difference between the rate of past
> > rewards does the intelligent animal go for the difficult task before the
> > easy task?
>
> Depends on how you measure difficulty. And why would you want to
> complicate the experiment this way? I'm talking about learning, which
> is the rate of change of belief, as measured by the rate of change of
> the probability estimate. You are going to make decisions based on
> expected reward, or reward times probability of receiving it.
>
> --
> -- Matt Mahoney, [email protected]
>
>
> -------------------------------------------
> AGI
> Archives: https://www.listbox.com/member/archive/303/=now
> RSS Feed: https://www.listbox.com/member/archive/rss/303/24379807-f5817f28
> Modify Your Subscription: https://www.listbox.com/member/?&
> Powered by Listbox: http://www.listbox.com
