This (goal + world state == action?) deserves its own discussion thread, I think, Ian. I've been pondering along these lines all weekend.
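One way I've been picturing the goal + world state == action question, purely as a toy sketch (nothing here is NuPIC code; the SDRs are modeled as sets of active bits, and the learned mapping is just a lookup table standing in for whatever the CLA would actually learn):

```python
import random

# Toy model of "goal + world state -> action". SDRs are frozensets of active
# bit indices; the policy table and the reinforcement rule are hypothetical.

def overlap(a, b):
    """SDR similarity: number of shared active bits."""
    return len(a & b)

policy = {}  # learned mapping: (goal SDR, state SDR) -> action

def choose_action(goal, state, actions):
    """Use the remembered action for this (goal, state) pair,
    or explore randomly if nothing has been reinforced yet."""
    return policy.get((goal, state), random.choice(actions))

def reinforce(goal, state, action, old_state, new_state):
    """Remember an action only if it moved the state closer to the goal."""
    if overlap(goal, new_state) > overlap(goal, old_state):
        policy[(goal, state)] = action
```

In this toy there is no separate "go" trigger: action selection fires whenever a goal SDR is active at all, which would be one possible answer to Ian's question, though I'm not at all sure it's the right one.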
On Mon, Sep 2, 2013 at 4:29 PM, Ian Danforth <[email protected]> wrote:

> The control problem always reminds me of this classic motor learning
> experiment:
>
> Imagine you're asked to hold and shift a lever from one position to
> another in a straight line, but the lever keeps trying to drift sideways,
> or first left, then right. Neuroscientists do this with monkeys to watch
> them learn motor tasks in a new, dynamic environment.
>
> See this short paper for a good example:
> http://www.pnas.org/content/97/5/2259.full.pdf
>
> So basically you have a representation of a goal that doesn't change, but
> how you coordinate your muscles during the movement has to be relearned to
> reach that goal. Some cells will always be in charge of dynamically
> reacting to the world, and others will learn these new world dynamics and
> remember them for next time.
>
> To properly implement this in the CLA, I think there needs to be a goal
> SDR, a series of predictions given the current input and the goal, and a
> mechanism by which the goal and the predictions are compared moment to
> moment against the input from the muscles and the world. Moment-to-moment
> corrections would probably be random at first, then gross movements
> reinforced for reducing the error between input and prediction, and
> finally refined.
>
> One aspect I *really* don't have a good answer for, though, is: how does
> goal + world state == action? Is it the intensity of the goal
> representation? Is it a goal plus some global trigger that says "go"?
>
> Anyway, getting off topic a bit. I think what you're trying to do will
> eventually work, but it needs to be tightly coupled to the way the CLA
> learns to be effective.
>
> Ian
>
>
> On Mon, Sep 2, 2013 at 3:38 PM, Pedro Tabacof <[email protected]> wrote:
>
>> Ian,
>>
>> Your 10 points are all spot on, great job on understanding my mess!
>>
>> I'm not sure if feeding the CLA open-loop data (no control applied)
>> would be useful in practice, because in that case you're probably better
>> off with standard MPC, but it is probably the right way to start tackling
>> this problem. It's actually not that hard to simulate a complicated
>> dynamic system with noise and disturbances and gather "experimental"
>> data. If I have time I will look into this and share my findings here.
>>
>> It'd be cool to see how the CLA would respond to things such as large
>> time constants (slow dynamic response) and/or considerable dead time
>> (time delay) before trying to actually control a system with it. The main
>> difference between this and typical CLA applications is that the system
>> inputs are independent, and thus their prediction is meaningless. Would
>> this make the prediction of the system outputs (which depend on their
>> past values and on current and past inputs) harder, or just the same?
>>
>> From your explanation it seems the optimization time is not an issue,
>> especially considering you could turn it off after a while, because the
>> CLA would probably have already learned the correct control patterns. It
>> could be turned on only when needed to improve the control, and perhaps
>> be run offline.
>>
>> If I recall correctly, Jeff wrote in one email that multiple-step
>> prediction is actually done by an external classifier, so it is not
>> inherent to the CLA. Can someone clarify this point? Multiple-step
>> prediction is essential to MPC, so I'd like to understand it better.
>>
>> Anyway, I've been pondering my MPC idea, and more and more I tend to
>> believe that it is just too convoluted to work; I always favor simple
>> solutions over complex ones. If we had a motor control CLA I think this
>> could be a great target application, but it seems we are nowhere near
>> that at present.
>>
>> Perhaps a better solution would be to train NuPIC on data from a
>> classical controller such as a PID, or even from manual control, and then
>> use a simple reinforcement learning procedure to train NuPIC's
>> predictions in order to improve the control scheme (squared error and
>> smoothness, as you put it), but I'm not clear on how this could be done.
>>
>> []'s
>> Pedro.
>>
>>
>> On Mon, Sep 2, 2013 at 3:32 PM, Matthew Taylor <[email protected]> wrote:
>>
>>> On Sep 2, 2013, at 11:05 AM, Ian Danforth <[email protected]> wrote:
>>>
>>> > I'm going to be stupid in public...
>>>
>>> If only everyone were so fearless. :)
>>>
>>> Matt
>>>
>>> _______________________________________________
>>> nupic mailing list
>>> [email protected]
>>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>>
>>
>> --
>> Pedro Tabacof,
>> Unicamp - Eng. de Computação 08.
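Pedro's two ideas above (simulating a slow, noisy process with dead time, and using a classical controller to generate training data) are easy to prototype together. Here's a minimal sketch, where the process model, the PI gains, and the record format are all my own illustrative assumptions, not anything from NuPIC:

```python
import random
from collections import deque

class Process:
    """First-order-plus-dead-time process: tau * y' = gain * u(t - delay) - y,
    with Gaussian measurement noise. All parameters are illustrative."""
    def __init__(self, gain=2.0, tau=10.0, delay=5, noise=0.02, seed=0):
        self.gain, self.tau, self.noise = gain, tau, noise
        self.buffer = deque([0.0] * delay, maxlen=delay)  # dead-time buffer
        self.rng = random.Random(seed)
        self.y = 0.0

    def step(self, u, dt=1.0):
        u_delayed = self.buffer[0]  # input applied `delay` steps ago
        self.buffer.append(u)
        self.y += dt * (self.gain * u_delayed - self.y) / self.tau  # Euler step
        return self.y + self.rng.gauss(0.0, self.noise)  # noisy measurement

def generate_training_data(setpoint, steps, kp=0.3, ki=0.05, dt=1.0):
    """Drive the process with a PI controller and log (setpoint, u, y)
    records, the kind of "experimental" data one might feed to NuPIC."""
    proc, integral, y, records = Process(), 0.0, 0.0, []
    for _ in range(steps):
        err = setpoint - y
        integral += err * dt
        u = kp * err + ki * integral  # PI control law
        y = proc.step(u, dt)
        records.append((setpoint, u, y))
    return records
```

Because of the dead time, the output sits near zero for the first few steps and then creeps toward the setpoint at the slow time constant, which is exactly the behavior you'd want the CLA to learn to predict.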
