This (goal + world state == action?) deserves its own discussion thread, I
think, Ian. I've been pondering along these lines all weekend.


On Mon, Sep 2, 2013 at 4:29 PM, Ian Danforth <[email protected]> wrote:

> The control problem always reminds me of this classic motor learning
> experiment:
>
> Imagine you're asked to hold and shift a lever from one position to
> another in a straight line, but the lever keeps trying to drift sideways,
> or first left, then right. Neuroscientists do this with monkeys to watch
> them learn motor tasks in a new, dynamic environment.
>
> See this short paper for a good example:
> http://www.pnas.org/content/97/5/2259.full.pdf
>
> So basically you have a representation of a goal that doesn't change, but
> how you coordinate your muscles during the movement has to be relearned to
> get to that goal. Some cells will always be in charge of dynamically
> reacting to the world, and others will learn about these new world dynamics
> and remember them for next time.
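The split described above, with some processes reacting quickly and others retaining the learned dynamics, resembles the two-rate (fast/slow) state-space models used in the motor-adaptation literature. The Python sketch below uses that swapped-in model, not anything from CLA or NuPIC. The function name `two_rate_adaptation` and all rate parameters are hypothetical values chosen for illustration. A fast process learns and forgets quickly, while a slow process learns slowly but retains what it has learned.

```python
def two_rate_adaptation(perturbations, a_fast=0.6, b_fast=0.2,
                        a_slow=0.99, b_slow=0.02):
    """Return the (fast, slow) adaptation states after each trial.

    Hypothetical illustrative parameters: a_* is per-trial retention and
    b_* is the learning rate from error. The fast process learns and
    forgets quickly; the slow process does the reverse.
    """
    fast = slow = 0.0
    history = []
    for force in perturbations:
        # Error on this trial: the part of the force field not yet compensated.
        error = force - (fast + slow)
        fast = a_fast * fast + b_fast * error
        slow = a_slow * slow + b_slow * error
        history.append((fast, slow))
    return history


# 200 trials in a constant sideways force field of strength 1.0.
states = two_rate_adaptation([1.0] * 200)
fast, slow = states[-1]
print(f"fast={fast:.3f} slow={slow:.3f} total={fast + slow:.3f}")
```

Because retention is below 1, the total adaptation with these parameters settles short of the full field (about 0.71). The slow state ends up carrying most of it, which corresponds to the "remember it for next time" part.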
>
> To properly implement this in CLA I think there needs to be a goal SDR, a
> series of predictions given the current input and the goal, and a mechanism
> by which the goal and the predictions are compared moment to moment to the
> input from muscles and the world. Moment-to-moment corrections would
> probably be random at first; then gross movements would be reinforced for
> reducing the error between input and prediction, and finally refined.
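The progression "random at first, then reinforced, then refined" can be illustrated with plain stochastic hill climbing using a shrinking step size. This Python sketch is not a CLA mechanism. The one-dimensional lever model, the function names, and the parameters (`step`, `shrink`, `trials`) are all hypothetical. On each trial the sketch proposes a random correction and keeps it only if it reduces the sideways error. Shrinking the step turns gross adjustments into fine ones.

```python
import random


def lateral_error(compensation, field):
    """Squared sideways error; zero when the compensation cancels the field."""
    return (field - compensation) ** 2


def learn_compensation(field, trials=500, step=1.0, shrink=0.99, seed=0):
    """Stochastic hill climbing: random corrections, keep only those that help."""
    rng = random.Random(seed)
    compensation = 0.0
    best_error = lateral_error(compensation, field)
    for _ in range(trials):
        candidate = compensation + rng.gauss(0.0, step)  # random correction
        error = lateral_error(candidate, field)
        if error < best_error:  # reinforce corrections that reduce the error
            compensation, best_error = candidate, error
        step *= shrink  # gross adjustments early, refinement later
    return compensation


learned = learn_compensation(field=2.0)
print(f"learned compensation: {learned:.4f}")
```

Accepting only improvements makes the error non-increasing. The geometric `shrink` schedule is one simple way to get the coarse-to-fine behaviour, not a claim about how cortex or CLA would do it.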
>
> One aspect I *really* don't have a good answer for though is how does goal
> + world state == action? Is it the intensity of the goal representation? Is
> it a goal plus some global trigger that says "go?"
>
> Anyway, getting off topic a bit. I think what you're trying to do will
> eventually work, but it needs to be tightly coupled to the way CLA learns
> to be effective.
>
> Ian
>
>
> On Mon, Sep 2, 2013 at 3:38 PM, Pedro Tabacof <[email protected]> wrote:
>
>> Ian,
>>
>> Your 10 points are all spot on, great job on understanding my mess!
>>
>> I'm not sure feeding the CLA open-loop data (with no control applied)
>> would be useful in practice, because in that case you're probably better
>> off with standard MPC, but it is probably the right way to start tackling
>> this problem. It's actually not that hard to simulate a complicated
>> dynamic system with noise and disturbances and gather "experimental" data.
>> If I have time I will look into this and share my findings here.
>>
>> It'd be cool to see how the CLA would respond to things such as large
>> time constants (slow dynamic response) and/or considerable dead time
>> (time delay) before trying to actually control a system with it. The main
>> difference between this and typical CLA applications is that the system
>> inputs are independent, so predicting them is meaningless. Would this make
>> the prediction of the system outputs (which depend on their past values and
>> on current and past inputs) harder, or just the same?
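A first-order-plus-dead-time (FOPDT) plant is a common minimal model that has both properties mentioned above: a time constant and a dead time. The Python sketch below uses hypothetical parameter values. It drives such a plant open loop with an independent random input and records input/output pairs of the kind that could be fed to a predictor. It uses the exact zero-order-hold discretization of K/(tau*s + 1) and assumes the dead time is a whole number of sample periods.

```python
import math
import random


def simulate_fopdt(n_steps, gain=2.0, tau=20.0, dead_steps=5, dt=1.0,
                   noise_std=0.05, hold=10, seed=0):
    """Open-loop run of a first-order-plus-dead-time plant.

    Returns (inputs, outputs), where outputs[k] is the plant output after
    input step k has been applied. All parameter values are illustrative.
    """
    rng = random.Random(seed)
    a = math.exp(-dt / tau)  # per-sample decay for time constant tau
    y = 0.0
    u = 0.0
    inputs, outputs = [], []
    for k in range(n_steps):
        # Independent random input: a +/-1 level held for `hold` samples.
        if k % hold == 0:
            u = rng.choice([-1.0, 1.0])
        inputs.append(u)
        # Dead time: the plant only sees the input from `dead_steps` samples ago.
        delayed_u = inputs[k - dead_steps] if k >= dead_steps else 0.0
        y = a * y + gain * (1.0 - a) * delayed_u + rng.gauss(0.0, noise_std)
        outputs.append(y)
    return inputs, outputs


u_data, y_data = simulate_fopdt(500)
```

Raising `tau` slows the response, and raising `dead_steps` lengthens the lag before an input shows up in the output. Together they give a simple way to probe both effects on prediction quality.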
>>
>> From your explanation it seems the optimization time is not an issue,
>> especially considering you could turn it off after a while because the CLA
>> would probably have already learned the correct control patterns. It could
>> be turned on only when needed to improve the control and perhaps be done
>> offline.
>>
>> If I recall correctly, Jeff wrote in one email that multiple-step
>> prediction is actually made by an external classifier, so it is not
>> inherent to the CLA. Can someone clarify this point? Multiple-step
>> prediction is essential to MPC, so I'd like to understand it better.
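To show why multi-step prediction matters for MPC, here is a minimal Python sketch. It assumes some one-step model `model(y, u)` is available; here a hypothetical linear stand-in is used, not NuPIC's classifier. The sketch rolls the one-step model forward by feeding each prediction back in, then scores each candidate input sequence by squared tracking error plus a penalty on input changes. It applies only the first move of the best sequence (receding horizon). Brute-force enumeration replaces the numerical optimizer a real MPC would use and is only practical for tiny horizons.

```python
from itertools import product


def rollout(model, y0, planned_inputs):
    """Multi-step prediction by feeding each one-step prediction back into the model."""
    predictions = []
    y = y0
    for u in planned_inputs:
        y = model(y, u)
        predictions.append(y)
    return predictions


def mpc_cost(predictions, planned_inputs, setpoint, u_prev, smooth_weight=0.1):
    """Squared tracking error plus a penalty on input changes (smoothness)."""
    tracking = sum((setpoint - y) ** 2 for y in predictions)
    moves = 0.0
    last = u_prev
    for u in planned_inputs:
        moves += (u - last) ** 2
        last = u
    return tracking + smooth_weight * moves


def choose_input(model, y0, u_prev, setpoint, candidates, horizon=3):
    """Brute-force search over input sequences; return only the first move."""
    best = min(
        product(candidates, repeat=horizon),
        key=lambda seq: mpc_cost(rollout(model, y0, seq), seq, setpoint, u_prev),
    )
    return best[0]


# Hypothetical one-step model standing in for a learned predictor.
def model(y, u):
    return 0.9 * y + 0.1 * u


first_move = choose_input(model, y0=0.0, u_prev=0.0, setpoint=1.0,
                          candidates=[-1, 0, 1, 2])
```

Errors in a recursive rollout compound over the horizon. A predictor that learns each horizon directly avoids feeding its own predictions back in, which is one reason the classifier question matters here.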
>>
>> Anyway, I've been pondering my MPC idea, and more and more I tend to
>> believe that it is just too convoluted to work; I always favor simple
>> solutions over complex ones. If we had a motor-control CLA, I think this
>> could be a great target application, but that seems to be far from where
>> we are now.
>>
>> Perhaps training NuPIC on data from a classical controller such as PID or
>> even manual control and then using a simple reinforcement learning
>> procedure to train NuPIC's predictions in order to improve the control
>> scheme (squared error and smoothness, as you put it) would be a better
>> solution, but I'm not clear on how this could be done.
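A first step toward the PID idea is collecting demonstration data. This hedged Python sketch uses a textbook discrete PID loop (no derivative filtering or anti-windup) to regulate a hypothetical first-order plant, logging each step as (setpoint, measurement, action). The gains and plant coefficients are made up for illustration. The sketch does not cover how these records would be encoded for NuPIC or the reinforcement-learning refinement step.

```python
class PID:
    """Textbook discrete PID: no derivative filtering, no anti-windup."""

    def __init__(self, kp, ki, kd, dt=1.0):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def step(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def collect_demonstrations(n_steps, setpoint=1.0, a=0.9, b=0.1):
    """Log (setpoint, measurement, action) while PID drives y[k+1] = a*y[k] + b*u[k]."""
    pid = PID(kp=2.0, ki=0.2, kd=0.0)  # hypothetical gains
    y = 0.0
    records = []
    for _ in range(n_steps):
        u = pid.step(setpoint, y)
        records.append((setpoint, y, u))
        y = a * y + b * u  # hypothetical plant update
    return records


demo = collect_demonstrations(200)
```

With these values the closed loop is stable, and the integral term drives the measurement to the setpoint. The logged actions are therefore a consistent demonstration of one reasonable controller, not an optimal one.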
>>
>> Regards,
>> Pedro.
>>
>>
>> On Mon, Sep 2, 2013 at 3:32 PM, Matthew Taylor <[email protected]> wrote:
>>
>>> On Sep 2, 2013, at 11:05 AM, Ian Danforth <[email protected]>
>>> wrote:
>>>
>>> >  I'm going to be stupid in public...
>>>
>>> If only everyone were so fearless. :)
>>>
>>> Matt
>>>
>>> _______________________________________________
>>> nupic mailing list
>>> [email protected]
>>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>>>
>>
>>
>>
>> --
>> Pedro Tabacof,
>> Unicamp - Computer Engineering '08.
>>