On Sun, May 12, 2019 at 10:22 PM Sergio VM <[email protected]> wrote:

> Hi King Yin,
>
> The architecture looks very interesting. I am just missing the definition
> of the reward function (or kernel if you make it stochastic).
>
> On the other hand, I don't understand your previous comment on the
> Lagrangian and Hamiltonian. I haven't seen the previous version of the
> paper. But you can apply an optimal control approach without having to
> consider the velocity at all.
>


The reward function is given externally by some "AI teachers", for
example the rewards handed out by an Atari game.

The Lagrangian is the same as the instantaneous reward the system gets at
time t.  In some cases, such as chess, the reward is just a delta function
given at the terminal state (e.g. checkmate).  The Bellman equation (for
dynamic programming) always works, no matter how the rewards are given.
The Hamiltonian / Lagrangian control theory may also work if the Lagrangian
is given as delta functions, but in that case solving the differential
equation would involve a discretization process that simply reduces to the
discrete dynamic programming case.  In other words, the use of differential
equations has no advantage over the discrete case!  Things would be
different if the reward (Lagrangian) were differentiable with respect to
the position x and the velocity x dot.  But that is not the case for some
real-life problems such as logic puzzles or chess games, in which rewards
occur only sparsely.  Hope that answers your question? ☺
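
To make the point about sparse rewards concrete, here is a minimal sketch of
discrete dynamic programming (value iteration on the Bellman equation) where
the only reward is a "delta" given on entering the terminal state.  The
5-state chain, the discount factor gamma = 0.9, and the names
value_iteration and step are my own illustrative assumptions, not part of
the architecture in the paper.

```python
def value_iteration(n_states=5, gamma=0.9, tol=1e-12):
    """Value iteration on a toy deterministic chain (hypothetical example).

    States are 0 .. n_states-1; the last state is terminal.  The only
    nonzero reward is 1.0, received on the transition that enters the
    terminal state, mimicking a reward that is a delta at the end of a game.
    """
    terminal = n_states - 1
    values = [0.0] * n_states  # V(terminal) stays 0: nothing happens after the end

    def step(state, action):
        # action is -1 (left) or +1 (right); moves are clamped at the walls
        nxt = min(max(state + action, 0), terminal)
        reward = 1.0 if nxt == terminal else 0.0
        return nxt, reward

    while True:
        max_change = 0.0
        for s in range(terminal):  # the terminal state is never updated
            # Bellman backup: V(s) = max_a [ r(s, a) + gamma * V(s') ]
            candidates = []
            for a in (-1, 1):
                nxt, r = step(s, a)
                candidates.append(r + gamma * values[nxt])
            best = max(candidates)
            max_change = max(max_change, abs(best - values[s]))
            values[s] = best
        if max_change < tol:
            return values


if __name__ == "__main__":
    print(value_iteration())
```

No derivative of the reward with respect to the state is used anywhere: the
backup simply propagates the terminal reward backwards, so each state's value
is gamma raised to the number of remaining steps before the winning move.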

I will re-organize the material in the older version and post it somewhere,
just so the work is not wasted.  But I don't see any easy way to bridge
that gap.  It doesn't seem to be a good idea to tamper with the reward
function; it should stay the way it is given by the problem setup...

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T3cad55ae5144b323-M786f293ff3d94b5c7bbc2660