On Wed, May 15, 2019 at 3:42 AM Sergio VM <[email protected]> wrote:
> Not sure if I am following you...
>
> In order to define the optimal control problem, you need:
>
> - State set: Set of all possible logic propositions. OK
> - Action set: Logic rules. It is not clear to me what this means. Can
> you choose which logic rule to use with which proposition? I mean, the
> actions should be chosen by the agent (there may be constraints on which
> actions are available at each state, but there might be some freedom
> nonetheless, otherwise, there wouldn't be anything to be learned).
> - Expected reward function: some map from the state and action sets to
> the reals. You want it to be non-smooth. OK.
> - Transition kernel: represents the knowledge. Very interesting.
>
> So let me try to understand with an example:
>
> - State at time t is a bunch of propositions, e.g. x_t = {"I am in my
> place", "my place is in Europe"}
> - Action at time t is a particular logic rule, e.g. a_t = {"if p --> q
> and q --> r, then p --> r"}
> - State transition: x_{t+1} = F(x_t, a_t) = if "I am in my place" and
> "my place is in Europe", then "I am in Europe"
> - Reward: something saying that this new state is desirable, makes
> sense, etc.
>
> Is this correct?
>
Yes, except that the reward may be zero, as when you're planning a sequence
of chess moves. Your reward only comes when you win / lose / draw the
game. If you "feel" that your chess moves make sense / are good, that
feeling is your *internal* assessment of the desirability / utility / value
of those actions, but it is not the external *reward*.
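To make the formalization concrete, here is a minimal Python sketch of the
example above; the encoding (facts as strings, a rule as a premises-set /
conclusion pair, the chess-like sparse reward) is my own hypothetical
choice, not something fixed by this discussion:

def apply_rule(facts, rules):
    # a_t: from a rule (premises, conclusion), derive the conclusion
    # whenever all of its premises are already in the current fact set.
    derived = {q for (premises, q) in rules if premises <= facts}
    return facts | derived

def step(state, action):
    # x_{t+1} = F(x_t, a_t): a deterministic transition kernel.
    facts, rules = state
    return (action(facts, rules), rules)

def reward(state, goal="I am in Europe"):
    # Sparse reward: zero everywhere, until the goal has been derived.
    facts, _ = state
    return 1.0 if goal in facts else 0.0

# x_t from the example: two facts plus one domain rule.
x_t = ({"I am in my place", "my place is in Europe"},
       {(frozenset({"I am in my place", "my place is in Europe"}),
         "I am in Europe")})

x_next = step(x_t, apply_rule)
print(x_next[0])       # the facts now include "I am in Europe"
print(reward(x_next))  # 1.0 here, but zero along any longer derivation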
If I understand correctly, "desirability" is the utility value (V) or the Q
value (which is V given a certain action). In physics the analogous
quantity is called the "action," with units of [energy × time] (not to be
confused with an action in reinforcement learning).
V (utility value) is *learned*; R (reward) is *given* by the problem.
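For reference, the standard relation between the two in the discounted
setting (textbook RL definitions, not anything specific to this thread) is

    Q(x, a) = R(x, a) + γ · E[ V(x_{t+1}) | x_t = x, a_t = a ]
    V(x)    = max_a Q(x, a)

i.e., Q is exactly "V given a certain action," and both are learned from
the given R.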
> I am definitely lost with your comment about the Hamiltonian. I am familiar
> with optimal control theory, but I don't see the story... In general you
> don't need the velocity. What you need is an optimality condition, which
> doesn't have to be related to any time derivative. Think, e.g., about the
> Euler-Lagrange condition obtained by differentiating the reward function
> with respect to the current and future states and with respect to the
> action. It can be formulated even in discrete time.
>
In the *unconstrained* optimization setting, the action can potentially
move the state in *any* direction, so the action can be identified with the
velocity ẋ. The problem with sparse reward is that we have neither ∂L/∂ẋ
nor ∂L/∂x (except as a delta function at the terminal state).
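A toy numeric illustration of that last point, under my own assumptions (a
deterministic chain of six states with reward only at the end): R is zero
almost everywhere, so there is no gradient to differentiate, yet value
iteration still backs a useful V up to every state:

N, gamma = 6, 0.9
R = [0.0] * N
R[-1] = 1.0                      # reward is *given* only at the terminal state

V = [0.0] * N
for _ in range(100):             # value iteration to convergence
    for s in range(N - 1):
        # two actions: stay at s or advance to s + 1; take the better one
        V[s] = max(R[s] + gamma * V[s], R[s] + gamma * V[s + 1])
    V[-1] = R[-1]                # terminal state: no further actions

print([round(v, 3) for v in V])  # [0.59, 0.656, 0.729, 0.81, 0.9, 1.0]

Here ∂R/∂x is zero at every non-terminal state, which is exactly why an
Euler-Lagrange / gradient condition sees nothing, while the Bellman backup
still recovers a non-trivial V everywhere.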
------------------------------------------
Artificial General Intelligence List: AGI
Permalink:
https://agi.topicbox.com/groups/agi/T3cad55ae5144b323-Mf2c0d49874614a5a584c299e
Delivery options: https://agi.topicbox.com/groups/agi/subscription