On Wed, May 15, 2019 at 3:42 AM Sergio VM <[email protected]> wrote:
> Not sure if I am following you...
>
> In order to define the optimal control problem, you need:
>
> - State set: Set of all possible logic propositions. OK
> - Action set: Logic rules. It is not clear to me what this means. Can
> you choose which logic rule to use with which proposition? I mean, the
> actions should be chosen by the agent (there may be constraints on which
> actions are available at each state, but there might be some freedom
> nonetheless, otherwise, there wouldn't be anything to be learned).
> - Expected reward function: some map from the state and action sets to
> the reals. You want it to be non-smooth. OK.
> - Transition kernel: represents the knowledge. Very interesting.
>
> So let me try to understand with an example:
>
> - State at time t is a bunch of propositions, e.g. x_t = {"I am in my
> place", "my place is in Europe"}
> - Action at time t is a particular logic rule, e.g. a_t = {"if p --> q
> and q --> r, then p --> r"}
> - State transition: x_{t+1} = F(x_t, a_t) = if "I am in my place" and
> "my place is in Europe", then "I am in Europe"
> - Reward: something saying that this new state is desirable, makes
> sense, etc.
>
> Is this correct?
>
Yes, except that the reward may be zero, as when you're planning a sequence
of chess moves. Your reward only comes when you win / lose / draw the
game. If you "feel" that your chess moves make sense / are good, that
feeling is your *internal* assessment of the desirability / utility / value
of those actions, but it is not the external *reward*.
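To make the formalization concrete, here is a minimal Python sketch of the
example above; the encoding (facts as strings, a rule as a premises-set /
conclusion pair, the chess-like sparse reward) is my own hypothetical
choice, not something fixed by this discussion:

def apply_rule(facts, rules):
    # a_t: from a rule (premises, conclusion), derive the conclusion
    # whenever all of its premises are already in the current fact set.
    derived = {q for (premises, q) in rules if premises <= facts}
    return facts | derived

def step(state, action):
    # x_{t+1} = F(x_t, a_t): a deterministic transition kernel.
    facts, rules = state
    return (action(facts, rules), rules)

def reward(state, goal="I am in Europe"):
    # Sparse reward: zero everywhere, until the goal has been derived.
    facts, _ = state
    return 1.0 if goal in facts else 0.0

# x_t from the example: two facts plus one domain rule.
x_t = ({"I am in my place", "my place is in Europe"},
       {(frozenset({"I am in my place", "my place is in Europe"}),
         "I am in Europe")})

x_next = step(x_t, apply_rule)
print(x_next[0])       # the facts now include "I am in Europe"
print(reward(x_next))  # 1.0 here, but zero along any longer derivation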
If I understand correctly, "desirability" is the utility value (V) or the Q
value (which is V given a certain action). In physics the analogous
quantity is called the "action," with units of [energy × time] (not to be
confused with an action in reinforcement learning).
V (utility value) is *learned*; R (reward) is *given* by the problem.
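For reference, the standard relation between the two in the discounted
setting (textbook RL definitions, not anything specific to this thread) is

    Q(x, a) = R(x, a) + γ · E[ V(x_{t+1}) | x_t = x, a_t = a ]
    V(x)    = max_a Q(x, a)

i.e., Q is exactly "V given a certain action," and both are learned from
the given R.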
> I am definitely lost with your comment about the Hamiltonian. I am familiar
> with optimal control theory, but I don't see the story... In general you
> don't need the velocity. What you need is an optimality condition, which
> doesn't have to be related to any time derivative. Think, e.g., about the
> Euler-Lagrange condition obtained by differentiating the reward function
> with respect to the current and future states and with respect to the
> action. It can be formulated even in discrete time.
>
In the *unconstrained* optimization setting, the action can potentially
move the state in *any* direction, so the action can be identified with the
velocity ẋ. The problem with sparse reward is that we have neither ∂L/∂ẋ
nor ∂L/∂x (except as a delta function at the terminal state).
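A toy numeric illustration of that last point, under my own assumptions (a
deterministic chain of six states with reward only at the end): R is zero
almost everywhere, so there is no gradient to differentiate, yet value
iteration still backs a useful V up to every state:

N, gamma = 6, 0.9
R = [0.0] * N
R[-1] = 1.0                      # reward is *given* only at the terminal state

V = [0.0] * N
for _ in range(100):             # value iteration to convergence
    for s in range(N - 1):
        # two actions: stay at s or advance to s + 1; take the better one
        V[s] = max(R[s] + gamma * V[s], R[s] + gamma * V[s + 1])
    V[-1] = R[-1]                # terminal state: no further actions

print([round(v, 3) for v in V])  # [0.59, 0.656, 0.729, 0.81, 0.9, 1.0]

Here ∂R/∂x is zero at every non-terminal state, which is exactly why an
Euler-Lagrange / gradient condition sees nothing, while the Bellman backup
still recovers a non-trivial V everywhere.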
------------------------------------------
Artificial General Intelligence List: AGI
Permalink:
https://agi.topicbox.com/groups/agi/T3cad55ae5144b323-Mf2c0d49874614a5a584c299e
Delivery options: https://agi.topicbox.com/groups/agi/subscription