To understand the problem better, suppose that everything is known: the
reward, the state and action spaces, and the transitions. Why, then, do you
need a Hamiltonian formulation? I am missing this point.
Do you need an optimality condition? If so, since you are working with
discrete state-action sets, the dynamic programming (Bellman) equation
gives you the optimality condition. On the other hand, if your state-action
sets were continuous, you could use Euler-Lagrange, Pontryagin, or
HJB. But why do you need such an optimality condition?
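For concreteness, the Bellman optimality condition on a finite state-action
set can be solved by plain value iteration. Below is a minimal sketch on a
toy two-state MDP; the transition table `P`, reward table `R`, and discount
`gamma` are all made up for illustration, not taken from this discussion:

```python
# Value iteration: iterate the Bellman optimality equation
#   V(s) = max_a [ R(s, a) + gamma * sum_{s'} P(s'|s, a) V(s') ]
# on a toy 2-state, 2-action MDP (all numbers are illustrative).

gamma = 0.9
states, actions = [0, 1], [0, 1]

# P[s][a] = list of (next_state, probability); R[s][a] = expected reward.
# Action a deterministically moves to state a; only (s=1, a=1) pays reward.
P = {0: {0: [(0, 1.0)], 1: [(1, 1.0)]},
     1: {0: [(0, 1.0)], 1: [(1, 1.0)]}}
R = {0: {0: 0.0, 1: 0.0},
     1: {0: 0.0, 1: 1.0}}

V = {s: 0.0 for s in states}
for _ in range(1000):
    V = {s: max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                for a in actions)
         for s in states}
```

At the fixed point, V(1) = 1 + 0.9 V(1) = 10 and V(0) = 0.9 V(1) = 9, so
the iteration itself certifies optimality with no time derivative involved.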

On Thu, May 16, 2019 at 5:45 AM YKY (Yan King Yin, 甄景贤) <
[email protected]> wrote:

> On Wed, May 15, 2019 at 3:42 AM Sergio VM <[email protected]> wrote:
>
>> Not sure if I am following you...
>>
>> In order to define the optimal control problem, you need:
>>
>>    - State set: Set of all possible logic propositions. OK
>>    - Action set: Logic rules. It is not clear to me what this means. Can
>>    you choose which logic rule to use with which proposition? I mean, the
>>    actions should be chosen by the agent (there may be constraints on which
>>    actions are available at each state, but there should be some freedom
>>    nonetheless; otherwise, there would be nothing to learn).
>>    - Expected reward function: some map from the state and action sets
>>    to the reals. You want it to be non-smooth. OK.
>>    - Transition kernel: represents the knowledge. Very interesting.
>>
>> So let me try to understand with an example:
>>
>>    - State at time t is a set of propositions, e.g. x_t = {"I am in
>>    my place", "my place is in Europe"}
>>    - Action at time t is a particular logic rule, e.g. a_t = { "if p -->
>>    q and q --> r, then p --> r" }
>>    - State transition: x_{t+1} = F(x_t, a_t) = if "I am in my place" and
>>    "my place is in Europe", then "I am in Europe"
>>    - Reward: something saying that this new state is desirable, makes
>>    sense, etc.
>>
>> Is this correct?
>>
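That transition can be made concrete in a few lines: the state is a set of
proposition strings and an action is a rule that fires when its premises are
in the state. The function name, the rule encoding, and the propositions are
all my own illustration, not something fixed by the thread:

```python
# Toy deterministic transition x_{t+1} = F(x_t, a_t): the state is a set of
# propositions; an action (rule) adds its conclusion when its premises hold.

def apply_rule(state, rule):
    premises, conclusion = rule
    if all(p in state for p in premises):
        return state | {conclusion}   # rule fires: conclusion is added
    return state                      # rule does not fire: state unchanged

x_t = {"I am in my place", "my place is in Europe"}
a_t = (("I am in my place", "my place is in Europe"), "I am in Europe")

x_next = apply_rule(x_t, a_t)
# x_next now also contains "I am in Europe"
```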
>
> Yes, except that the reward may be zero, as when you're planning a
> sequence of chess moves.  Your reward only comes when you win / lose / draw
> the game.  If you "feel" that your chess moves make sense / are good, they
> are your *internal* assessment of the desirability / utility / value
> of those actions, but that is not the external *reward*.
>
> If I understand correctly, "desirability" is the utility value (V) or Q
> value (which is V given a certain action).  In physics it is called the
> "action" with unit [energy x time] (not to be confused with action in
> reinforcement learning).
>
> V (utility value) is *learned*, R (reward) is *given* by the problem.
>
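The V-versus-R distinction, with a reward that is sparse (nonzero only at
the terminal state, as in the chess example), can be sketched with tabular
Q-learning on a tiny chain. The environment, constants, and variable names
are my own toy illustration:

```python
import random

# Chain 0 -> 1 -> 2 -> 3 (terminal). R is *given* by the problem and is
# zero everywhere except on reaching the terminal state; Q (hence V) is
# *learned* by backing that single reward up along the chain.

random.seed(0)
N_STATES, TERMINAL = 4, 3
actions = [-1, +1]                      # step left or right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in actions}
alpha, gamma, eps = 0.5, 0.9, 0.2

def R(s, a, s2):                        # sparse external reward
    return 1.0 if s2 == TERMINAL else 0.0

for _ in range(2000):
    s = 0
    while s != TERMINAL:
        a = random.choice(actions) if random.random() < eps else \
            max(actions, key=lambda b: Q[(s, b)])
        s2 = min(max(s + a, 0), N_STATES - 1)
        target = R(s, a, s2) + gamma * (0.0 if s2 == TERMINAL else
                                        max(Q[(s2, b)] for b in actions))
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2

V = {s: max(Q[(s, a)] for a in actions) for s in range(N_STATES)}
```

Although R is zero at every non-terminal step, the learned values converge
to V(2) = 1, V(1) = 0.9, V(0) = 0.81: the discounted backup of the one
terminal reward.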
>> I am definitely lost with your comment about the Hamiltonian. I am
>> familiar with optimal control theory, but I don't see the story... In
>> general you don't need the velocity. What you need is an optimality
>> condition, which doesn't have to be related to any time derivative. Think,
>> e.g., of the Euler-Lagrange condition obtained by differentiating the
>> reward function with respect to the current and future states and with
>> respect to the action. It can be formulated even in discrete time.
>>
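For reference, the discrete-time Euler-Lagrange condition alluded to above
can be written as follows (my notation; L plays the role of the stage
reward, maximized over the trajectory {x_t} and actions {a_t}):

```latex
% Maximizing \sum_t L(x_t, x_{t+1}, a_t) gives, at each interior time t,
\frac{\partial L(x_{t-1}, x_t, a_{t-1})}{\partial x_t}
  + \frac{\partial L(x_t, x_{t+1}, a_t)}{\partial x_t} = 0,
\qquad
\frac{\partial L(x_t, x_{t+1}, a_t)}{\partial a_t} = 0 .
```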
>
>
> In the *unconstrained* optimization problem setting, the action can
> potentially move the state in *any* direction, so the action can be
> identified with the velocity.  The problem with sparse reward is that we
> have neither ∂L/∂ẋ nor ∂L/∂x (except as a delta function at the terminal
> state).
>

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T3cad55ae5144b323-M7373519b8d1c42f31d9033e3
Delivery options: https://agi.topicbox.com/groups/agi/subscription