In order to understand the problem well, suppose that everything is known: reward, state and action spaces, transitions. Why do you need a Hamiltonian formulation? I am missing this point. Do you need an optimality condition? If so, since you are working on discrete state-action sets, the dynamic programming (Bellman) equation gives you the optimality condition. On the other hand, if your state-action sets were continuous, then you could use Euler-Lagrange, Pontryagin, or HJB. But why do you need such an optimality condition?
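To make the point concrete: for a finite state-action set, the Bellman optimality equation V(s) = max_a [R(s,a) + γ V(T(s,a))] already characterizes the optimum, and value iteration solves it directly. A minimal sketch (the toy MDP below, its states, actions, and rewards, is entirely hypothetical):

```python
# Bellman optimality on a tiny, hypothetical deterministic MDP.
GAMMA = 0.9

STATES = ["s0", "s1", "goal"]
ACTIONS = ["left", "right"]

# Deterministic transitions T[state][action] -> next state
T = {
    "s0":   {"left": "s0",   "right": "s1"},
    "s1":   {"left": "s0",   "right": "goal"},
    "goal": {"left": "goal", "right": "goal"},  # absorbing
}

def R(s, a):
    # Sparse reward: only the transition into the goal pays off.
    return 1.0 if s != "goal" and T[s][a] == "goal" else 0.0

def value_iteration(tol=1e-8):
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            # Bellman optimality: V(s) = max_a [ R(s,a) + gamma * V(T(s,a)) ]
            v_new = max(R(s, a) + GAMMA * V[T[s][a]] for a in ACTIONS)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            break
    return V

V = value_iteration()
print(V)  # V("s1") ≈ 1.0, V("s0") ≈ 0.9 (one step further from the goal)
```

No Hamiltonian or time derivative appears anywhere; the fixed-point condition itself is the optimality condition.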
On Thu, May 16, 2019 at 5:45 AM YKY (Yan King Yin, 甄景贤) <[email protected]> wrote:

> On Wed, May 15, 2019 at 3:42 AM Sergio VM <[email protected]> wrote:
>
>> Not sure if I am following you...
>>
>> In order to define the optimal control problem, you need:
>>
>> - State set: the set of all possible logic propositions. OK.
>> - Action set: logic rules. It is not clear to me what this means. Can
>> you choose which logic rule to use with which proposition? I mean, the
>> actions should be chosen by the agent (there may be constraints on which
>> actions are available at each state, but there might be some freedom
>> nonetheless; otherwise, there wouldn't be anything to be learned).
>> - Expected reward function: some map from the state and action sets
>> to the reals. You want it to be non-smooth. OK.
>> - Transition kernel: represents the knowledge. Very interesting.
>>
>> So let me try to understand with an example:
>>
>> - State at time t is a bunch of propositions, e.g. x_t = {"I am in
>> my place", "my place is in Europe"}
>> - Action at time t is a particular logic rule, e.g. a_t = {"if p --> q
>> and q --> r, then p --> r"}
>> - State transition: x_{t+1} = F(x_t, a_t) = if "I am in my place" and
>> "my place is in Europe", then "I am in Europe"
>> - Reward: something saying that this new state is desirable, makes
>> sense, etc.
>>
>> Is this correct?
>
> Yes, except that the reward may be zero, as when you're planning a
> sequence of chess moves. Your reward only comes when you win / lose / draw
> the game. If you "feel" that your chess moves make sense / are good, that
> is your *internal* assessment of the desirability / utility / value
> of those actions, but that is not the external *reward*.
>
> If I understand correctly, "desirability" is the utility value (V) or Q
> value (which is V given a certain action). In physics this is called the
> "action", with unit [energy × time] (not to be confused with an action in
> reinforcement learning).
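The transition x_{t+1} = F(x_t, a_t) from the quoted example can be sketched concretely: a state is a set of propositions, an action is a logic rule (premises plus a conclusion), and applying the rule adds the conclusion when all premises are present. The encoding and names below are hypothetical illustrations, not anything fixed in the thread:

```python
# Minimal sketch of x_{t+1} = F(x_t, a_t): state = set of propositions,
# action = logic rule (premises, conclusion). Rule encoding is hypothetical.

def F(x_t, a_t):
    """Apply rule a_t = (premises, conclusion) to state x_t."""
    premises, conclusion = a_t
    if premises.issubset(x_t):
        return x_t | {conclusion}   # rule fires: derived proposition added
    return x_t                      # rule not applicable: state unchanged

x_t = {"I am in my place", "my place is in Europe"}
a_t = (frozenset({"I am in my place", "my place is in Europe"}),
       "I am in Europe")

x_next = F(x_t, a_t)
print(x_next)  # now also contains "I am in Europe"
```

The agent's freedom is then exactly the choice of which rule (and, with variables, which instantiation) to apply at each step.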
> V (utility value) is *learned*; R (reward) is *given* by the problem.
>
>> I am definitely lost with your comment about the Hamiltonian. I am
>> familiar with optimal control theory, but I don't see the story... In
>> general you don't need the velocity. What you need is an optimality
>> condition, which doesn't have to be related to any time derivative. Think,
>> e.g., about the Euler-Lagrange condition obtained by differentiating the
>> reward function with respect to the current and future states and with
>> respect to the action. It can be formulated even in discrete time.
>
> In the *unconstrained* optimization setting, the action can
> potentially move the state in *any* direction, so the action is identified
> with the velocity. The problem with sparse reward is that we have
> neither ∂L/∂ẋ nor ∂L/∂x (except as a delta function at the terminal
> state).

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T3cad55ae5144b323-M7373519b8d1c42f31d9033e3
Delivery options: https://agi.topicbox.com/groups/agi/subscription
