This just in... Frederic Py, software engineer in robotics and AI. 1 upvote by Michael Miller.

I was asked to answer (A2A) and am actually involved in some aspects of this, as I work in situated planning and execution. Full disclosure: my work focuses on in-situ planning and re-planning using linear plans as opposed to policies, so you are aware of my potential biases.

For people not aware of it, the difference between a (linear) plan and a policy can be summarized as follows. A plan is a strictly defined sequence of actions leading from the initial state to the goal (it can be more complex than that if you have concurrency, but this is the basic idea). A policy, on the other hand, is defined by a set of "state -> action" pairs which should allow you, from any reachable state, to eventually attain the given goal. As a result, a plan is a predetermined set of actions that must be followed to the letter until we either reach its end or something goes wrong (in which case you need to re-plan); a policy is in a sense a representation of all the possible plans to reach that same goal, hence there is no need to re-plan: a failure probably just leads you to a state for which you already have the next "state -> action" pair on a way to your goal.

Presented like this, you probably start to think "well, policies are far more powerful", and you would be right on many levels. Once found, a policy provides you a strategy to reach your goal no matter what. Even better, an ideal policy will also allow you to avoid ending up in a dead-end state when possible, or at least the policy generator should be able to warn you that there is a possible sequence of events that would lead you to a dead end where your goal is impossible to reach (or, for stochastic planning, it should at least always take the action that minimizes the chance of reaching such a situation). Linear plans tend to be "optimistic" about the outcome of actions (i.e. they assume the outcome is deterministic), and for this reason they cannot anticipate that an unexpected situation may lead to such a dead end, even with all the re-planning you'd want.

The core of the issue, though, is the "once found": planning is a very complex problem, ranging from NP-hard to undecidable in complexity, and here I am talking only about the classical form of linear plans. A policy being "all the plans", you can quickly imagine that finding it will probably not be easier than finding a single sequence in the same state space. Many policy-generation algorithms are polynomial in the size of the state space, but the state space grows exponentially with the number of state variables, which often limits them to a very small/abstract model of the world. And even with a relatively small model, finding a policy in a complete manner is time-consuming enough that it is prohibitive to do on-line -- although refining a policy iteratively, as is done in reinforcement learning, is more tractable, as you probably know.

Neither is better or worse than the other; they are just better fits for different situations. Four pieces of information come into play:

1. Do you know your goals a priori?
2. Do you know your initial state (or current state) a priori?
3. Do your actions have multiple outcomes, and how critical is it to handle these a priori?
4. Are your time and/or processing constrained when finding your solution?

Another consideration, which relates slightly to 3, is the readability of your plan vs. your policy. A sequence of actions is far easier for a human to read than a table full of "state -> action" pairs. A good illustration is the choice made on the Mars Exploration Rovers with Mapgen. Mapgen is used on Earth to generate the plan of the rovers; the rovers do not plan in-situ. Instead, a plan is generated through mixed-initiative planning with the input of the scientists and engineers (using the Europa planner).
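To make the plan-vs-policy distinction above concrete, here is a minimal sketch on a toy deterministic grid world -- my own illustration, not from any of the systems discussed; all names (`make_plan`, `make_policy`, ...) are hypothetical. The plan is one action sequence from one start; the policy is a "state -> action" entry for every cell:

```python
from collections import deque

# Toy deterministic grid world: states are (x, y) cells, actions move one cell.
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
INVERSE = {"up": "down", "down": "up", "left": "right", "right": "left"}
SIZE = 4  # 4x4 grid, 16 states

def step(state, action):
    x, y = state
    dx, dy = ACTIONS[action]
    nx, ny = x + dx, y + dy
    if 0 <= nx < SIZE and 0 <= ny < SIZE:
        return (nx, ny)
    return None  # action inapplicable at the grid edge

def make_plan(start, goal):
    """A (linear) plan: one fixed action sequence from start to goal (BFS)."""
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        state, actions = frontier.popleft()
        if state == goal:
            return actions
        for a in ACTIONS:
            nxt = step(state, a)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, actions + [a]))
    return None

def make_policy(goal):
    """A policy: a 'state -> action' pair for *every* state that can reach
    the goal, built by searching backwards from the goal."""
    policy = {goal: None}  # at the goal there is nothing left to do
    frontier = deque([goal])
    while frontier:
        state = frontier.popleft()
        for a in ACTIONS:
            prev = step(state, a)  # neighbor that can reach `state` in one move
            if prev is not None and prev not in policy:
                policy[prev] = INVERSE[a]  # from prev, go one step closer to goal
                frontier.append(prev)
    return policy

plan = make_plan((0, 0), (3, 3))  # a 6-step action sequence
policy = make_policy((3, 3))      # 16 entries: an action for every cell
```

Note that the policy needed to cover the whole state space (all 16 cells), while the plan only touched the states on one path -- a small hint of why policy generation scales so much worse as the state space grows.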
After validation this plan is sent to the vehicle and executed. The plan is not a strategy, and it is linear because that is easier to interpret. If it fails, the vehicle just stops its execution and a day is "wasted", but this is a fair trade-off for more predictable behavior (the scientists know exactly what the rover is supposed to do on a given day).

If you know your goals a priori but do not know where you will start or how the actions will turn out, and still want to "plan" for it only once, then policy generation is probably the way to go. You generate your policy, put it on the vehicle, and the vehicle then just needs to execute it, potentially making slight refinements as it goes (these refinements do not change the strategy's structure but make more subtle alterations, such as balancing some weights).

If you know both your goals and your initial state, and your actions can be considered deterministic (for example, their failure is very rare or the cost of failure is unimportant), then a linear plan will provide a solution faster, and you can use the time saved to work in detail on finding the "best" solution with respect to a given utility function (for example, reducing battery usage).

If you do not know the goals to be satisfied a priori and just know that people will ask your system to do different things as they see fit, then you are in a very different problem. As I mentioned, planning is a complex task, and policy generation just pushes this complexity further. This is the reason that, so far, most of the work on situated planning instead relies on: produce a linear plan, check it at execution, re-plan if something went wrong or the goals changed. That way, the agent can re-evaluate its plan when it sees fit, on its own agenda. And to do so -- as the agent is often a robot with limited time and resources -- it needs to be able to produce the new plan as fast as possible.
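The "produce a linear plan, check at execution, re-plan" loop can be sketched as follows -- again a toy illustration of my own, not any of the systems mentioned: an optimistic planner on a ten-cell corridor where moves occasionally slip, and an executive that re-plans from wherever the robot actually ended up:

```python
import random

GOAL, N = 9, 10  # corridor cells 0..9; start anywhere, reach cell 9

def plan_from(state):
    """Optimistic linear plan: assumes every move succeeds."""
    direction = 1 if state < GOAL else -1
    return [direction] * abs(GOAL - state)

def execute(action, state):
    """The real world: 20% of the time the move slips and does nothing."""
    if random.random() < 0.2:
        return state
    return max(0, min(N - 1, state + action))

def run(start, seed=0):
    """Plan, execute step by step, monitor outcomes, re-plan on surprise."""
    random.seed(seed)
    state, replans = start, 0
    plan = plan_from(state)
    while state != GOAL:
        action = plan.pop(0)
        expected = state + action
        state = execute(action, state)
        if state != expected:        # monitor: outcome differed from the model
            plan = plan_from(state)  # re-plan from where we actually are
            replans += 1
    return replans
```

The point of the sketch is that each re-plan must be cheap, because it happens on-line, on the robot, every time the optimistic model is contradicted by reality.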
Policy generation would be far too costly here, as it would require regenerating a new policy whenever the goal set changes; linear planning, on the other hand -- especially when moving away from generic approaches and using, for example, HTN planners that give better information on the plan structure (at the cost of more involved modeling) -- can be made efficient enough to close the loop in a reasonable time. The result remains a fairly high-level plan that is refined down by the executive, and often people do not try to optimize the plan and are happy with the first solution that comes out, but all of this can be done in a relatively short time on very modest computing devices. By doing so, the vehicle can re-evaluate its plan whenever things go wrong, without the need for external intervention (which also means it is harder to predict what will happen, including whether a goal will be fulfilled or not, as it can be rejected at any time), and it works as long as there is no dead-end situation, or the model is built in such a way that dead ends are avoided (this is a tricky part, but I have also rarely seen a dead-end situation occur at the level of abstraction these systems operate at).

As you can see, all of the above are more or less trade-offs between what you know a priori and/or consider can be ignored. These are all very complex tasks which provide a solution relatively slowly, with complexity growing fast with the domain's level of detail. As the goal of "AI planning" is to generate a plan/strategy, this complexity is a concern -- either because of the need to provide a solution in a reasonable time, or, further, to provide the optimal solution, which involves more search and more complexity. Hence the linear-plan format presents the advantage of giving a solution relatively fast for relatively complex domains. Policy generation, on the other hand, is very taxing, even though the resulting solution is more flexible in the end.
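The iterative policy refinement mentioned earlier -- as done in reinforcement learning -- can be sketched with tabular Q-learning on the same kind of toy corridor. This is an illustrative sketch of mine (hyperparameters, names, and reward shaping are all assumptions, not from any system discussed); the search happens through execution, and a full "state -> action" table falls out at the end:

```python
import random

# Tabular Q-learning on a 10-cell corridor: the policy is refined *during*
# execution rather than computed ahead of time.
N, GOAL = 10, 9
ACTIONS = (-1, +1)

def move(state, action):
    return max(0, min(N - 1, state + action))

def train(episodes=2000, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    random.seed(seed)
    Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
    for _ in range(episodes):
        s = random.randrange(N)  # initial state unknown a priori (point 3)
        while s != GOAL:
            # epsilon-greedy: deliberate "failures" drive the learning (point 2)
            if random.random() < eps:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: Q[(s, act)])
            s2 = move(s, a)
            r = 1.0 if s2 == GOAL else -0.01  # the goal is the utility (point 1)
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS)
                                  - Q[(s, a)])
            s = s2
    # extract the learned "state -> action" table
    return {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N)
            if s != GOAL}

policy = train()  # every non-goal cell should map to +1 (move toward the goal)
```

Note that all the search cost here was paid inside the training episodes, i.e. through execution -- exactly the "factored into the learning phase" point made below about reinforcement learning versus ahead-of-time planning.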
Still, it is hard to justify unless you know that your actions will frequently fail, or that a failure could lead you to a very bad situation.

For reinforcement learning there are three aspects that make policy learning more prevalent: 1) you know your goal -- in fact your goal is the utility function that directs your learning; 2) you know that actions will "fail" -- in a sense this is by design, as learning would not occur without failure; 3) you do not know what state you will be in initially. In a sense, you try to learn on your own the effects of your actions (or at least how these effects contribute to your goal), which is in itself a core difference from planning, where the effects of actions -- deterministic or not -- are known, and the purpose is to search for how to connect these actions to lead to your goal. Reinforcement learning is thus more about refining your policy as you refine your model, which means that your search happens through execution, so its cost is factored into the "learning phase"; planning, on the other hand, is done ahead of execution (even when done in-situ -- this is actually a strong simplifying assumption required for decidability), which means that any time spent searching for the solution is time spent not doing anything. That explains the contrast you perceive between planning and your community.

From: [email protected] To: [email protected] Subject: FW: [agi] Plans vs. Policies Date: Sun, 15 Mar 2015 19:27:17 -0700
Thanks. Is there a reason behind it, or not? The reason I ask is because I'm comparing reinforcement learning systems to traditional AI planning systems, trying to see how they accomplish the same job, of action selection, and what the real trade-offs are. My initial guess is that RL systems may suffer from a lack of representational flexibility available in AI planning systems, while traditional AI planning systems are not as fast as RL systems. The lack of representational flexibility means that RL systems cannot take advantage of problem-space abstractions and other techniques of AI planners. In the end they may be less efficient, and they may or may not scale. Combining these two approaches appears to be a good idea, which I suspect is what the dynamic adaptive planning paradigm must be about. I'll dig into it further. Other thoughts are appreciated... Cheers, ~PM

Date: Mon, 16 Mar 2015 10:16:35 +0800 Subject: Re: [agi] Plans vs. Policies From: [email protected] To: [email protected]; [email protected]

Either case is possible ;)

On Mon, Mar 16, 2015 at 10:15 AM, Piaget Modeler via AGI <[email protected]> wrote: Are OpenCog's policies distinct structures from its plans, or are they the same structure? Also, are the plans single-action, as in Reinforcement Learning, or multi-action, as in AI Planning? Kindly advise. ~PM

Date: Mon, 16 Mar 2015 10:08:38 +0800 Subject: Re: [agi] Plans vs. Policies From: [email protected] To: [email protected]

OpenCog uses policies to drive the creation of dynamic plans ;)

On Mon, Mar 16, 2015 at 10:07 AM, Piaget Modeler via AGI <[email protected]> wrote: Ben, What does OpenCog use? Plans or policies? Why? ~PM

Date: Mon, 16 Mar 2015 10:01:31 +0800 Subject: Re: [agi] Plans vs. Policies From: [email protected] To: [email protected]

A traditional plan in the AI planning literature sense does not depend on future observations, but there is now a big literature on dynamic/adaptive planning algorithms as well...
ben

From: [email protected] To: [email protected] Subject: [agi] Plans vs. Policies Date: Sun, 15 Mar 2015 16:22:46 -0700

Reinforcement Learning uses "policies" to select actions, while most work in AI Planning emphasizes the construction and representation of a "plan", which consists of a sequence of actions (or a hierarchy of composite and primitive actions). Kindly compare, contrast, evaluate trade-offs, and recommend the plans or policies approach. Your rationale is appreciated. ~PM

-- Ben Goertzel, PhD http://goertzel.org "The reasonable man adapts himself to the world: the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man." -- George Bernard Shaw

AGI | Archives | Modify Your Subscription
