This just in... Frederic Py, software engineer in robotics and AI. 1 upvote by Michael Miller.

I was asked to answer (A2A) and am actually involved in some aspects of this, as I work in situated planning and execution. Full disclosure: my work focuses on in-situ planning and re-planning using linear plans as opposed to policies, so you are aware of my potential biases.

For people not aware of it, the difference between a (linear) plan and a policy can be summarized as follows. A plan is a strictly defined sequence of actions leading from the initial state to the goal (it can be more complex than that if you have concurrency, but this is the basic idea). A policy, on the other hand, is defined by a set of "state -> action" pairs which should allow you, from any reachable state, to eventually attain the given goal. As a result, a plan is a predetermined set of actions that must be followed to the letter until we either reach its end or something goes wrong (in which case you need to re-plan); a policy is in a sense a representation of all the possible plans to reach that same goal, hence there is no need to re-plan: a failure probably just leads you to a state for which you already have the next "state -> action" pair on a way to your goal.

Presented like this, you probably start to think "well, policies are far more powerful", and you would be right on many levels. Once found, a policy provides you a strategy to reach your goal no matter what. Even better, an ideal policy will also allow you to avoid ending up in a dead-end state when possible, or at least the policy generator should be able to warn you that there is a possible sequence of events that would lead you to a dead end where your goal is impossible to reach (or, for stochastic planning, it should at least always take the action that minimizes the chance of reaching such a situation). Linear plans tend to be "optimistic" about the outcome of actions (i.e. they assume the outcome is deterministic), and for this reason they cannot anticipate that an unexpected situation may lead to such a dead end, even with all the re-planning you'd want.

The core of the issue, though, is the "once found": planning is a very complex problem, ranging from NP-hard to undecidable in complexity, and here I am talking only about the classical form of linear plans. A policy being "all the plans", you can quickly imagine that finding it will probably not be easier than finding a single sequence in the same state space. Many policy-generation algorithms are polynomial in the size of the state space, but the state space grows exponentially with the number of state variables, which often limits them to a very small/abstract model of the world. And even with a relatively small model, finding a policy in a complete manner is time-consuming enough that it is prohibitive to do on-line -- although refining a policy iteratively, as is done in reinforcement learning, is more tractable, as you probably know.

Neither is better or worse than the other; they are just better fits for different situations. Four pieces of information come into play:

1. Do you know your goals a priori?
2. Do you know your initial state (or current state) a priori?
3. Do your actions have multiple outcomes, and how critical is it to handle these a priori?
4. Are your time and/or processing constrained when finding your solution?

Another consideration, which relates slightly to 3, is the readability of your plan vs. your policy. A sequence of actions is far easier for a human to read than a table full of "state -> action" pairs. A good illustration is the choice made on the Mars Exploration Rovers with Mapgen. Mapgen is used on Earth to generate the plan of the rovers; the rovers do not plan in-situ. Instead, a plan is generated through mixed-initiative planning with the input of the scientists and engineers (using the Europa planner).
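To make the plan-vs-policy distinction above concrete, here is a minimal sketch on a toy deterministic grid world -- my own illustration, not from any of the systems discussed; all names (`make_plan`, `make_policy`, ...) are hypothetical. The plan is one action sequence from one start; the policy is a "state -> action" entry for every cell:

```python
from collections import deque

# Toy deterministic grid world: states are (x, y) cells, actions move one cell.
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
INVERSE = {"up": "down", "down": "up", "left": "right", "right": "left"}
SIZE = 4  # 4x4 grid, 16 states

def step(state, action):
    x, y = state
    dx, dy = ACTIONS[action]
    nx, ny = x + dx, y + dy
    if 0 <= nx < SIZE and 0 <= ny < SIZE:
        return (nx, ny)
    return None  # action inapplicable at the grid edge

def make_plan(start, goal):
    """A (linear) plan: one fixed action sequence from start to goal (BFS)."""
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        state, actions = frontier.popleft()
        if state == goal:
            return actions
        for a in ACTIONS:
            nxt = step(state, a)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, actions + [a]))
    return None

def make_policy(goal):
    """A policy: a 'state -> action' pair for *every* state that can reach
    the goal, built by searching backwards from the goal."""
    policy = {goal: None}  # at the goal there is nothing left to do
    frontier = deque([goal])
    while frontier:
        state = frontier.popleft()
        for a in ACTIONS:
            prev = step(state, a)  # neighbor that can reach `state` in one move
            if prev is not None and prev not in policy:
                policy[prev] = INVERSE[a]  # from prev, go one step closer to goal
                frontier.append(prev)
    return policy

plan = make_plan((0, 0), (3, 3))  # a 6-step action sequence
policy = make_policy((3, 3))      # 16 entries: an action for every cell
```

Note that the policy needed to cover the whole state space (all 16 cells), while the plan only touched the states on one path -- a small hint of why policy generation scales so much worse as the state space grows.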
After validation this plan is sent to the vehicle and executed. The plan is not a strategy, and it is linear because that is easier to interpret. If it fails, the vehicle just stops its execution and a day is "wasted", but this is a fair trade-off for more predictable behavior (the scientists know exactly what the rover is supposed to do on a given day).

If you know your goals a priori but do not know where you will start or how the actions will turn out, and still want to "plan" for it only once, then policy generation is probably the way to go. You generate your policy, put it on the vehicle, and the vehicle then just needs to execute it, potentially making slight refinements as it goes (these refinements do not change the strategy's structure but make more subtle alterations, such as balancing some weights).

If you know both your goals and your initial state, and your actions can be considered deterministic (for example, their failure is very rare or the cost of failure is unimportant), then a linear plan will provide a solution faster, and you can use the time saved to work in detail on finding the "best" solution with respect to a given utility function (for example, reducing battery usage).

If you do not know the goals to be satisfied a priori and just know that people will ask your system to do different things as they see fit, then you are in a very different problem. As I mentioned, planning is a complex task, and policy generation just pushes this complexity further. This is the reason that, so far, most of the work on situated planning instead relies on: produce a linear plan, check it at execution, re-plan if something went wrong or the goals changed. That way, the agent can re-evaluate its plan when it sees fit, on its own agenda. And to do so -- as the agent is often a robot with limited time and resources -- it needs to be able to produce the new plan as fast as possible.
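The "produce a linear plan, check at execution, re-plan" loop can be sketched as follows -- again a toy illustration of my own, not any of the systems mentioned: an optimistic planner on a ten-cell corridor where moves occasionally slip, and an executive that re-plans from wherever the robot actually ended up:

```python
import random

GOAL, N = 9, 10  # corridor cells 0..9; start anywhere, reach cell 9

def plan_from(state):
    """Optimistic linear plan: assumes every move succeeds."""
    direction = 1 if state < GOAL else -1
    return [direction] * abs(GOAL - state)

def execute(action, state):
    """The real world: 20% of the time the move slips and does nothing."""
    if random.random() < 0.2:
        return state
    return max(0, min(N - 1, state + action))

def run(start, seed=0):
    """Plan, execute step by step, monitor outcomes, re-plan on surprise."""
    random.seed(seed)
    state, replans = start, 0
    plan = plan_from(state)
    while state != GOAL:
        action = plan.pop(0)
        expected = state + action
        state = execute(action, state)
        if state != expected:        # monitor: outcome differed from the model
            plan = plan_from(state)  # re-plan from where we actually are
            replans += 1
    return replans
```

The point of the sketch is that each re-plan must be cheap, because it happens on-line, on the robot, every time the optimistic model is contradicted by reality.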
Policy generation would be far too costly here, as it would require regenerating a new policy whenever the goal set changes; linear planning, on the other hand -- especially when moving away from generic approaches and using, for example, HTN planners that give better information on the plan structure (at the cost of more involved modeling) -- can be made efficient enough to close the loop in a reasonable time. The result remains a fairly high-level plan that is refined down by the executive, and often people do not try to optimize the plan and are happy with the first solution that comes out, but all of this can be done in a relatively short time on very modest computing devices. By doing so, the vehicle can re-evaluate its plan whenever things go wrong, without the need for external intervention (which also means it is harder to predict what will happen, including whether a goal will be fulfilled or not, as it can be rejected at any time), and it works as long as there is no dead-end situation, or the model is built in such a way that dead ends are avoided (this is a tricky part, but I have also rarely seen a dead-end situation occur at the level of abstraction these systems operate at).

As you can see, all of the above are more or less trade-offs between what you know a priori and/or consider can be ignored. These are all very complex tasks which provide a solution relatively slowly, with complexity growing fast with the domain's level of detail. As the goal of "AI planning" is to generate a plan/strategy, this complexity is a concern -- either because of the need to provide a solution in a reasonable time, or, further, to provide the optimal solution, which involves more search and more complexity. Hence the linear-plan format presents the advantage of giving a solution relatively fast for relatively complex domains. Policy generation, on the other hand, is very taxing, even though the resulting solution is more flexible in the end.
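The iterative policy refinement mentioned earlier -- as done in reinforcement learning -- can be sketched with tabular Q-learning on the same kind of toy corridor. This is an illustrative sketch of mine (hyperparameters, names, and reward shaping are all assumptions, not from any system discussed); the search happens through execution, and a full "state -> action" table falls out at the end:

```python
import random

# Tabular Q-learning on a 10-cell corridor: the policy is refined *during*
# execution rather than computed ahead of time.
N, GOAL = 10, 9
ACTIONS = (-1, +1)

def move(state, action):
    return max(0, min(N - 1, state + action))

def train(episodes=2000, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    random.seed(seed)
    Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
    for _ in range(episodes):
        s = random.randrange(N)  # initial state unknown a priori (point 3)
        while s != GOAL:
            # epsilon-greedy: deliberate "failures" drive the learning (point 2)
            if random.random() < eps:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: Q[(s, act)])
            s2 = move(s, a)
            r = 1.0 if s2 == GOAL else -0.01  # the goal is the utility (point 1)
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS)
                                  - Q[(s, a)])
            s = s2
    # extract the learned "state -> action" table
    return {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N)
            if s != GOAL}

policy = train()  # every non-goal cell should map to +1 (move toward the goal)
```

Note that all the search cost here was paid inside the training episodes, i.e. through execution -- exactly the "factored into the learning phase" point made below about reinforcement learning versus ahead-of-time planning.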
Still, it is hard to justify unless you know that your actions will frequently fail, or that a failure could lead you to a very bad situation.

For reinforcement learning there are three aspects that make policy learning more prevalent: 1) you know your goal -- in fact your goal is the utility function that directs your learning; 2) you know that actions will "fail" -- in a sense this is by design, as learning would not occur without failure; 3) you do not know what state you will be in initially. In a sense, you try to learn on your own the effects of your actions (or at least how these effects contribute to your goal), which is in itself a core difference from planning, where the effects of actions -- deterministic or not -- are known, and the purpose is to search for how to connect these actions to lead to your goal. Reinforcement learning is thus more about refining your policy as you refine your model, which means that your search happens through execution, so its cost is factored into the "learning phase"; planning, on the other hand, is done ahead of execution (even when done in-situ -- this is actually a strong simplifying assumption required for decidability), which means that any time spent searching for the solution is time spent not doing anything. That explains the contrast you perceive between planning and your community.

From: [email protected] To: [email protected] Subject: FW: [agi] Plans vs. Policies Date: Sun, 15 Mar 2015 19:27:17 -0700
Thanks. Is there a reason behind it, or not? The reason I ask is because I'm comparing reinforcement learning systems to traditional AI planning systems, trying to see how they accomplish the same job, of action selection, and what the real trade-offs are. My initial guess is that RL systems may suffer from a lack of representational flexibility available in AI planning systems, while traditional AI planning systems are not as fast as RL systems. The lack of representational flexibility means that RL systems cannot take advantage of problem-space abstractions and other techniques of AI planners. In the end they may be less efficient, and they may or may not scale. Combining these two approaches appears to be a good idea, which I suspect is what the dynamic adaptive planning paradigm must be about. I'll dig into it further. Other thoughts are appreciated... Cheers, ~PM

Date: Mon, 16 Mar 2015 10:16:35 +0800 Subject: Re: [agi] Plans vs. Policies From: [email protected] To: [email protected]; [email protected]

Either case is possible ;)

On Mon, Mar 16, 2015 at 10:15 AM, Piaget Modeler via AGI <[email protected]> wrote: Are OpenCog's policies distinct structures from its plans, or are they the same structure? Also, are the plans single-action, as in Reinforcement Learning, or multi-action, as in AI Planning? Kindly advise. ~PM

Date: Mon, 16 Mar 2015 10:08:38 +0800 Subject: Re: [agi] Plans vs. Policies From: [email protected] To: [email protected]

OpenCog uses policies to drive the creation of dynamic plans ;)

On Mon, Mar 16, 2015 at 10:07 AM, Piaget Modeler via AGI <[email protected]> wrote: Ben, What does OpenCog use? Plans or policies? Why? ~PM

Date: Mon, 16 Mar 2015 10:01:31 +0800 Subject: Re: [agi] Plans vs. Policies From: [email protected] To: [email protected]

A traditional plan in the AI planning literature sense does not depend on future observations, but there is now a big literature on dynamic/adaptive planning algorithms as well...
ben

From: [email protected] To: [email protected] Subject: [agi] Plans vs. Policies Date: Sun, 15 Mar 2015 16:22:46 -0700

Reinforcement Learning uses "policies" to select actions, while most work in AI Planning emphasizes the construction and representation of a "plan", which consists of a sequence of actions (or a hierarchy of composite and primitive actions). Kindly compare, contrast, evaluate trade-offs, and recommend the plans or policies approach. Your rationale is appreciated. ~PM

-- Ben Goertzel, PhD http://goertzel.org "The reasonable man adapts himself to the world: the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man." -- George Bernard Shaw

AGI | Archives | Modify Your Subscription
