Added: oodt/branches/wengine-branch/wengine/src/site/xdoc/tech/index.xml URL: http://svn.apache.org/viewvc/oodt/branches/wengine-branch/wengine/src/site/xdoc/tech/index.xml?rev=1052147&view=auto ============================================================================== --- oodt/branches/wengine-branch/wengine/src/site/xdoc/tech/index.xml (added) +++ oodt/branches/wengine-branch/wengine/src/site/xdoc/tech/index.xml Thu Dec 23 02:47:16 2010 @@ -0,0 +1,327 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!-- + Copyright (c) 2006 California Institute of Technology. + ALL RIGHTS RESERVED. U.S. Government sponsorship acknowledged. + + $Id$ +--> + +<document> + <properties> + <title>CAS Workflow Manager Technical Guide</title> + <author email="[email protected]">Brian Foster</author> + </properties> + + <body> + <section name="Introduction"> + <p>Historically, data processing systems have been primarily controlled by file-based + triggering mechanisms. These types of systems function like a chain reaction: one file would + trigger a process, which would generate another file, which would then trigger another + process, and so forth. While it is easy to add processes to and remove processes from such a + system, the user must understand in detail how these processes are related to each + other, so as to avoid creating unwanted 'chain reactions'. Recently, efforts have been made to + move towards more controlled processing system models, which utilize the concept of + workflows. A workflow is, more or less, a tightly grouped set of processes. A workflow + explicitly tells the processing system which set of processes should be run and in what + order. A workflow runs each process based on the successful completion of the previous processes in its + mapping, thereby making file generation a criterion for successful completion of a process + instead of being the triggering mechanism for the next process. 
This concept separates the + workflow from the files it may generate, thereby allowing the processing system to perform + more tasks than just file processing. In this paper you will learn how to use and configure + a workflow processing system, specifically CAS-Workflow2, and understand the design decisions behind it. + </p> + </section> + <section name="Workflows Structure"> + <p>Workflows consist of three parts: pre-conditions, a list of tasks (or processes) to + perform, and post-conditions.</p> + <subsection name="Pre-Conditions"> + <p>A pre-condition is a task whose purpose is to return a true/false answer to some + question. Pre-conditions are requirements that must be met before a workflow can run its + tasks. An example of a pre-condition might be: checking for the existence of a particular + file. After all pre-conditions have been met, a workflow will execute its tasks.</p> + </subsection> + <subsection name="Tasks"> + <p>A task is an activity or piece of work that needs to be done. Tasks are the atomic level + of a workflow. The goal of any workflow is to run its tasks to successful completion. An + example of a task might be: creating a visual map for a data file. After all tasks have + completed, the workflow will then run its post-conditions.</p> + </subsection> + <subsection name="Post-Conditions"> + <p>Post-conditions give the workflow the ability to evaluate whether or not its tasks + successfully performed all their required duties. An example of a post-condition might be: + checking for the existence of a file that a task was responsible for generating.</p> + </subsection> + </section> + <section name="Workflow Lifecycle"> + <p>Each workflow must go through a well-defined set of states, or lifecycle. We can easily + deduce a few of the states from what we know already. A workflow starts by evaluating its + pre-conditions, so we can call this state: PreConditionEval. Then it must execute its tasks; + we'll call this state: Executing. 
Then, of course, we have: PostConditionEval. Now, if + any of the three steps fails, we need a failure state, hence the state: Failure. And, if + everything goes as planned, we have the state: Success. Figure 1 further describes this + workflow lifecycle. There are other states; however, for simplicity's sake, these are the only + states we will introduce for now. The other states will be introduced later, as more + workflow knowledge is required to understand them.</p> + <center> + <img src="../images/simplified-lifecycle.png" alt="Workflow Manager Lifecycle"/> + </center> + <subsection name="PreConditionEval"> + <p>The workflow is evaluating its pre-conditions.</p> + </subsection> + <subsection name="Executing"> + <p>The workflow is executing its tasks.</p> + </subsection> + <subsection name="PostConditionEval"> + <p>The workflow is evaluating its post-conditions.</p> + </subsection> + <subsection name="Success"> + <p>The workflow has successfully passed all pre-conditions, executed all tasks, and passed all + post-conditions.</p> + </subsection> + <subsection name="Failure"> + <p>At least one of the workflow's pre-conditions, tasks, or post-conditions has failed.</p> + </subsection> + </section> + <section name="Workflow Context"> + <p>Workflows can have context, which acts as their knowledge base. This context is + also referred to as metadata. Metadata is a bucket of key/value(s) information that + workflows have access to. An example of a metadata field might be: RunDate='2009-01-20'. At + times, tasks need to talk to other tasks, or conditions need to communicate something + to the tasks that run after them. Workflows not only control the flow of conditions and + tasks, they also control communication between them. Workflows accomplish this through + metadata. Conditions and tasks can also have their own metadata, which they don't share with + anyone else. 
A workflow has three categories of metadata: 1) static, 2) dynamic, and 3) + local.</p> + <subsection name="Static"> + <p>This is metadata that is the same for every run of a workflow. A task can always assume + this metadata will exist.</p> + </subsection> + <subsection name="Dynamic"> + <p>This is metadata that is passed into the workflow when it is run and/or set by other tasks + and conditions when communicating with each other.</p> + </subsection> + <subsection name="Local"> + <p>This is dynamic metadata that is local to a task or condition.</p> + </subsection> + </section> + <section name="Everything is a Workflow"> + <p>In order to simplify how process control is configured, tasks and conditions were also + designed to be workflows. This means that almost anywhere we used the word workflow up until + now, we could have replaced it with the word task and vice versa. However, there are a few + exceptions: a task differs from a workflow in that it wraps an executable class, which + performs some activity, and it cannot have any children workflows. Conditions are just + specialized tasks, so the same applies to them as well. Yet, conditions differ from tasks in + that they cannot have pre-conditions or post-conditions, since that would mean you could + have a pre-condition for a pre-condition. So, in other words, a workflow is really just a + workflow of workflows with pre- and post-condition workflows.</p> + </section> + <section name="Workflow Listeners"> + <p>We now know that workflows have three different parts (or buckets) into which other + workflows can be placed: pre-conditions, children workflows, and post-conditions. Workflows + placed into these buckets are treated like black boxes. A workflow has no idea what types of + workflows have been placed into these buckets. 
The workflow just knows that the + workflows in the pre-conditions bucket must pass before it runs the workflows in the + children bucket, followed by the workflows in the post-conditions bucket. A + workflow learns what is going on with the workflows in its buckets by registering itself + as a listener for state changes in those workflows. When a workflow changes state, it will + notify its listeners about the change. The listening workflow will then adjust its state + depending on which bucket the state change notification came from. Earlier we learned about + the lifecycle which each workflow goes through. This lifecycle is followed not only by the + top workflow, or root workflow, but also by every workflow in all of the different + buckets. Workflows will change states in their lifecycle when one of the workflows + in their buckets changes state. For example, if a workflow has a pre-condition workflow + which changes state to Executing, then upon notification it will change its own state to + PreConditionEval. This notion of workflow lifecycle changes affecting other workflow + lifecycles will be explained in greater detail later.</p> + </section> + <section name="Workflow Types"> + <p>There are two categories of workflows: workflows which control the run order of + other workflows, and workflows which track the execution of some process or + activity. There are currently two workflows implemented which control the run order of + workflows:</p> + <subsection name="Parallel"> + <p>A workflow that runs all the workflows in its children bucket at the same time. Its + metadata (or context) becomes the merge of all metadata of workflows in its children + bucket.</p> + </subsection> + <subsection name="Sequential"> + <p>A workflow that runs the workflows in its children bucket one at a time, only running the + next child workflow after its previous child workflow has finished. 
Its metadata (or + context) is updated after each workflow from its children bucket is run, then passed to + the next workflow to run from its children bucket.</p> + </subsection> + <p>We have already been introduced to the second category of workflows, those which track the running of some process. + These are tasks and conditions:</p> + <subsection name="Task"> + <p>Tracks some executing activity. Its metadata is synched with this process + periodically.</p> + </subsection> + <subsection name="Condition"> + <p>Tracks some executing condition activity. Its metadata is synched with this executing + condition periodically.</p> + </subsection> + </section> + <section name="Workflows in Workflows"> + <p>Now that we understand the makeup of a workflow, let's look at an example. Let's say we want + a workflow that models going to the store to buy groceries. The first step is to make + sure we have our keys and wallet. These would be considered pre-conditions, because we can't + drive without our keys, and we can't buy the groceries without our wallet. However, these + pre-conditions can be checked at the same time. We can check for our keys while we are + checking for our wallet, since checking for our keys does not depend on checking for our + wallet. So these pre-conditions would happen in 'parallel'. After we've determined that we + have our keys and wallet, we can now perform the tasks we have set out to do: drive to the + store; buy our groceries; drive home. Since we can't do one of these tasks without doing the + one before it (that is, we can't buy our groceries without driving to the store), these + tasks are 'sequential'. 
So our workflow model graph would look something like:</p> + <pre> + [id='BuyGroceries' execution='sequential'] + {PreCond: [id='FindWalletAndKeys' execution='parallel'] + [id='FindWallet' execution='condition'] + [id='FindKeys' execution='condition']} + [id='DriveToStore' execution='task'] + [id='PurchaseGroceries' execution='task'] + [id='DriveHome' execution='task'] + </pre> + <p>Let's take this one step further now. Let's say we brought a friend along to help with the + shopping and we split up our list, so as to cut the time in half. Now we have two people + shopping at the same time:</p> + <pre> + [id='BuyGroceries' execution='sequential'] + {PreCond: [id='FindWalletAndKeys' execution='parallel'] + [id='FindWallet' execution='condition'] + [id='FindKeys' execution='condition']} + [id='DriveToStore' execution='task'] + <strong>[id='PurchaseGroceries' execution='parallel'] + [id='YouPurchaseGroceries' execution='task'] + [id='FriendPurchaseGroceries' execution='task']</strong> + [id='DriveHome' execution='task'] + </pre> + <p>Figure 2 shows the task mapping of this workflow. Usually, when you go to implement a + workflow in the system, you will have a task diagram, which you will have to convert to a + workflow model graph similar to the grocery store example above. 
So being able to look at + one and derive the other is essential.</p> + <center> + <img src="../images/grocery-store-workflow-1.png" alt="Grocery Store Workflow 1"/> + </center> + <p>The following figures enumerate the recommended thought process which one should follow to + identify workflows from a task graph:</p> + <center> + <img src="../images/grocery-store-workflow-2.png" alt="Grocery Store Workflow 2"/> + </center> + <center> + <img src="../images/grocery-store-workflow-3.png" alt="Grocery Store Workflow 3"/> + </center> + <center> + <img src="../images/grocery-store-workflow-4.png" alt="Grocery Store Workflow 4"/> + </center> + </section> + <section name="Workflow Patterns"> + <p>There are many complex workflow patterns out there. However, most patterns should be + implementable with careful use of different combinations of parallel and sequential + workflows. In the unusual case where parallel and sequential won't cut it, custom workflows + can be written and plugged in (this is an advanced topic that will be discussed later). Here + we will cover how to create the most common workflow patterns. 
More advanced patterns will + be discussed later.</p> + <subsection name="Parallel Split"> + <subsection name="- Description:"> + <p>The divergence of a branch into two or more parallel branches each of which execute + concurrently.</p> + </subsection> + <subsection name="- Diagram:"> + <center> + <img src="../images/parallel-split-diagram.png" alt="Parallel Split Diagram"/> + </center> + </subsection> + <subsection name="- Model Graph:"> + <pre> + [id='S1' execution='sequential'] + [id='T1' execution='task'] + [id='P1' execution='parallel'] + [id='T2' execution='task'] + [id='T3' execution='task'] + </pre> + </subsection> + </subsection> + <subsection name="Synchronization"> + <subsection name="- Description:"> + <p>The convergence of two or more branches into a single subsequent branch such that the + thread of control is passed to the subsequent branch when all input branches have been + enabled.</p> + </subsection> + <subsection name="- Diagram:"> + <center> + <img src="../images/synchronization-diagram.png" alt="Synchronization Diagram"/> + </center> + </subsection> + <subsection name="- Model Graph:"> + <pre> + [id='S1' execution='sequential'] + [id='P1' execution='parallel'] + [id='T1' execution='task'] + [id='T2' execution='task'] + [id='T3' execution='task'] + </pre> + </subsection> + </subsection> + <subsection name="Combination of a Parallel Split into a Synchronization"> + <subsection name="- Description:"> + <p>(See <strong>Parallel Split</strong> and <strong>Synchronization</strong>)</p> + </subsection> + <subsection name="- Diagram:"> + <center> + <img src="../images/parallel-split-into-synchronization-diagram.png" + alt="Combination of a Parallel Split into a Synchronization Diagram"/> + </center> + </subsection> + <subsection name="- Model Graph:"> + <pre> + [id='S1' execution='sequential'] + [id='T1' execution='task'] + [id='P1' execution='parallel'] + [id='T2' execution='task'] + [id='T3' execution='task'] + [id='T4' execution='task'] + </pre> + 
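<p>As a reading aid, the run order implied by the model graph above can be traced by hand from the definitions of sequential and parallel workflows given earlier. The trace assumes that T1, P1, and T4 are the children of S1 and that T2 and T3 are the children of P1; the step numbering is purely illustrative and is not output produced by the workflow manager:</p> + <pre> + step 1: S1 runs T1 + step 2: T1 finishes; S1 runs P1, which runs T2 and T3 at the same time (parallel split) + step 3: T2 and T3 both finish, so P1 finishes (synchronization) + step 4: S1 runs T4 + </pre> + 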
</subsection> + </subsection> + </section> + <section name="Lifecycles in Lifecycles"> + <p>We learned above how each workflow goes through its own lifecycle, which depends on its + pre-condition, children, and post-condition workflows' lifecycles. Here we will learn how + this actually works. First we are going to introduce a few more states: Queued, + PreConditionSuccess, WaitingOnResources, and ExecutionComplete. Figure 9 is an updated + lifecycle diagram.</p> + <center> + <img src="../images/almost-complete-lifecycle.png" alt="Almost Complete Lifecycle Diagram"/> + </center> + <subsection name="Queued"> + <p>Workflow has been put on the main queue (assume this to be the initial state for now).</p> + </subsection> + <subsection name="PreConditionSuccess"> + <p>Workflow has passed all of its pre-conditions.</p> + </subsection> + <subsection name="WaitingOnResources"> + <p>Workflow (or one of its pre-condition, children, or post-condition workflows) is ready to run but + can't because of resources.</p> + </subsection> + <subsection name="ExecutionComplete"> + <p>A workflow has completed executing or all workflows in its children bucket have completed + successfully.</p> + </subsection> + <p>Let's bring back the buying groceries example, but this time we will add in the states (with everything starting in the Queued state):</p> + <pre> + [id='BuyGroceries' execution='sequential' state='Queued'] + {PreCond: + [id='FindWalletAndKeys' execution='parallel' state='Queued'] + [id='FindWallet' execution='condition' state='Queued'] + [id='FindKeys' execution='condition' state='Queued']} + [id='DriveToStore' execution='task' state='Queued'] + [id='PurchaseGroceries' execution='parallel' state='Queued'] + [id='YouPurchaseGroceries' execution='task' state='Queued'] + [id='FriendPurchaseGroceries' execution='task' state='Queued'] + [id='DriveHome' execution='task' state='Queued'] + </pre> + </section> + </body> + +</document> \ No newline at end of file
