Is there a chance to include a workflow restarter (resuming from where it was stopped earlier) in the tasks?
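The restarter asked about above can be pictured as a checkpointing wrapper: persist the IDs of completed tasks, and on restart skip them. The following is a minimal illustrative sketch in plain Python, not Airavata or Helix code; the function name `run_workflow` and the JSON checkpoint file are assumptions for illustration only.

```python
import json
import os

def run_workflow(tasks, checkpoint="checkpoint.json"):
    """Run `tasks` (an ordered list of (task_id, fn) pairs), skipping any
    task already recorded as completed in the checkpoint file, so a
    stopped workflow resumes from where it left off."""
    done = set()
    if os.path.exists(checkpoint):
        with open(checkpoint) as f:
            done = set(json.load(f))
    for task_id, fn in tasks:
        if task_id in done:
            continue  # already completed in an earlier run
        fn()
        done.add(task_id)
        # persist progress after each task so a crash loses at most one step
        with open(checkpoint, "w") as f:
            json.dump(sorted(done), f)
```

A real implementation would store this state in the registry (or rely on Helix's own task state) rather than a local file, but the skip-completed-tasks idea is the same.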
Thanks, Sudhakar.

On Jun 2, 2018, at 11:52 PM, Yasas Gunarathne <yasasgunarat...@gmail.com> wrote:

Hi Suresh and Dimuthu,

Thank you very much for the clarifications and suggestions. Based on them, and on other Helix-related factors encountered during implementation, I have updated and simplified the structure of the workflow execution framework.

1. Airavata Workflow Manager

The Airavata Workflow Manager is responsible for accepting the workflow information provided by the user, creating a Helix workflow with task dependencies, and submitting it for execution.

2. Airavata Workflow Data Blocks

Airavata Workflow Data Blocks are saved in JSON format as user content in the Helix workflow scope. These blocks contain links to the user's input data, replica catalog entries for output data, and other information required for workflow execution.

3. Airavata Workflow Tasks

3.1. Operator Tasks

i. Flow Starter Task - Starts a specific branch of the Airavata workflow. A single Airavata workflow can have multiple starting points.

ii. Flow Terminator Task - Terminates a specific branch of the Airavata workflow. A single workflow can have multiple terminating points.

iii. Flow Barrier Task - Works as a waiting component in the middle of a workflow. For example, if two experiments are running and the results of both are required to continue the workflow, the barrier waits for both experiments to complete before continuing.

iv. Flow Divider Task - Opens up new branches of the workflow.

v. Condition Handler Task - The path selection component of the workflow.

3.2. Processor Tasks

These components are responsible for triggering the Orchestrator to perform specific processes (e.g. experiments / data processing activities).

3.3. Loop Tasks

i. Foreach Loop Task
ii. Do While Loop Task

Regards

On Mon, May 21, 2018 at 4:01 PM Suresh Marru <sma...@apache.org> wrote:

Hi Yasas,

This is good detail. I haven't digested it all, but here is some quick feedback. Instead of connecting multiple experiments within a workflow, which will be confusing from a user's point of view, can you use the following terminology:

* A computational experiment may have a single application execution or multiple (a workflow).
** So an experiment may correspond to a single application execution, multiple application executions, or even multiple workflows nested amongst them (hierarchical workflows).

To avoid any confusion, let's call these units of execution a Process. A Process is an abstract notion for a unit of execution, without going into implementation details, and it describes the inputs and outputs. For an experiment with a single application, experiment and process have a one-to-one correspondence, but within a workflow, each step is a Process. Tasks are the implementation detail of a Process.

So the change in your architecture is to chain multiple processes together within an experiment, not to chain multiple experiments. Does that make sense? You can also refer to the attached figure, which illustrates these concepts from a data model perspective.

Suresh

P.S. Overall, great going in mailing list communications, keep 'em coming.

On May 21, 2018, at 1:25 AM, Yasas Gunarathne <yasasgunarat...@gmail.com> wrote:

Hi Upeksha,

Thank you for the information. I have identified the components that need to be included in the workflow execution framework. Please add anything that is missing.

1. Airavata Workflow Message Context

The Airavata Workflow Message Context is the common data structure that passes through all Airavata workflow components. It includes the following:

* Airavata Workflow Messages - These contain the actual data that needs to be transferred through the workflow.
The content of Airavata Workflow Messages can be modified by Airavata Workflow Components. A single Airavata Workflow Message Context can hold multiple Airavata Workflow Messages, stored as key-value pairs keyed by the component id of the last component to modify them. (This is required for the Airavata Flow Barrier.)

* Flow Monitoring Information - Contains the current status and progress of the workflow.

* Parent Message Contexts - Includes the preceding Airavata Workflow Message Contexts if the current message context was created in the middle of the workflow. For example, Airavata Flow Barriers and Airavata Flow Dividers create new message contexts by combining and copying messages, respectively. In such cases the new message contexts record their parent message context(s) in this section.

* Child Message Contexts - Includes the succeeding Airavata Workflow Message Contexts if other message contexts were created in the middle of the workflow from the current message context. For example, Airavata Flow Barriers and Airavata Flow Dividers create new message contexts by combining and copying messages, respectively. In such cases the current message context records its child message context(s) in this section.

* Next Airavata Workflow Component - Component ID of the next Airavata Workflow Component.

2. Airavata Workflow Router

The Airavata Workflow Router is responsible for keeping track of Airavata Workflow Message Contexts and directing them to the specified Airavata Workflow Components.

3. Airavata Workflow Components

i. Airavata Workflow Operators

* Airavata Flow Starter - Responsible for starting a specific branch of the Airavata workflow. A single Airavata workflow can have multiple starting points. This component creates a new Airavata Workflow Message Context and registers it with the Airavata Workflow Router.
  * Configurations
    * Next Airavata Workflow Component
    * Input Dataset File

* Airavata Flow Terminator - Responsible for terminating a specific branch of the Airavata workflow. A single workflow can have multiple terminating points.
  * Configurations
    * Output File Location

* Airavata Flow Barrier - Works as a waiting component in the middle of a workflow. For example, if two experiments are running and the results of both are required to continue the workflow, the barrier waits for both experiments to complete before continuing. Within this component, multiple Airavata Workflow Messages are packaged into a new Airavata Workflow Message Context.
  * Configurations
    * Components to wait on
    * Next Airavata Workflow Component

* Airavata Flow Divider - Opens up new branches of the workflow. It is responsible for sending copies of the Airavata Message to those branches separately.
  * Configurations
    * Next components to send copies to

* Airavata Condition Handler - The path selection component of the workflow. It checks the Airavata Message Context against conditions and directs it to the required path of the workflow.
  * Configurations
    * Possible Next Airavata Workflow Components

ii. Airavata Experiments

These components are responsible for triggering the current task execution framework to perform specific experiments.

iii. Airavata Data Processors

These components are responsible for processing data in the middle of a workflow. Sometimes the output data of an experiment needs to be processed before being sent to other experiments as inputs.

iv. Airavata Loops

* Airavata Foreach Loop - This loop can be parallelized.
* Airavata Do While Loop - This loop cannot be parallelized.

As we have discussed, I am planning to implement this Airavata Workflow Execution Framework using Apache Helix.
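The Flow Barrier described above can be sketched as a component configured with the branches to wait on, merging one message per branch into a new message context keyed by component id. This is a plain-Python illustration, not Helix or Airavata code; the class `FlowBarrier` and its dict-based message context are assumptions made for the sketch.

```python
class FlowBarrier:
    """Sketch of a Flow Barrier: collects one message per awaited
    component and, only when all have arrived, packages them into a
    new message context keyed by the producing component's id."""

    def __init__(self, wait_on, next_component):
        self.wait_on = set(wait_on)       # configuration: components to wait on
        self.next_component = next_component  # configuration: next component
        self.pending = {}                 # component_id -> message received so far

    def accept(self, component_id, message):
        """Deliver a message from one branch. Returns the merged message
        context once every awaited branch has reported, else None."""
        self.pending[component_id] = message
        if self.wait_on.issubset(self.pending):
            return {
                "messages": {c: self.pending[c] for c in self.wait_on},
                "next_component": self.next_component,
            }
        return None
```

For example, a barrier waiting on two experiments returns nothing when the first result arrives and emits the combined context, addressed to the next component, when the second one does. Keying the merged messages by component id is what lets the downstream component tell the two experiments' outputs apart.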
To get a clearer understanding of the project, it would help if you could provide some information about the experiment types (such as Echo and Gaussian) and the input and output data formats of these experiments. If we need to process data (see Airavata Data Processor) when connecting two experiments in the workflow, that should also be done on the supercomputers. I need to verify whether any implementation for data processing is currently available within Airavata.

The following diagram shows an example workflow without loops. It would help if you could explain a bit more about the required types of loops within an Airavata workflow.

<airavata-workflow-1.png>

Regards

On Tue, May 1, 2018 at 9:06 PM, DImuthu Upeksha <dimuthu.upeks...@gmail.com> wrote:

Hi Yasas,

This is really good. You have captured the problem correctly and provided a good visualization too. As we discussed in the GSoC student meeting, I was wondering whether we can compose these workflows as Helix Tasks as well (one task per experiment). The only thing that worries me is how we can implement a barrier, as mentioned in your second workflow, using the current Task framework. We might have to improve the task framework to support that. And we might need to think about constant / conditional loops and conditional (if-else) paths inside the workflows. Please update the diagram accordingly for future reference. You are on the right track. Keep it up.

Thanks
Dimuthu

On Sun, Apr 29, 2018 at 1:57 AM, Yasas Gunarathne <yasasgunarat...@gmail.com> wrote:

Hi All,

Thank you very much for the information. I did research on the internals of the Orchestrator and the new Helix-based (lower level) workflow execution over the past few weeks. Even though Helix supports chaining together any number of experiments (i.e. complete experiments including pre and post workflows), it is necessary to maintain a higher-level workflow manager as a separate layer for the Orchestrator, and to submit experiments one after the other (if they cannot run in parallel) or in parallel (if execution is independent), in order to preserve fault tolerance and enable flow handling of the higher-level workflow.

Therefore, the steps that the new Orchestrator layer is supposed to perform are:

1. Parsing the provided high-level workflow schema and arranging the list of experiments.
2. Sending experiments in the provided order and saving their results in the storage resource.
3. If there are dependencies (the result of one experiment is required to generate the input for another), managing them accordingly, while providing support for modifying the results in between.
4. Providing flow handling methods (Start, Stop, Pause, Resume, Restart).

I have attached a few simple top-level workflow examples to support the explanation. Please provide your valuable suggestions.

Regards

On Mon, Mar 26, 2018 at 8:43 AM, Suresh Marru <sma...@apache.org> wrote:

Hi Yasas,

Dimuthu already clarified, but let me add a few more points. That's a very good question: interpreter vs compiler (in the context of workflows). Yes, Airavata historically took the interpreter approach, where after the execution of each node in a workflow, execution comes back to the enactment engine and re-inspects the state. This facilitated user interactivity through executions. The attached state transition diagram may illustrate it more.

Back to the current scope: I think you got the overall goal correct, and your approach is reasonable. There are some details missing, but that's expected. Just be aware that if your project is accepted, you will need to work with the Airavata community over the summer and refine the implementation details as you go. You are off to a good start.
Cheers,
Suresh

<workflow-states.png>

On Mar 25, 2018, at 8:44 PM, DImuthu Upeksha <dimuthu.upeks...@gmail.com> wrote:

Hi Yasas,

I'm not an expert in XBaya design and use cases, but I think Suresh can shed some light on it. However, we no longer use XBaya for workflow interpretation, so don't confuse the workflows defined in XBaya with the description provided in the JIRA ticket. Let's try to make the concepts clear. We need two levels of workflows:

1. To run a single experiment of an Application. We call this a DAG, so a DAG is statically defined. It can have a set of environment setup tasks, data staging tasks, and a job submission task. For example, a DAG is created to run a Gaussian experiment on a compute host.

2. To make a chain of Applications. This is what we call an actual Workflow. In a workflow you can have a Gaussian experiment running, followed by a Lammps experiment. So this is a dynamic workflow; users can come up with different combinations of Applications as a workflow.

However, your claim is true about pausing and restarting workflows. Whether it is a statically defined DAG or a dynamic workflow, we should be able to do those operations. I understand that some of the words and terminologies in those resources are confusing and unclear, so please feel free to let us know if you need anything clarified.

Thanks
Dimuthu

On Sun, Mar 25, 2018 at 2:45 AM, Yasas Gunarathne <yasasgunarat...@gmail.com> wrote:

Hi All,

I have a few questions to be clarified regarding user-defined workflow execution in Apache Airavata. Here I am talking about the high-level workflows that are used to chain together multiple applications. This relates to the issue Airavata-2717 [1].
In this [2] documentation it says that the workflow interpreter that worked with XBaya provided an interpreted workflow execution framework, rather than a compiled workflow execution environment, which allowed users to pause the execution of the workflow as necessary, update the DAG's execution states or even the DAG itself, and resume execution.

I want to know the actual requirement for having an interpreted workflow execution at this level. Is there any domain-level advantage in allowing users to modify the order of the workflow at runtime? I think we can have pause, resume, restart, and stop commands available even in a compiled workflow execution environment, as long as we don't need to change the workflow.

[1] https://issues.apache.org/jira/browse/AIRAVATA-2717
[2] http://airavata.apache.org/architecture/workflow.html

Regards

--
Yasas Gunarathne
Undergraduate at Department of Computer Science and Engineering
Faculty of Engineering - University of Moratuwa, Sri Lanka
LinkedIn<https://www.linkedin.com/in/yasasgunarathne/> | GitHub<https://github.com/yasgun> | Mobile : +94 77 4893616