Hi Yasas,

Thanks for the summary. Now that you have a clear idea of what you have to
do, let's move on to implementing a prototype that validates your workflow
blocks so that we can give our feedback constructively.

Hi Sudhakar,

Based on your question, I can imagine two scenarios.

1. The workflow is paused in the middle and resumed when required.
This is straightforward if we use the Helix API directly.

2. The workflow is stopped permanently and restarted from scratch.
As far as I understand, Helix currently does not have a workflow cloning
capability, so we might have to clone it on our side and instruct Helix to
run it as a new workflow. Or we can extend the Helix API to support
workflow cloning, which is the cleaner and ideal way; however, that might
need some understanding of the Helix code base and proper testing. So for
the time being, let's go with the first approach.
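
To make the first approach concrete, here is a minimal sketch of what I
have in mind, assuming we hold a connected HelixManager for the cluster;
stop/resume come from Helix's TaskDriver (worth verifying against the
Helix version we ship):

import org.apache.helix.HelixManager;
import org.apache.helix.task.TaskDriver;

public class WorkflowFlowControl {

    private final TaskDriver taskDriver;

    public WorkflowFlowControl(HelixManager helixManager) {
        this.taskDriver = new TaskDriver(helixManager);
    }

    // Pause: Helix sets the workflow's target state to STOP; the task
    // states stay in ZooKeeper, so nothing is lost.
    public void pause(String workflowName) throws Exception {
        taskDriver.stop(workflowName);
    }

    // Resume the stopped workflow from where it left off.
    public void resume(String workflowName) throws Exception {
        taskDriver.resume(workflowName);
    }
}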

Thanks
Dimuthu

On Sun, Jun 3, 2018 at 7:35 AM, Pamidighantam, Sudhakar <pamid...@iu.edu>
wrote:

> Is there a chance to include a workflow restarter (restarting where it
> was stopped earlier) in the tasks?
>
> Thanks,
> Sudhakar.
>
> On Jun 2, 2018, at 11:52 PM, Yasas Gunarathne <yasasgunarat...@gmail.com>
> wrote:
>
> Hi Suresh and Dimuthu,
>
> Thank you very much for the clarifications and suggestions. Based on them
> and other Helix-related factors encountered during the implementation
> process, I updated and simplified the structure of the workflow execution
> framework.
>
> *1. Airavata Workflow Manager*
>
> The Airavata Workflow Manager is responsible for accepting the workflow
> information provided by the user, creating a Helix workflow with task
> dependencies, and submitting it for execution, roughly as in the sketch
> below.
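>
> A rough illustration of what the Workflow Manager would do, assuming
> command-based Helix tasks; the command names ("FlowStarterTask",
> "ProcessorTask") are placeholders, while the builder calls are from
> Helix's task framework:
>
> import java.util.Collections;
>
> import org.apache.helix.HelixManager;
> import org.apache.helix.task.JobConfig;
> import org.apache.helix.task.TaskConfig;
> import org.apache.helix.task.TaskDriver;
> import org.apache.helix.task.Workflow;
>
> public class AiravataWorkflowManager {
>
>     private final TaskDriver taskDriver;
>
>     public AiravataWorkflowManager(HelixManager helixManager) {
>         this.taskDriver = new TaskDriver(helixManager);
>     }
>
>     // Build a two-job workflow where "process" runs only after "starter",
>     // then hand it over to Helix for execution.
>     public void submitWorkflow(String workflowName) {
>         TaskConfig starterTask = new TaskConfig.Builder()
>                 .setTaskId("starter-task")
>                 .setCommand("FlowStarterTask") // registered on the participants
>                 .build();
>         TaskConfig processTask = new TaskConfig.Builder()
>                 .setTaskId("process-task")
>                 .setCommand("ProcessorTask")
>                 .build();
>
>         Workflow workflow = new Workflow.Builder(workflowName)
>                 .addJob("starter", new JobConfig.Builder()
>                         .addTaskConfigs(Collections.singletonList(starterTask)))
>                 .addJob("process", new JobConfig.Builder()
>                         .addTaskConfigs(Collections.singletonList(processTask)))
>                 .addParentChildDependency("starter", "process") // the dependency
>                 .build();
>
>         taskDriver.start(workflow);
>     }
> }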
>
>
> *2. Airavata Workflow Data Blocks*
>
> Airavata Workflow Data Blocks are saved in JSON format as user content in
> the Helix workflow scope. These blocks contain links to the user's input
> data, replica catalog entries for output data, and other information that
> is required for workflow execution.
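>
> A minimal sketch of how a task could store and read such a block through
> Helix's UserContentStore; the key and the JSON shape are illustrative
> only:
>
> import org.apache.helix.task.Task;
> import org.apache.helix.task.TaskResult;
> import org.apache.helix.task.UserContentStore;
>
> public class DataBlockAwareTask extends UserContentStore implements Task {
>
>     @Override
>     public TaskResult run() {
>         // Store a JSON data block in the workflow scope so every task
>         // in the workflow can read it.
>         String dataBlock = "{\"inputDataLink\": \"https://example.org/input.dat\","
>                 + " \"outputReplicaEntry\": \"replica-id-1\"}";
>         putUserContent("airavata.workflow.datablock", dataBlock, Scope.WORKFLOW);
>
>         // Later tasks read it back from the same scope.
>         String stored = getUserContent("airavata.workflow.datablock", Scope.WORKFLOW);
>         return new TaskResult(TaskResult.Status.COMPLETED,
>                 "data block stored: " + (stored != null));
>     }
>
>     @Override
>     public void cancel() {
>         // nothing to clean up in this sketch
>     }
> }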
>
>
> *3. Airavata Workflow Tasks*
> *3.1. Operator Tasks*
>
> *i. Flow Starter Task*
>
> Flow Starter Task is responsible for starting a specific branch of the
> Airavata Workflow. In a single Airavata Workflow there can be multiple
> starting points.
>
> *ii. Flow Terminator Task*
>
> Flow Terminator Task is responsible for terminating a specific branch of
> the Airavata workflow. In a single workflow there can be multiple
> terminating points.
>
> *iii. Flow Barrier Task*
>
> Flow Barrier Task works as a waiting component in the middle of a workflow.
> For example, if there are two experiments running and the results of both
> experiments are required to continue the workflow, the barrier waits for
> both experiments to complete before continuing.
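>
> In Helix terms I would expect this fan-in to map onto a job with two
> parents, roughly as below; the JobConfig.Builder arguments are
> placeholders built the same way as in the Workflow Manager sketch above:
>
> import org.apache.helix.task.JobConfig;
> import org.apache.helix.task.Workflow;
>
> public class BarrierExample {
>
>     // The "continuation" job starts only after both parent jobs complete,
>     // which is exactly the barrier behaviour described above.
>     public static Workflow buildBarrierWorkflow(JobConfig.Builder experimentA,
>                                                 JobConfig.Builder experimentB,
>                                                 JobConfig.Builder continuation) {
>         return new Workflow.Builder("barrier-example")
>                 .addJob("experimentA", experimentA)
>                 .addJob("experimentB", experimentB)
>                 .addJob("continuation", continuation)
>                 .addParentChildDependency("experimentA", "continuation")
>                 .addParentChildDependency("experimentB", "continuation")
>                 .build();
>     }
> }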
>
> *iv. Flow Divider Task*
>
> Flow Divider Task opens up new branches of the workflow.
>
> *v. Condition Handler Task*
>
> Condition Handler Task is the path selection component of the workflow.
>
>
> *3.2. Processor Tasks*
>
> These components are responsible for triggering the Orchestrator to perform
> specific processes (e.g. experiments or data processing activities).
>
>
> *3.3. Loop Tasks*
>
> *i. Foreach Loop Task*
> *ii. Do While Loop Task*
>
>
> Regards
>
> On Mon, May 21, 2018 at 4:01 PM Suresh Marru <sma...@apache.org> wrote:
>
>> Hi Yasas,
>>
>> This is good detail; I haven't digested it all, but here is some quick
>> feedback. Instead of connecting multiple experiments within a workflow,
>> which will be confusing from a user's point of view, can you use the
>> following terminology:
>>
>> * A computational experiment may have a single application execution or
>> multiple (a workflow).
>>
>> ** So an experiment may correspond to a single application execution,
>> multiple application executions, or even multiple workflows nested amongst
>> them (hierarchical workflows). To avoid any confusion, let's call these
>> units of execution Processes.
>>
>> A process is an abstract notion for a unit of execution, without going
>> into implementation details; it describes the inputs and outputs. For an
>> experiment with a single application, experiment and process have a
>> one-to-one correspondence, but within a workflow, each step is a Process.
>>
>> Tasks are the implementation detail of a Process.
>>
>> So the change in your architecture will be to chain multiple processes
>> together within an experiment, not to chain multiple experiments. Does
>> that make sense? You can also refer to the attached figure, which
>> illustrates these concepts from a data model perspective.
>>
>> Suresh
>>
>> P.S. Overall, great going on the mailing list communications; keep 'em
>> coming.
>>
>>
>> On May 21, 2018, at 1:25 AM, Yasas Gunarathne <yasasgunarat...@gmail.com>
>> wrote:
>>
>> Hi Upeksha,
>>
>> Thank you for the information. I have identified the components that
>> need to be included in the workflow execution framework. Please add
>> anything that is missing.
>>
>> *1. Airavata Workflow Message Context*
>>
>> The Airavata Workflow Message Context is the common data structure that
>> passes through all Airavata workflow components. It includes the following
>> (a rough sketch of the structure as a data holder follows the list).
>>
>>
>>    - *Airavata Workflow Messages* - These contain the actual data that
>>    needs to be transferred through the workflow. The content of an
>>    Airavata Workflow Message can be modified at Airavata Workflow
>>    Components. A single Airavata Workflow Message Context can hold
>>    multiple Airavata Workflow Messages, and they will be stored as
>>    key-value pairs keyed by the component id of the last modified
>>    component. (This is required for the Airavata Flow Barrier.)
>>    - *Flow Monitoring Information* - Flow Monitoring Information contains
>>    the current status and progress of the workflow.
>>    - *Parent Message Contexts* - Parent Message Contexts include the
>>    preceding Airavata Workflow Message Contexts if the current message
>>    context was created in the middle of the workflow. For example,
>>    Airavata Flow Barriers and Airavata Flow Dividers create new message
>>    contexts by combining and copying messages respectively. In such cases
>>    the new message context will include its parent message context(s) in
>>    this section.
>>    - *Child Message Contexts* - Child Message Contexts include the
>>    succeeding Airavata Workflow Message Contexts if other message contexts
>>    were created in the middle of the workflow using the current message
>>    context. In such cases the current message context will include its
>>    child message context(s) in this section.
>>    - *Next Airavata Workflow Component* - Component ID of the next
>>    Airavata Workflow Component.
>>
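>> As a rough data-holder sketch of the above (all class and field names are
>> illustrative, not an existing Airavata API):
>>
>> import java.util.ArrayList;
>> import java.util.HashMap;
>> import java.util.List;
>> import java.util.Map;
>>
>> public class WorkflowMessageContext {
>>
>>     // Messages keyed by the component id of the last component that
>>     // modified them (needed by the Flow Barrier).
>>     private final Map<String, WorkflowMessage> messages = new HashMap<>();
>>
>>     // Current status and progress of the workflow.
>>     private FlowMonitoringInfo monitoringInfo;
>>
>>     // Preceding contexts, set when this context is created mid-workflow
>>     // (e.g. by a Flow Barrier combining messages).
>>     private final List<WorkflowMessageContext> parentContexts = new ArrayList<>();
>>
>>     // Succeeding contexts, set when a Barrier/Divider derives new
>>     // contexts from this one.
>>     private final List<WorkflowMessageContext> childContexts = new ArrayList<>();
>>
>>     // Component ID of the next Airavata Workflow Component.
>>     private String nextComponentId;
>>
>>     public Map<String, WorkflowMessage> getMessages() { return messages; }
>>     public FlowMonitoringInfo getMonitoringInfo() { return monitoringInfo; }
>>     public String getNextComponentId() { return nextComponentId; }
>>     public void setNextComponentId(String id) { this.nextComponentId = id; }
>> }
>>
>> class WorkflowMessage { /* actual payload; placeholder */ }
>> class FlowMonitoringInfo { /* status and progress; placeholder */ }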
>>
>> *2. Airavata Workflow Router*
>>
>> The Airavata Workflow Router is responsible for keeping track of Airavata
>> Workflow Message Contexts and directing them to the specified Airavata
>> Workflow Components, roughly as in the dispatch sketch below.
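>>
>> A hypothetical dispatch loop, reusing the WorkflowMessageContext sketch
>> above; the WorkflowComponent interface is assumed, not existing code:
>>
>> import java.util.Map;
>> import java.util.concurrent.ConcurrentHashMap;
>>
>> public class WorkflowRouter {
>>
>>     private final Map<String, WorkflowComponent> components = new ConcurrentHashMap<>();
>>
>>     public void register(String componentId, WorkflowComponent component) {
>>         components.put(componentId, component);
>>     }
>>
>>     // Deliver a message context to the component it names, repeatedly,
>>     // until a terminator leaves nextComponentId null.
>>     public void route(WorkflowMessageContext context) {
>>         while (context.getNextComponentId() != null) {
>>             WorkflowComponent next = components.get(context.getNextComponentId());
>>             if (next == null) {
>>                 throw new IllegalStateException(
>>                         "Unknown component: " + context.getNextComponentId());
>>             }
>>             context = next.process(context); // may return a new (child) context
>>         }
>>     }
>> }
>>
>> interface WorkflowComponent {
>>     WorkflowMessageContext process(WorkflowMessageContext context);
>> }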
>>
>>
>> *3. Airavata Workflow Components*
>>
>> *i. Airavata Workflow Operators*
>>
>>
>>    - *Airavata Flow Starter* - This is responsible for starting a
>>    specific branch of the Airavata Workflow. In a single Airavata Workflow
>>    there can be multiple starting points. This component creates a new
>>    Airavata Workflow Message Context and registers it with the Airavata
>>    Workflow Router.
>>       - Configurations
>>          - Next Airavata Workflow Component
>>          - Input Dataset File
>>    - *Airavata Flow Terminator* - This is responsible for terminating a
>>    specific branch of the Airavata workflow. In a single workflow there
>>    can be multiple terminating points.
>>       - Configurations
>>          - Output File Location
>>    - *Airavata Flow Barrier* - The Airavata Flow Barrier works as a
>>    waiting component in the middle of a workflow. For example, if there
>>    are two experiments running and the results of both are required to
>>    continue the workflow, the barrier waits for both experiments to
>>    complete before continuing. Within this component multiple Airavata
>>    Workflow Messages should be packaged into a new Airavata Workflow
>>    Message Context.
>>       - Configurations
>>          - Components to wait on
>>          - Next Airavata Workflow Component
>>    - *Airavata Flow Divider* - The Airavata Flow Divider opens up new
>>    branches of the workflow. It is responsible for sending copies of the
>>    Airavata Message to those branches separately.
>>       - Configurations
>>          - Next components to send copies to
>>    - *Airavata Condition Handler* - The Airavata Condition Handler is the
>>    path selection component of the workflow. This component is responsible
>>    for checking the Airavata Message Context against conditions and
>>    directing it to the required path of the workflow (see the sketch after
>>    this list).
>>       - Configurations
>>          - Possible Next Airavata Workflow Components
>>
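>> A minimal path-selection sketch, building on the WorkflowComponent
>> interface assumed earlier; the condition itself is a placeholder:
>>
>> public class ConditionHandler implements WorkflowComponent {
>>
>>     private final String pathIfTrue;
>>     private final String pathIfFalse;
>>
>>     public ConditionHandler(String pathIfTrue, String pathIfFalse) {
>>         this.pathIfTrue = pathIfTrue;
>>         this.pathIfFalse = pathIfFalse;
>>     }
>>
>>     @Override
>>     public WorkflowMessageContext process(WorkflowMessageContext context) {
>>         // Placeholder condition: route on whether a given component has
>>         // already contributed a message to this context.
>>         boolean conditionHolds = context.getMessages().containsKey("experiment-1");
>>         context.setNextComponentId(conditionHolds ? pathIfTrue : pathIfFalse);
>>         return context;
>>     }
>> }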
>>
>> *ii. Airavata Experiments*
>>
>> These components are responsible for triggering the current task execution
>> framework to perform specific experiments.
>>
>>
>> *iii. Airavata Data Processors*
>>
>> These components are responsible for processing data in the middle of a
>> workflow. Sometimes the output data of an experiment needs to be processed
>> before being sent to other experiments as input.
>>
>>
>> *iv. Airavata Loops*
>>
>>
>>    - *Airavata Foreach Loop* - This loop can be parallelized.
>>    - *Airavata Do While Loop* - This loop cannot be parallelized.
>>
>>
>> As we have discussed, I am planning to implement this Airavata Workflow
>> Execution Framework using Apache Helix. To get a clearer understanding of
>> the project, it would help if you could provide some information about the
>> experiment types (such as Echo and Gaussian) and the data input and output
>> formats of these experiments.
>>
>> If we need to process data (see Airavata Data Processor) when connecting
>> two experiments in the workflow, it should also be done on the
>> supercomputers. I need to verify whether any implementation for data
>> processing is currently available within Airavata.
>>
>> The following diagram shows an example workflow without loops. It would
>> help if you could explain a bit more about the required types of loops
>> within an Airavata workflow.
>>
>> <airavata-workflow-1.png>
>> Regards
>>
>>
>> On Tue, May 1, 2018 at 9:06 PM, DImuthu Upeksha <dimuthu.upeksha2@gmail.com>
>> wrote:
>>
>>> Hi Yasas,
>>>
>>> This is really good. You have captured the problem correctly and
>>> provided a good visualization too. As we discussed in the GSoC student
>>> meeting, I was wondering whether we could compose these workflows as Helix
>>> Tasks as well (one task per Experiment). The only thing that worries me is
>>> how we can implement a barrier, as mentioned in your second workflow, using
>>> the current Task framework. We might have to improve the task framework to
>>> support that.
>>>
>>> And we might need to think about constant / conditional loops and
>>> conditional (if-else) paths inside the workflows. Please update the diagram
>>> accordingly for future reference.
>>>
>>> You are on the right track. Keep it up.
>>>
>>> Thanks
>>> Dimuthu
>>>
>>>
>>> On Sun, Apr 29, 2018 at 1:57 AM, Yasas Gunarathne <yasasgunarathne@gmail.com>
>>> wrote:
>>>
>>>> Hi All,
>>>>
>>>> Thank you very much for the information. I did some research on the
>>>> internals of the orchestrator and the new Helix-based (lower level)
>>>> workflow execution over the past few weeks.
>>>>
>>>> Even though Helix supports chaining together any number of experiments
>>>> (i.e. complete experiments including pre and post workflows), it is
>>>> necessary to maintain a higher-level workflow manager as a separate layer
>>>> for the orchestrator, submitting experiments one after the other (if they
>>>> cannot run in parallel) or in parallel (if execution is independent), to
>>>> preserve fault tolerance and enable flow handling of the higher-level
>>>> workflow.
>>>>
>>>> Therefore, the steps that the new orchestrator layer is supposed to
>>>> perform are (see the sketch after this list):
>>>>
>>>>    1. Parsing the provided high-level workflow schema and arranging the
>>>>    list of experiments.
>>>>    2. Submitting experiments in the provided order and saving their
>>>>    results in the storage resource.
>>>>    3. If there are dependencies (the result of one experiment is required
>>>>    to generate the input for another), managing them accordingly while
>>>>    providing support for modifying the results in between.
>>>>    4. Providing flow handling methods (Start, Stop, Pause, Resume,
>>>>    Restart).
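>>>>
>>>> A rough sketch of step 2 under these assumptions; ExperimentRunner and
>>>> Experiment are hypothetical placeholders, not existing Airavata classes:
>>>>
>>>> import java.util.List;
>>>>
>>>> public class TopLevelWorkflowRunner {
>>>>
>>>>     private final ExperimentRunner runner;
>>>>
>>>>     public TopLevelWorkflowRunner(ExperimentRunner runner) {
>>>>         this.runner = runner;
>>>>     }
>>>>
>>>>     // Submit experiments in the parsed order; each call blocks until the
>>>>     // experiment's results are stored, so dependent experiments can use
>>>>     // them.
>>>>     public void run(List<Experiment> orderedExperiments) throws Exception {
>>>>         for (Experiment experiment : orderedExperiments) {
>>>>             runner.submitAndAwaitCompletion(experiment);
>>>>         }
>>>>     }
>>>> }
>>>>
>>>> interface ExperimentRunner {
>>>>     void submitAndAwaitCompletion(Experiment experiment) throws Exception;
>>>> }
>>>>
>>>> class Experiment { /* description parsed from the workflow schema */ }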
>>>>
>>>> I have attached a few simple top-level workflow examples to support the
>>>> explanation. Please provide your valuable suggestions.
>>>>
>>>> Regards
>>>>
>>>>
>>>> On Mon, Mar 26, 2018 at 8:43 AM, Suresh Marru <sma...@apache.org> wrote:
>>>>
>>>>> Hi Yasas,
>>>>>
>>>>> Dimuthu already clarified, but let me add a few more points.
>>>>>
>>>>> That's a very good question: interpreter vs. compiler (in the context
>>>>> of workflows). Yes, Airavata historically took the interpreter approach,
>>>>> where after the execution of each node in a workflow, execution comes back
>>>>> to the enactment engine, which re-inspects the state. This facilitated user
>>>>> interactivity throughout executions. The attached state transition diagram
>>>>> might illustrate it further.
>>>>>
>>>>> Back to the current scope: I think you got the overall goal correct
>>>>> and your approach is reasonable. There are some details missing, but
>>>>> that's expected. Just be aware that if your project is accepted, you
>>>>> will need to work with the Airavata community over the summer and refine
>>>>> the implementation details as you go. You are off to a good start.
>>>>>
>>>>> Cheers,
>>>>> Suresh
>>>>> <workflow-states.png>
>>>>>
>>>>>
>>>>> On Mar 25, 2018, at 8:44 PM, DImuthu Upeksha <dimuthu.upeks...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Hi Yasas,
>>>>>
>>>>> I'm not an expert in XBaya design and use cases, but I think Suresh can
>>>>> shed some light on it. However, we no longer use XBaya for workflow
>>>>> interpretation, so don't confuse the workflows defined in XBaya with the
>>>>> description provided in the JIRA ticket. Let's try to make the concepts
>>>>> clear. We need two levels of workflows.
>>>>>
>>>>> 1. To run a single experiment of an Application. We call this a DAG,
>>>>> so a DAG is statically defined. It can have a set of environment setup
>>>>> tasks, data staging tasks, and a job submission task. For example, a DAG
>>>>> is created to run a Gaussian experiment on a compute host.
>>>>> 2. To make a chain of Applications. This is what we call an actual
>>>>> Workflow. In a workflow you can have a Gaussian Experiment running,
>>>>> followed by a Lammps Experiment. So this is a dynamic workflow; users can
>>>>> come up with different combinations of Applications as a workflow.
>>>>>
>>>>> However, your point about pausing and restarting workflows is valid.
>>>>> Whether it is a statically defined DAG or a dynamic workflow, we should
>>>>> be able to perform those operations.
>>>>>
>>>>> I understand that some of the words and terminology in those resources
>>>>> are confusing and unclear, so please feel free to let us know if you need
>>>>> anything clarified.
>>>>>
>>>>> Thanks
>>>>> Dimuthu
>>>>>
>>>>> On Sun, Mar 25, 2018 at 2:45 AM, Yasas Gunarathne <yasasgunarathne@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I have a few questions to be clarified regarding user-defined
>>>>>> workflow execution in Apache Airavata. Here I am talking about the
>>>>>> high-level workflows that are used to chain together multiple
>>>>>> applications. This relates to the issue AIRAVATA-2717 [1].
>>>>>>
>>>>>> The documentation [2] says that the workflow interpreter that worked
>>>>>> with XBaya provided an interpreted workflow execution framework rather
>>>>>> than a compiled workflow execution environment, which allowed users to
>>>>>> pause the execution of the workflow as necessary and update the DAG's
>>>>>> execution states, or even the DAG itself, and resume execution.
>>>>>>
>>>>>> I want to understand the actual requirement for having interpreted
>>>>>> workflow execution at this level. Is there any domain-level advantage in
>>>>>> allowing users to modify the order of the workflow at runtime?
>>>>>>
>>>>>> I think we can have pause, resume, restart, and stop commands
>>>>>> available even in a compiled workflow execution environment, as long as
>>>>>> we don't need to change the workflow.
>>>>>>
>>>>>> [1] https://issues.apache.org/jira/browse/AIRAVATA-2717
>>>>>> [2] http://airavata.apache.org/architecture/workflow.html
>>>>>>
>>>>>> Regards
>>>>>> --
>>>>>> *Yasas Gunarathne*
>>>>>> Undergraduate at Department of Computer Science and Engineering
>>>>>> Faculty of Engineering - University of Moratuwa Sri Lanka
>>>>>> LinkedIn <https://www.linkedin.com/in/yasasgunarathne/> | GitHub
>>>>>> <https://github.com/yasgun> | Mobile : +94 77 4893616
>>>>>> <+94%2077%20489%203616>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> *Yasas Gunarathne*
>>>> Undergraduate at Department of Computer Science and Engineering
>>>> Faculty of Engineering - University of Moratuwa Sri Lanka
>>>> LinkedIn <https://www.linkedin.com/in/yasasgunarathne/> | GitHub
>>>> <https://github.com/yasgun> | Mobile : +94 77 4893616
>>>>
>>>
>>>
>>
>>
>> --
>> *Yasas Gunarathne*
>> Undergraduate at Department of Computer Science and Engineering
>> Faculty of Engineering - University of Moratuwa Sri Lanka
>> LinkedIn <https://www.linkedin.com/in/yasasgunarathne/> | GitHub
>> <https://github.com/yasgun> | Mobile : +94 77 4893616
>>
>>
>>
>
> --
> *Yasas Gunarathne*
> Undergraduate at Department of Computer Science and Engineering
> Faculty of Engineering - University of Moratuwa Sri Lanka
> LinkedIn <https://www.linkedin.com/in/yasasgunarathne/> | GitHub
> <https://github.com/yasgun> | Mobile : +94 77 4893616
>
>
>
