Nice thread. This would also contribute to Airavata's elasticity in cloud deployments.
Marlon

On 4/1/14 10:24 PM, Lahiru Gunathilake wrote:
> Actually I am planning to do a state diagram and sequence diagram for
> the Airavata backend. Will post it soon.
>
>
> On Tue, Apr 1, 2014 at 8:55 PM, Saminda Wijeratne <[email protected]> wrote:
>
>> Thanks Amila and Terri for your valuable insights.
>>
>> Combining Terri's and Amila's input, do you think the actions carried out
>> should be managed by internal action states or through states relating to
>> various stages of an experiment? Do you have any thoughts on which design
>> would be more flexible to follow?
>>
>> One other thing I saw in CIPRES is that you have reduced the risk of the
>> whole system going down because of a failed operation in one part of the
>> system by separating the main activities into different processes, i.e. the
>> CIPRES portal handles only user requests and 3 independent daemons handle
>> different aspects of job management. Terri, are there any other advantages
>> you expected from this design?
>>
>> Thanks,
>> Saminda
>>
>> On Tue, Apr 1, 2014 at 4:59 PM, Schwartz, Terri <[email protected]> wrote:
>>
>>> I struggled with this in CIPRES and looked at it much like Amila is
>>> saying. Anywhere I was storing state, I would ask myself, "what happens
>>> if CIPRES (or its database) crashes right before this or right after this?"
>>> What will happen when CIPRES starts up again? Will it assume the
>>> operation didn't run and retry it, and is that safe to do? I generally
>>> update state after initiating operations, not before, so I don't have to
>>> deal with the possibility that we said we did something we didn't actually
>>> do, just with the possibility that we kicked something off and
>>> didn't manage to record it.
>>>
>>> I tried to make operations idempotent as much as possible, sometimes by
>>> wrapping them in code that looks for signs of a prior attempt and cleans
>>> things up before proceeding.
>>>
>>> Terri
>>> ________________________________________
>>> From: Amila Jayasekara [[email protected]]
>>> Sent: Tuesday, April 01, 2014 1:29 PM
>>> To: [email protected]
>>> Subject: Re: Fault Tolerant Use cases & Solutions for Job Management in
>>> Airavata
>>>
>>> Hmm... If I explain this in PL concepts, a state basically refers to an
>>> environment (a mapping of variables to their values) :-).
>>>
>>> But in general applications (like Airavata) the state is represented by
>>> what you persist (provided you persist the right information).
>>>
>>> E.g. consider the getExperiments() API call. No matter how many times we
>>> call it, it doesn't change the persisted data in the system. Therefore
>>> getExperiments() doesn't change the state, and we can safely
>>> exclude this method call when analyzing FT. Now consider addExperiment().
>>> This adds an experiment to persistent storage, so it changes the state. If
>>> you are doing multiple transactions within addExperiment(), you need to
>>> consider the resulting state if the program fails between any two
>>> transactions. If the state is inconsistent then you need to come up with a
>>> solution.
>>>
>>>
>>> On Tue, Apr 1, 2014 at 4:13 PM, Saminda Wijeratne <[email protected]> wrote:
>>>> Are you talking about modeling it similar to a state machine? If not, can
>>>> you elaborate on what you meant by states in the system?
>>>>
>>>>
>>>> On Tue, Apr 1, 2014 at 4:00 PM, Amila Jayasekara <[email protected]> wrote:
>>>>> One suggestion is to first identify states in the system. Then identify
>>>>> actions (operation / method invocations) which change the state of the
>>>>> system. Then model FT cases by analyzing the system state before and
>>>>> after a failure (during those operation invocations).
>>>>>
>>>>> Thanks
>>>>> Amila
>>>>>
>>>>>
>>>>> On Tue, Apr 1, 2014 at 3:49 PM, Saminda Wijeratne <[email protected]> wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> We are trying to identify scenarios in job management for which it is
>>>>>> critical to provide fault tolerant solutions. The spreadsheet[1]
>>>>>> contains a list of such use cases I have compiled to the best of my
>>>>>> knowledge (which is in no way complete). Thoughts are welcome
>>>>>> (reply/comment or edit the spreadsheet).
>>>>>>
>>>>>> I think it is particularly useful to learn how gateways like
>>>>>> CIPRES/NSG/Ultrascan (which have a large user base) already handle these
>>>>>> situations. Spreadsheet updated to record those as well.
>>>>>>
>>>>>> (if you don't have edit privileges just drop me a mail/reply)
>>>>>>
>>>>>> Thanks and Regards,
>>>>>> Saminda
>>>>>>
>>>>>> 1.
>>>>>> https://docs.google.com/spreadsheets/d/1eukcg2nXIoMzXa0GakNQVIICMd8y0UYGGjQs32232Hs/edit#gid=1448745788
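
For anyone following the thread, here is a minimal sketch of the pattern Terri describes: record state only after initiating the operation, and make the retry safe by looking for signs of a prior attempt and cleaning up first. Everything in it is hypothetical and for illustration only: `FakeCluster`, `submit_idempotently`, the `store` dict and the `crash_before_record` flag are made-up names, not CIPRES or Airavata APIs, and a real gateway would talk to a remote resource and a database instead of in-memory objects.

```python
class FakeCluster:
    """Hypothetical in-memory stand-in for a remote compute resource."""

    def __init__(self):
        self.jobs = {}          # experiment_id -> job handle
        self.submit_count = 0   # how many submissions were actually made

    def find_job(self, experiment_id):
        return self.jobs.get(experiment_id)

    def cancel(self, experiment_id):
        self.jobs.pop(experiment_id, None)

    def submit(self, experiment_id):
        self.submit_count += 1
        handle = f"job-{self.submit_count}"
        self.jobs[experiment_id] = handle
        return handle


def submit_idempotently(cluster, store, experiment_id, crash_before_record=False):
    """Submit a job so that retrying after a crash is safe.

    `store` plays the role of the persisted state (assumed here to be a dict).
    """
    # Already recorded: the operation completed earlier, so do nothing.
    if experiment_id in store:
        return store[experiment_id]

    # A job exists but was never recorded: a prior attempt crashed after
    # kicking it off. Clean it up before proceeding.
    if cluster.find_job(experiment_id) is not None:
        cluster.cancel(experiment_id)

    handle = cluster.submit(experiment_id)

    if crash_before_record:
        # Simulates the process dying between initiating and recording.
        raise RuntimeError("simulated crash after submit, before recording")

    # State is updated only after the operation has been initiated.
    store[experiment_id] = handle
    return handle


cluster, store = FakeCluster(), {}
try:
    submit_idempotently(cluster, store, "exp-1", crash_before_record=True)
except RuntimeError:
    pass  # the job was kicked off, but nothing was recorded

# On restart, the retry cleans up the orphaned job and submits again.
handle = submit_idempotently(cluster, store, "exp-1")
print(handle, store, cluster.submit_count)
```

Because state is written last, the only inconsistent case to handle is "kicked off but not recorded", which the cleanup step covers. Calling the function again once the handle is recorded returns the stored handle without submitting anything.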
