Actually, I am planning to draw a state diagram and a sequence diagram for the Airavata backend. I will post them soon.

On Tue, Apr 1, 2014 at 8:55 PM, Saminda Wijeratne <[email protected]> wrote:

> Thanks Amila and Terri for your valuable insights.
>
> Combining Terri's and Amila's input, do you think the actions carried out
> should be managed by internal action states or through states relating to
> the various stages of an experiment? Do you have any thoughts on which
> design would be more flexible to follow?
>
> One other thing I saw in CIPRES is that you have reduced the risk of the
> whole system going down because of a failed operation in one part of the
> system by separating the main activities into different processes, i.e. the
> CIPRES portal handles only user requests and 3 independent daemons handle
> different aspects of job management. Terri, are there any other advantages
> you expected from this design?
>
> Thanks,
> Saminda
>
> On Tue, Apr 1, 2014 at 4:59 PM, Schwartz, Terri <[email protected]> wrote:
>
> > I struggled with this in CIPRES and looked at it much like Amila is
> > saying. Anywhere I was storing state, I would ask myself, "What happens
> > if CIPRES (or its database) crashes right before this or right after
> > this?" What will happen when CIPRES starts up again? Will it assume the
> > operation didn't run and retry it, and is that safe to do? I generally
> > update state after initiating operations, not before, so I don't have to
> > deal with the possibility that we said we did something we didn't
> > actually do; I just have to deal with the possibility that we kicked
> > something off and didn't manage to record it.
> >
> > I tried to make operations idempotent as much as possible, sometimes by
> > wrapping them in code that looks for signs of a prior attempt and cleans
> > things up before proceeding.
> >
> > Terri
> > ________________________________________
> > From: Amila Jayasekara [[email protected]]
> > Sent: Tuesday, April 01, 2014 1:29 PM
> > To: [email protected]
> > Subject: Re: Fault Tolerant Use cases & Solutions for Job Management in
> > Airavata
> >
> > Hmm... If I explain this in PL concepts, a state basically refers to an
> > environment (a mapping of variables to their values) :-).
> >
> > But in general applications (like Airavata) the state is represented by
> > what you persist (provided you persist the right information).
> >
> > E.g.: Consider the getExperiments() API call. No matter how many times
> > we call this, it doesn't change the persisted data in the system.
> > Therefore the function getExperiments() doesn't change the state, and we
> > can safely exclude this method call when analyzing FT. Now consider
> > addExperiment(). This adds an experiment to persistent storage, so it
> > changes the state. If you are doing multiple transactions within
> > addExperiment(), you need to consider the resulting state if the program
> > fails in between each transaction. If the state is inconsistent then you
> > need to come up with a solution.
> >
> > On Tue, Apr 1, 2014 at 4:13 PM, Saminda Wijeratne <[email protected]>
> > wrote:
> >
> > > Are you talking about modeling it similar to a state machine? If not,
> > > can you elaborate on what you meant by states in the system?
> > >
> > > On Tue, Apr 1, 2014 at 4:00 PM, Amila Jayasekara <[email protected]>
> > > wrote:
> > >
> > > > One suggestion is to first identify states in the system. Then
> > > > identify actions (operations / method invocations) which change the
> > > > state of the system. Then model FT cases by analyzing the system
> > > > state before and after a failure (during those operation
> > > > invocations).
> > > >
> > > > Thanks
> > > > Amila
> > > >
> > > > On Tue, Apr 1, 2014 at 3:49 PM, Saminda Wijeratne <[email protected]>
> > > > wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > We are trying to identify scenarios in job management for which it
> > > > > is critical to provide fault-tolerant solutions. The spreadsheet [1]
> > > > > contains a list of such use cases I have compiled to the best of my
> > > > > knowledge (which is in no way complete). Thoughts are welcome
> > > > > (reply/comment or edit the spreadsheet).
> > > > >
> > > > > I think it is particularly useful to learn how gateways like
> > > > > CIPRES/NSG/Ultrascan (which have a large user base) already handle
> > > > > these situations. The spreadsheet has been updated to record those
> > > > > as well.
> > > > >
> > > > > (If you don't have edit privileges, just drop me a mail/reply.)
> > > > >
> > > > > Thanks and Regards,
> > > > > Saminda
> > > > >
> > > > > 1.
> > > > > https://docs.google.com/spreadsheets/d/1eukcg2nXIoMzXa0GakNQVIICMd8y0UYGGjQs32232Hs/edit#gid=1448745788

--
System Analyst Programmer
PTI Lab
Indiana University
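
P.S. To make the pattern Terri describes in the quoted thread concrete (record state only after initiating an operation, and wrap the operation so a retry first cleans up any unrecorded prior attempt), here is a minimal Python sketch. Everything in it is an assumption for illustration: `FakeCluster`, `submit_idempotent`, the `"SUBMITTED"` marker, and the dict used as a database are hypothetical names, not Airavata or CIPRES code.

```python
"""Sketch: update state after initiating an operation, and make retries safe.

All names are hypothetical; this is not Airavata or CIPRES code.
"""


class FakeCluster:
    """In-memory stand-in for a remote compute resource."""

    def __init__(self):
        self.running = {}       # job_id -> handle of the launched job
        self._next_handle = 0

    def find_job(self, job_id):
        """Return the handle of a launched job, or None if there is none."""
        return self.running.get(job_id)

    def cancel(self, job_id):
        """Remove a launched job; a no-op if it does not exist."""
        self.running.pop(job_id, None)

    def launch(self, job_id):
        """Start a job and return a fresh handle."""
        self._next_handle += 1
        self.running[job_id] = self._next_handle
        return self._next_handle


def submit_idempotent(job_id, cluster, db, crash_before_record=False):
    """Submit job_id so that calling this again after a crash is safe."""
    # Already launched and recorded: nothing to do.
    if db.get(job_id) == "SUBMITTED":
        return cluster.find_job(job_id)

    # Signs of a prior attempt that was kicked off but never recorded:
    # clean it up before proceeding.
    if cluster.find_job(job_id) is not None:
        cluster.cancel(job_id)

    handle = cluster.launch(job_id)     # 1. initiate the operation
    if crash_before_record:
        # Simulates the process dying between launch and record.
        raise RuntimeError("simulated crash after launch, before recording")
    db[job_id] = "SUBMITTED"            # 2. only then record the state
    return handle
```

With this ordering the database can miss a job that was started, but it never claims a job that was not. A retry after the simulated crash finds the orphaned job, cancels it, and relaunches, so exactly one job ends up running and recorded.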
