I am a bit concerned about the names. Are we assuming that API users have knowledge of GFac? Otherwise we can just remove the "GFac" substring and have method names like "void updateJobMetadta(..)".
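To make the suggestion concrete, here is a rough sketch of how a few of the methods might look with the "GFac" substring dropped. The names Job, JobProvenance and InMemoryJobProvenance are made up for illustration and are not part of the current ProvenanceManager; I have also spelled the method updateJobMetadata here. The in-memory map only stands in for the registry.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the real GFacJob class; only what this sketch needs.
class Job {
    final String id;
    String metadata;

    Job(String id) {
        this.id = id;
    }
}

// A subset of the proposed operations with the "GFac" substring dropped (assumed names).
interface JobProvenance {
    boolean isJobExists(String jobId);
    void addJob(Job job);
    void updateJobMetadata(String jobId, String metadata);
    Job getJob(String jobId);
}

// In-memory implementation for illustration only; the real one would talk to the registry.
class InMemoryJobProvenance implements JobProvenance {
    private final Map<String, Job> jobs = new HashMap<>();

    public boolean isJobExists(String jobId) {
        return jobs.containsKey(jobId);
    }

    public void addJob(Job job) {
        jobs.put(job.id, job);
    }

    public void updateJobMetadata(String jobId, String metadata) {
        Job job = jobs.get(jobId);
        if (job == null) {
            throw new IllegalArgumentException("Unknown job id: " + jobId);
        }
        job.metadata = metadata;
    }

    public Job getJob(String jobId) {
        return jobs.get(jobId);
    }
}

class RenamedApiDemo {
    public static void main(String[] args) {
        JobProvenance provenance = new InMemoryJobProvenance();
        provenance.addJob(new Job("job-1"));
        provenance.updateJobMetadata("job-1", "queue=normal");
        System.out.println(provenance.getJob("job-1").metadata);
    }
}
```

A caller then only sees "job" in the method names and does not need to know that GFac is the component behind it.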
Thanks
Amila

On Tue, May 21, 2013 at 11:28 PM, Saminda Wijeratne <[email protected]> wrote:

> The following API functions have been added to the ProvenanceManager[2]:
>
> boolean isGFacJobExists(String gfacJobId)
> void addGFacJob(GFacJob job)
> void updateGFacJob(GFacJob job)
> void updateGFacJobStatus(String gfacJobId, GFacJobStatus status)
> void updateGFacJobData(String gfacJobId, String jobdata)
> void updateGFacJobSubmittedTime(String gfacJobId, Date submitted)
> void updateGFacJobCompletedTime(String gfacJobId, Date completed)
> void updateGFacJobMetadta(String gfacJobId, String metadata)
> GFacJob getGFacJob(String gfacJobId)
> List<GFacJob> getGFacJobsForDescriptors(String serviceDescriptionId, String hostDescriptionId, String applicationDescriptionId)
> List<GFacJob> getGFacJobs(String experimentId, String workflowExecutionId, String nodeId)
>
> Thoughts are welcome!!!
>
> 2. https://svn.apache.org/repos/asf/airavata/trunk/modules/airavata-client/src/main/java/org/apache/airavata/client/api/ProvenanceManager.java
>
> On Tue, May 21, 2013 at 5:04 PM, Saminda Wijeratne <[email protected]> wrote:
>
> > But I thought the providers are part of the GFac (not a separate service). If not, then the providers should report to GFac. Or else there is no way the GFac knows what status to update, which data to update, etc. Does the current GFac implementation support this?
> >
> > On Tue, May 21, 2013 at 4:47 PM, Amila Jayasekara <[email protected]> wrote:
> >
> > > I think that should be handled at a higher layer like the Workflow Interpreter or GFac. From an FT perspective it is better if providers are stateless. One reason is that we don't have control over some providers, and there will be many places writing to disk if we implement the persistence logic at the provider level.
> > >
> > > Thanks
> > > Amila
> > >
> > > On Tue, May 21, 2013 at 4:39 PM, Saminda Wijeratne <[email protected]> wrote:
> > >
> > > > On Tue, May 21, 2013 at 4:36 PM, Amila Jayasekara <[email protected]> wrote:
> > > >
> > > > > On Tue, May 21, 2013 at 3:51 PM, Saminda Wijeratne <[email protected]> wrote:
> > > > >
> > > > > > Thanks for the feedback Amila. A few comments inline.
> > > > > >
> > > > > > On Tue, May 21, 2013 at 12:29 PM, Amila Jayasekara <[email protected]> wrote:
> > > > > >
> > > > > > > Hi Saminda,
> > > > > > >
> > > > > > > Great suggestion. Also +1 for Dhanushka's proposal to have serialized/deserialized data.
> > > > > > > A few suggestions:
> > > > > > > 1. In addition to successful/error statuses we need other statuses for nodes and workflows.
> > > > > > > E.g.:
> > > > > > > node - started, submitted, in-progress, failed, successful etc ...
> > > > > >
> > > > > > Sorry if I was too vague. Yes, we have more fine-grained statuses for workflow and node[1]. We will have a much finer level of granularity for a GFacJob status.
> > > > > > public static enum GFacJobStatus{
> > > > > >     SUBMITTED, //job is submitted, possibly waiting to start executing
> > > > > >     EXECUTING, //submitted job is being executed
> > > > > >     CANCELLED, //job was cancelled
> > > > > >     PAUSED, //job was paused
> > > > > >     WAITING_FOR_DATA, // job is waiting for data to continue executing
> > > > > >     FAILED, // error occurred while job was executing and the job stopped
> > > > > >     FINISHED, // job completed successfully
> > > > > >     UNKNOWN // unknown status. lookup the metadata for more details.
> > > > > > }
> > > > > >
> > > > > > > 2. This data will be useful in implementing FT and Load Balancing in each component.
> > > > > > > Sometime back we had discussions to make GFac stateless. So who is going to populate this data structure and persist it?
> > > > > >
> > > > > > That is a very good question... :). This summer is going to be a long one... ;)
> > > > >
> > > > > What I meant is which component is doing persistence? (GFac or WF Interpreter). Not the actual person who is going to implement it :).
> > > >
> > > > hih hih....
> > > > Well, it's going to be whatever provider is responsible for managing the job lifecycle. For example GRAMProvider should be responsible for recording all the data relating to the GRAM jobs it's working with.
> > > >
> > > > > > 1. https://svn.apache.org/repos/asf/airavata/trunk/modules/workflow-model/workflow-model-core/src/main/java/org/apache/airavata/workflow/model/graph/Node.java
> > > > > >
> > > > > > > Thanks
> > > > > > > Amila
> > > > > > >
> > > > > > > On Tue, May 21, 2013 at 11:39 AM, Saminda Wijeratne <[email protected]> wrote:
> > > > > > >
> > > > > > > > That is an excellent idea. We can have the job data field be the designated GFac job serialized data. Whichever GFacProvider it is should adhere to it.
> > > > > > > >
> > > > > > > > I'm still inclined to have the rest of the fields for ease of querying the required data. For example, if we wanted all attempts at executing a particular node of a workflow, or if we wanted to know which application descriptions are faster in execution or more reliable etc., we can let the query language deal with it. wdyt?
> > > > > > > >
> > > > > > > > On Tue, May 21, 2013 at 11:24 AM, Danushka Menikkumbura <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Saminda,
> > > > > > > > >
> > > > > > > > > I think the data container does not need to have a generic format. We can have a base class that facilitates object serialization/deserialization and let specific metadata structures implement it as required. We get the Registry API to serialize objects and save them in a metadata table (with just two columns?) and to deserialize them as they are loaded off the registry.
> > > > > > > > >
> > > > > > > > > Danushka
> > > > > > > > >
> > > > > > > > > On Tue, May 21, 2013 at 8:34 PM, Saminda Wijeratne <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > It has become more and more apparent that saving the data related to executing jobs from the GFac can be useful for many reasons, such as:
> > > > > > > > > >
> > > > > > > > > > debugging
> > > > > > > > > > retrying
> > > > > > > > > > making smart decisions on reliability/cost etc.
> > > > > > > > > > statistical analysis
> > > > > > > > > >
> > > > > > > > > > Thus we thought of saving the data related to GFac jobs in the registry in order to facilitate features such as the above in the future.
> > > > > > > > > >
> > > > > > > > > > However a GFac job is potentially any sort of computing resource access (GRAM/UNICORE/EC2 etc.). Therefore we need to come up with a generalized data structure that can hold the data of any type of resource.
> > > > > > > > > > Following are the suggested data to save for a single GFac job execution:
> > > > > > > > > >
> > > > > > > > > > *experiment id, workflow instance id, node id* - pinpoint the node execution
> > > > > > > > > > *service, host, application description ids* - pinpoint the descriptors responsible
> > > > > > > > > > *local job id* - the unique job id retrieved/generated per execution [PRIMARY KEY]
> > > > > > > > > > *job data* - data related to executing the job (e.g. the RSL in GRAM)
> > > > > > > > > > *submitted, completed time*
> > > > > > > > > > *completed status* - whether the job was successful or ran into errors etc.
> > > > > > > > > > *metadata* - custom field to add anything the user wants
> > > > > > > > > >
> > > > > > > > > > Your feedback is most welcome. The API related changes will also be discussed once we have a proper data structure. We are hoping to implement this within the next few days.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Saminda
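For reference, the fields suggested in the original proposal at the bottom of this thread could be held in something like the sketch below. It is only an illustration under assumptions: JobRecord, JobStatus and JobRecordStore are made-up names, the in-memory map stands in for the registry, and attemptsForNode mirrors the "all attempts for a particular node" query mentioned earlier in the thread. The status values are the ones listed for GFacJobStatus.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Status values as listed in the thread for a GFac job.
enum JobStatus {
    SUBMITTED, EXECUTING, CANCELLED, PAUSED, WAITING_FOR_DATA, FAILED, FINISHED, UNKNOWN
}

// Hypothetical holder for the suggested fields; field names are illustrative.
class JobRecord {
    final String localJobId;            // primary key
    final String experimentId;          // these three pinpoint the node execution
    final String workflowInstanceId;
    final String nodeId;
    String serviceDescriptionId;        // these three pinpoint the descriptors
    String hostDescriptionId;
    String applicationDescriptionId;
    String jobData;                     // e.g. the RSL for a GRAM job
    Date submittedTime;
    Date completedTime;
    JobStatus status = JobStatus.UNKNOWN;
    String metadata;                    // free-form custom field

    JobRecord(String localJobId, String experimentId, String workflowInstanceId, String nodeId) {
        this.localJobId = localJobId;
        this.experimentId = experimentId;
        this.workflowInstanceId = workflowInstanceId;
        this.nodeId = nodeId;
    }
}

// In-memory stand-in for the registry, keyed by the local job id.
class JobRecordStore {
    private final Map<String, JobRecord> byJobId = new LinkedHashMap<>();

    void save(JobRecord record) {
        byJobId.put(record.localJobId, record);
    }

    JobRecord get(String localJobId) {
        return byJobId.get(localJobId);
    }

    // All recorded attempts at executing one node of one workflow instance.
    List<JobRecord> attemptsForNode(String experimentId, String workflowInstanceId, String nodeId) {
        List<JobRecord> result = new ArrayList<>();
        for (JobRecord r : byJobId.values()) {
            if (r.experimentId.equals(experimentId)
                    && r.workflowInstanceId.equals(workflowInstanceId)
                    && r.nodeId.equals(nodeId)) {
                result.add(r);
            }
        }
        return result;
    }
}
```

Keeping the identifying ids as separate fields, rather than folding everything into the serialized job data, is what makes a lookup like attemptsForNode possible without deserializing each record.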
