[ https://issues.apache.org/jira/browse/QUARKS-66?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15221911#comment-15221911 ]

ASF GitHub Bot commented on QUARKS-66:
--------------------------------------

Github user dlaboss commented on a diff in the pull request:

    https://github.com/apache/incubator-quarks/pull/59#discussion_r58229460
  
    --- Diff: api/topology/src/main/java/quarks/topology/services/ApplicationService.java ---
    @@ -63,4 +65,11 @@ Licensed to the Apache Software Foundation (ASF) under one
          * @see ApplicationServiceMXBean
          */
         void registerTopology(String applicationName, BiConsumer<Topology, JsonObject> builder);
    +    
    --- End diff --
    
    Above, for `registerTopology()`, what are the requirements for the
    appName? What happens if one by that name is already registered? Is a
    Job's appName already required to be unique by definition elsewhere?
    Regardless, it seems like it would help to add doc here to clarify
    these points.
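To make the duplicate-name question concrete, here is a minimal sketch of one possible contract: reject empty names and fail on a second registration under the same name. Everything in it is an assumption for illustration. `UniqueNameRegistry` and `isRegistered` are hypothetical names, the generic parameters stand in for `Topology` and `JsonObject`, and nothing here reflects what the Quarks implementation actually does.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

/** Hypothetical stand-in for an application registry; not the Quarks implementation. */
class UniqueNameRegistry<T, C> {
    private final Map<String, BiConsumer<T, C>> builders = new ConcurrentHashMap<>();

    /** Registers a builder; fails if the name is empty or already taken (assumed contract). */
    void registerTopology(String applicationName, BiConsumer<T, C> builder) {
        Objects.requireNonNull(builder, "builder");
        if (applicationName == null || applicationName.trim().isEmpty()) {
            throw new IllegalArgumentException("applicationName must be non-empty");
        }
        // putIfAbsent is atomic, so two concurrent registrations cannot both succeed.
        if (builders.putIfAbsent(applicationName, builder) != null) {
            throw new IllegalStateException("already registered: " + applicationName);
        }
    }

    /** Returns true if a builder is registered under the given non-null name. */
    boolean isRegistered(String applicationName) {
        return builders.containsKey(applicationName);
    }
}
```

Silently replacing the earlier builder would be an equally valid contract; the point is only that the Javadoc should say which one applies.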
    
    I'm also wondering about this "register with appName prior to submit"
    model vs., say, "register the *Job* following the submit". A
    post-submit registration scheme seems to leave it to the
    system/provider-impl to decide what to use as an identifier to find
    the topology-builder to rebuild/resubmit the job. It also feels more
    logical to express "I want this Job monitored" rather than "I want
    a/all jobs with this appName monitored"... though maybe that's just
    me. Does the pre-submit scheme handle recovery from certain startup
    failures that the post-submit scheme can't?
    
    Is there also a need for an `unregisterTopology()`, or is an
    explicitly cancelled job effectively unregistered automatically?
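As a rough illustration of the post-submit alternative raised above, the sketch below keys the resubmit logic on a system-chosen job id instead of a caller-chosen appName, and drops the entry when a job is cancelled. All names (`JobMonitorSketch`, `Job`, `monitor`, `onCancelled`, `onUnhealthyClose`) are hypothetical and exist only for this example; the real Quarks `Job` type and job events are not modelled.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical sketch of post-submit registration; none of these names exist in Quarks. */
class JobMonitorSketch {
    /** Minimal stand-in for a submitted job. */
    static final class Job {
        final String id;
        Job(String id) { this.id = id; }
    }

    // Keyed by a system-chosen job id rather than a caller-chosen appName.
    private final Map<String, Supplier<Job>> resubmitters = new ConcurrentHashMap<>();

    /** "I want this Job monitored": called after the job has been submitted. */
    void monitor(Job job, Supplier<Job> resubmit) {
        resubmitters.put(job.id, resubmit);
    }

    /** An explicit cancel drops the entry, so no separate unregister call is needed. */
    void onCancelled(Job job) {
        resubmitters.remove(job.id);
    }

    /** Unhealthy close: resubmit and monitor the replacement; returns null if not monitored. */
    Job onUnhealthyClose(Job job) {
        Supplier<Job> resubmit = resubmitters.remove(job.id);
        if (resubmit == null) {
            return null;
        }
        Job replacement = resubmit.get();
        resubmitters.put(replacement.id, resubmit);
        return replacement;
    }

    /** Returns true if the job currently has resubmit logic attached. */
    boolean isMonitored(Job job) {
        return resubmitters.containsKey(job.id);
    }
}
```

In this sketch a job that fails before `monitor()` is called is never resubmitted, which is exactly the startup-failure gap the question about the pre-submit scheme is probing.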


> Job monitoring application which restarts failed jobs
> -----------------------------------------------------
>
>                 Key: QUARKS-66
>                 URL: https://issues.apache.org/jira/browse/QUARKS-66
>             Project: Quarks
>          Issue Type: Task
>            Reporter: Victor Dogaru
>            Assignee: Victor Dogaru
>              Labels: failure-recovery
>
> An application which filters job events indicating jobs which closed with an 
> unhealthy state and resubmits applications associated with those jobs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
