[ 
https://issues.apache.org/jira/browse/QUARKS-105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15227299#comment-15227299
 ] 

Victor Dogaru commented on QUARKS-105:
--------------------------------------

> So would this be specified in the JSONConfig used to submit an application, 
> so a number of restarts property similar to job name?

Yes. The "restartCount" property specifies the number of times the 
application is restarted if its job becomes unhealthy.
   
> What timeframe is the count over? Is it reset to zero if the application 
> successfully starts? 

I'm not sure I understand the question. For the scope of this JIRA task, 
the "restartCount" time frame is "forever".
This means the system will declare the application "dead" (and escalate the 
failure) after the application has been restarted "restartCount" times, whether 
the application fails every minute or every 5 days.
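
As an illustration only, the "forever" semantics could be sketched as a 
counter that is never reset. The class and method names below are 
hypothetical and are not part of the Quarks API; only "restartCount" comes 
from this discussion.

```java
/** Hypothetical sketch: a lifetime restart budget that is never reset. */
final class RestartBudget {
    private final int restartCount; // configured value, e.g. from the submit-time config
    private int restartsUsed;       // never reset: the time frame is "forever"

    RestartBudget(int restartCount) {
        if (restartCount < 0) {
            throw new IllegalArgumentException("restartCount must be >= 0");
        }
        this.restartCount = restartCount;
    }

    /**
     * Called when a job event with health==UNHEALTHY arrives.
     *
     * @return true to restart the application, false to declare it "dead"
     *         and escalate the failure
     */
    boolean onUnhealthy() {
        if (restartsUsed < restartCount) {
            restartsUsed++;
            return true;
        }
        return false;
    }
}
```

With restartCount set to 2, the first two unhealthy events lead to a restart 
and the third is escalated, regardless of how much time passes between them.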

A subsequent task might refine the definition further. For example, 
"restartCount" could reset to its original value if the application has run 
for longer than a predefined duration.
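
A minimal sketch of that possible refinement follows, assuming the caller 
can report how long the application ran before it became unhealthy. The 
names ResettingRestartBudget and resetAfterMillis are hypothetical and are 
used only for illustration.

```java
/** Hypothetical sketch: the restart budget is restored after a long enough run. */
final class ResettingRestartBudget {
    private final int restartCount;      // configured number of restarts
    private final long resetAfterMillis; // hypothetical "predefined duration"
    private int remaining;               // restarts still available

    ResettingRestartBudget(int restartCount, long resetAfterMillis) {
        if (restartCount < 0 || resetAfterMillis < 0) {
            throw new IllegalArgumentException("arguments must be >= 0");
        }
        this.restartCount = restartCount;
        this.resetAfterMillis = resetAfterMillis;
        this.remaining = restartCount;
    }

    /**
     * @param uptimeMillis how long the application ran before becoming unhealthy
     * @return true to restart, false to declare the application "dead"
     */
    boolean onUnhealthy(long uptimeMillis) {
        if (uptimeMillis > resetAfterMillis) {
            remaining = restartCount; // ran long enough: restore the full budget
        }
        if (remaining > 0) {
            remaining--;
            return true;
        }
        return false;
    }
}
```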

> What would define a successful start?

Not receiving a job event with health==UNHEALTHY. Again, this definition 
applies only within the scope of this JIRA task.

One might refine this further. For example, the system should not restart an 
app (and should escalate its failure instead) if it executed for less than 
10 seconds before being terminated. This behavior would prevent the system 
from restarting apps that crash close to startup time.
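
The startup-crash refinement could look roughly like the sketch below. The 
class name StartupCrashGuard and the minUptimeMillis parameter are 
hypothetical. The 10-second figure is only the example value mentioned 
above, not a setting that exists today.

```java
/** Hypothetical sketch: do not restart applications that crash close to startup. */
final class StartupCrashGuard {
    private final long minUptimeMillis; // hypothetical threshold, e.g. 10_000 for 10 seconds
    private int restartsLeft;           // remaining restarts out of restartCount

    StartupCrashGuard(int restartCount, long minUptimeMillis) {
        if (restartCount < 0 || minUptimeMillis < 0) {
            throw new IllegalArgumentException("arguments must be >= 0");
        }
        this.restartsLeft = restartCount;
        this.minUptimeMillis = minUptimeMillis;
    }

    /**
     * @param uptimeMillis how long the application executed before being terminated
     * @return true to restart, false to escalate the failure
     */
    boolean onUnhealthy(long uptimeMillis) {
        if (uptimeMillis < minUptimeMillis) {
            return false; // crashed close to startup: escalate instead of restarting
        }
        if (restartsLeft > 0) {
            restartsLeft--;
            return true;
        }
        return false;     // restart budget exhausted
    }
}
```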

> Configurable number of application restarts
> -------------------------------------------
>
>                 Key: QUARKS-105
>                 URL: https://issues.apache.org/jira/browse/QUARKS-105
>             Project: Quarks
>          Issue Type: Sub-task
>          Components: Applications, Runtime
>            Reporter: Victor Dogaru
>              Labels: failure-recovery
>
> This configuration would allow a developer to specify the number of times the 
> system should attempt to restart an application which had terminated because 
> of an unhandled exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
