[jira] [Commented] (SAMZA-1113) Implement startup and shutdown sequence of jobs in ZK environment

Navina Ramesh (JIRA) Thu, 02 Mar 2017 11:46:05 -0800

    [ 
https://issues.apache.org/jira/browse/SAMZA-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892862#comment-15892862
 ]


Navina Ramesh commented on SAMZA-1113:
--------------------------------------

Current Design (as proposed in the design document in 
[SAMZA-1064|https://issues.apache.org/jira/browse/SAMZA-1064]):
* Processors start up and join the default processing group under 
/jobname-jobid/processors.
* When processors leave, their tasks are redistributed among the remaining 
processors.
* The job is considered Alive (Active) if it has at least 1 processor in 
the processing group.
* The job is assumed to have shut down if there are no more processors in the 
processing group.
* The design assumes that rolling bounce is not supported.
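
To make the current model concrete, here is a minimal in-memory sketch of the rules listed above. It does not talk to ZooKeeper: the class name ProcessingGroup, its method names, and the round-robin task assignment are assumptions made for illustration, not the design in SAMZA-1064.

```python
from typing import Dict, List


class ProcessingGroup:
    """In-memory stand-in for the children of /jobname-jobid/processors.

    Hypothetical illustration only; a real implementation would use
    ZooKeeper nodes and watches instead of a Python list.
    """

    def __init__(self, tasks: List[str]) -> None:
        self.tasks = list(tasks)
        self.processors: List[str] = []

    def join(self, processor_id: str) -> None:
        # A processor starts up and joins the default processing group.
        if processor_id not in self.processors:
            self.processors.append(processor_id)

    def leave(self, processor_id: str) -> None:
        # A processor leaves, either gracefully or because it crashed.
        if processor_id in self.processors:
            self.processors.remove(processor_id)

    def assignment(self) -> Dict[str, List[str]]:
        # Assumed policy: round-robin the tasks over the live processors.
        result: Dict[str, List[str]] = {p: [] for p in self.processors}
        for i, task in enumerate(self.tasks):
            if self.processors:
                owner = self.processors[i % len(self.processors)]
                result[owner].append(task)
        return result

    def job_status(self) -> str:
        # Current design: alive with at least 1 processor, otherwise shut down.
        return "ALIVE" if self.processors else "SHUTDOWN"
```

Note that job_status() returns "SHUTDOWN" for an empty group regardless of why the processors left, so a graceful stop and a crash of every processor look identical in this model.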

The current design is simple. However, it falls short for the following 
reasons:
# If there are no processors remaining in the processing group, we don't know 
whether the job had a graceful shutdown or whether all processors failed 
abruptly. There is no feedback on the "status" of the job. 
# It is not clear what should happen if we want to restart or upgrade an 
existing job, because there is only one "attempt" associated with the job. For 
example, in a processing group of size 10, if 5 processors are restarted, which 
"attempt" should they join? It is critical to clearly define the lifecycle of 
the job itself and how to maintain/upgrade it over time. 
# This directly impacts the abstraction layer above (the DAG handler, see 
SAMZA-1041), that is, ApplicationRunner/ExecutionEnvironment, because it 
directly manages the individual stages of the job. 

Requirements:
* Associate an attempt number with a particular job's scope
* For each attempt, we should be able to infer the state of the job (whether it 
is ACTIVE, SHUTDOWN or FAILED)
* The trigger for a graceful shutdown can come from an external entity or from 
the job itself (as with batch jobs)
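
One possible way to satisfy these requirements is sketched below, again in memory only. The names Attempt, Job, and shutdown_requested are hypothetical, and the inference rule (an empty attempt is SHUTDOWN only if a shutdown was requested first, otherwise FAILED) is an assumption about how the state could be derived, not a decided design.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Set


class AttemptState(Enum):
    ACTIVE = "ACTIVE"
    SHUTDOWN = "SHUTDOWN"
    FAILED = "FAILED"


@dataclass
class Attempt:
    number: int
    processors: Set[str] = field(default_factory=set)
    # Hypothetical persistent marker, set by an external entity or by the
    # job itself (for example, a batch job that has finished its input).
    shutdown_requested: bool = False

    def request_shutdown(self) -> None:
        self.shutdown_requested = True

    def state(self) -> AttemptState:
        # Assumed rule: any live processor means the attempt is ACTIVE.
        if self.processors:
            return AttemptState.ACTIVE
        # No processors left: the marker tells graceful from abrupt.
        if self.shutdown_requested:
            return AttemptState.SHUTDOWN
        return AttemptState.FAILED


@dataclass
class Job:
    name: str
    attempts: Dict[int, Attempt] = field(default_factory=dict)

    def new_attempt(self) -> Attempt:
        # Attempt numbers are scoped to the job and increase by 1.
        number = max(self.attempts, default=0) + 1
        attempt = Attempt(number)
        self.attempts[number] = attempt
        return attempt
```

Under this rule an attempt that no processor has joined yet also reports FAILED; a real design would likely need an additional initial state or a creation marker to tell those cases apart.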
 


> Implement startup and shutdown sequence of jobs in ZK environment
> -----------------------------------------------------------------
>
>                 Key: SAMZA-1113
>                 URL: https://issues.apache.org/jira/browse/SAMZA-1113
>             Project: Samza
>          Issue Type: Sub-task
>            Reporter: Navina Ramesh
>            Assignee: Navina Ramesh
>             Fix For: 0.13.0
>
>
> The problem that we need to solve is: Do we need multiple job attempts in 
> the ZK tree? If yes, who creates the persistent subtrees? There is no leader 
> until the ZK trees are set up.
> In the initial prototype, the first processor instance creates the ZK 
> hierarchy. If we were to support multiple job attempts, then we need 
> different ZK trees for each attempt. How do all the processors within a job 
> know which attempt ID to join?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
