[
https://issues.apache.org/jira/browse/SAMZA-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892862#comment-15892862
]
Navina Ramesh commented on SAMZA-1113:
--------------------------------------
Current Design (as proposed in the design document in
[SAMZA-1064|https://issues.apache.org/jira/browse/SAMZA-1064]):
* Processors start up and join the default processing group under
/jobname-jobid/processors
* When processors leave, tasks are shuffled to remaining processors
* The job is considered alive (active) if it has at least 1 processor in
the processing group.
* The job is assumed to have shut down if there are no more processors in the
processing group.
* The design assumes that we do not support rolling bounces
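A minimal sketch of the rules listed above, using an in-memory set in place of the ZK path /jobname-jobid/processors. The class name ProcessingGroupSketch, its methods, and the round-robin task assignment are assumptions made for illustration; none of them is a Samza or ZooKeeper API, and the design document may shuffle tasks differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for the processing group kept under
// /jobname-jobid/processors. No real ZooKeeper or Samza API is used here.
class ProcessingGroupSketch {
    private final TreeSet<String> processors = new TreeSet<>();
    private final List<String> tasks;

    ProcessingGroupSketch(List<String> tasks) {
        this.tasks = new ArrayList<>(tasks);
    }

    // A processor starts up and joins the group.
    void join(String processorId) {
        processors.add(processorId);
    }

    // A processor leaves; its tasks move on the next call to assignTasks().
    void leave(String processorId) {
        processors.remove(processorId);
    }

    // The job counts as alive while at least one processor is in the group.
    // An empty group is read as "shut down", whatever the actual cause was.
    boolean isJobAlive() {
        return !processors.isEmpty();
    }

    // Assumed policy: spread tasks round-robin over the remaining processors.
    Map<String, List<String>> assignTasks() {
        Map<String, List<String>> assignment = new TreeMap<>();
        List<String> members = new ArrayList<>(processors);
        if (members.isEmpty()) {
            return assignment;
        }
        for (String member : members) {
            assignment.put(member, new ArrayList<>());
        }
        for (int i = 0; i < tasks.size(); i++) {
            assignment.get(members.get(i % members.size())).add(tasks.get(i));
        }
        return assignment;
    }
}
```

In this sketch, isJobAlive() depends only on group membership, so an empty group looks the same whether the processors exited cleanly or crashed.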
The current design is simple. However, it falls short for the following reasons:
# If there are no processors remaining in the processing group, we don't know
whether the job shut down gracefully or all processors failed abruptly.
There is no feedback on the "status" of the job.
# It is not clear what should happen if we want to restart or upgrade an
existing job because there is only one "attempt" associated with the job. For
example, in a processing group of size 10, if 5 processors are restarted, which
"attempt" should they join? Answering this is critical to clearly defining the
lifecycle of the job itself and how to maintain/upgrade it over time.
# This directly impacts the abstraction layer above (the DAG handler, see
SAMZA-1041), namely ApplicationRunner/ExecutionEnvironment, as that layer
directly manages the individual stages of the job.
Requirements:
* Associate an attempt number with a particular job's scope
* For each attempt, we should be able to infer the state of the job (whether it
is ACTIVE, SHUTDOWN or FAILED)
* Trigger for a graceful shutdown can come from an external entity or from the
job itself (batch jobs)
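The requirements above could be modelled roughly as follows. This is only a sketch under stated assumptions: the class JobAttemptSketch, its methods, and the rule used to tell SHUTDOWN from FAILED (an empty group after an explicit shutdown request versus an empty group without one) are hypothetical and are not taken from the design document or from any Samza API.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of a single job attempt. All names are illustrative.
class JobAttemptSketch {
    enum State { ACTIVE, SHUTDOWN, FAILED }

    private final int attemptNumber;
    private final Set<String> processors = new HashSet<>();
    private boolean shutdownRequested = false;

    // Assumption: an attempt is created by the first processor that joins it.
    JobAttemptSketch(int attemptNumber, String firstProcessorId) {
        this.attemptNumber = attemptNumber;
        processors.add(firstProcessorId);
    }

    int attemptNumber() {
        return attemptNumber;
    }

    void join(String processorId) {
        processors.add(processorId);
    }

    void leave(String processorId) {
        processors.remove(processorId);
    }

    // The trigger may come from an external entity or from the job itself
    // (for example, a batch job that has consumed all of its input).
    void requestShutdown() {
        shutdownRequested = true;
    }

    // Assumed inference rule: processors present means ACTIVE; an empty group
    // is SHUTDOWN only if a shutdown was requested first, otherwise FAILED.
    State state() {
        if (!processors.isEmpty()) {
            return State.ACTIVE;
        }
        return shutdownRequested ? State.SHUTDOWN : State.FAILED;
    }
}
```

Recording the shutdown request per attempt is what lets an empty processing group be read as either a graceful shutdown or a failure, which membership alone cannot do.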
> Implement startup and shutdown sequence of jobs in ZK environment
> -----------------------------------------------------------------
>
> Key: SAMZA-1113
> URL: https://issues.apache.org/jira/browse/SAMZA-1113
> Project: Samza
> Issue Type: Sub-task
> Reporter: Navina Ramesh
> Assignee: Navina Ramesh
> Fix For: 0.13.0
>
>
> The problem that we need to solve is: Do we need multiple job attempts in the
> ZK tree? If yes, who creates the persistent subtrees? There is no leader until
> the ZK trees are set up.
> In the initial prototype, the first processor instance creates the ZK
> hierarchy. If we were to support multiple job attempts, then we need
> different ZK trees for each attempt. How do all the processors within a job
> know which attempt ID to join?