Status
Current state: [Work in progress]
...
JIRA: SAMZA-2435
Released:

Problem
Samza is deployed as a managed service on cluster resource managers such as Yarn and Kubernetes. In either case, a Samza deployment has no identifier associated with it.

Motivation
Such an identifier helps build lineage for a job deployment and distinguishes one deployment from another. While some cluster managers like Yarn have an alternative identifier (app-attempt-id), cluster managers like Kubernetes don't have one. A deployment identifier can also help build tooling for Samza apps.

Proposed Changes
Introduce a new config, "job.deployment.id", for Samza deployments. The deployment system may optionally provide it via either of these paths. ...
- Since the deployment id is part of the config, it is already exposed to external tools, which can query it via the SamzaAppMasterServlet.
Cons:

Public Interfaces
JobConfig.java (Java)

public class JobConfig extends MapConfig {
  /**
   * Acts as a unique identifier of a deployment of the job
   */
  public static final String JOB_DEPLOYMENT_ID = "job.deployment.id";
}
Implementation and Test Plan
Since config loading and building will now be done by the ClusterBasedJobCoordinator (JC), the JC checks whether this config has been injected via one of the options mentioned above; if not, it generates a unique UUID for the app.

Compatibility, Deprecation, and Migration Plan
This is a backward-compatible change. Although a new config is introduced, the ClusterBasedJobCoordinator auto-generates it when it is absent, so both new and existing jobs can inject this config via the options listed above or rely on the auto-generated value.

Rejected Alternatives
Introduce a new API into the ClusterResourceManager which is supposed to be implemented by the implementation layer ...
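
The fallback described under Implementation and Test Plan (use the injected "job.deployment.id" if present, otherwise generate a UUID) could look roughly like the sketch below. It is only an illustration under stated assumptions: `DeploymentIdResolver` and `resolve` are hypothetical names that do not exist in Samza, and a plain `Map<String, String>` stands in for Samza's config object so the example stays self-contained.

```java
import java.util.Map;
import java.util.UUID;

/**
 * Hypothetical helper (not part of Samza) sketching how a job coordinator
 * might pick the deployment id: prefer an injected value, else generate one.
 */
final class DeploymentIdResolver {
  // Same key as JobConfig.JOB_DEPLOYMENT_ID in the proposal.
  static final String JOB_DEPLOYMENT_ID = "job.deployment.id";

  private DeploymentIdResolver() {}

  /**
   * Returns the injected deployment id when the config carries a non-blank
   * value for "job.deployment.id"; otherwise returns a freshly generated
   * random UUID string.
   *
   * Assumption: a plain Map stands in for Samza's config object here.
   */
  static String resolve(Map<String, String> config) {
    String injected = config.get(JOB_DEPLOYMENT_ID);
    if (injected != null && !injected.trim().isEmpty()) {
      return injected;
    }
    return UUID.randomUUID().toString();
  }
}
```

With this sketch, a config containing `job.deployment.id=deploy-42` resolves to `deploy-42`, while a config without the key gets a new random UUID on each call, so a real coordinator would need to resolve the id once and keep the result for the lifetime of the deployment.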