Status
Current state: [Work in progress]
Discussion thread: <link to mailing list DISCUSS thread>
JIRA: SAMZA-2435
Released:

Problem
Samza is deployed as a managed service on cluster resource managers such as YARN and Kubernetes. In either of these environments, a Samza deployment has no identifier associated with it.

Motivation
...

Proposed Changes
...
Introduce a new config, "job.deployment.id", for Samza deployments. The deployment system may optionally provide it via either of these paths.
...
Specify it using --config to the job launcher script (run-app.sh):
deploy/samza/bin/run-app.sh \
  --config job.name=wikipedia-stats \
  --config job.deployment.id=wikipedia-stats \
  --config job.factory.class=org.apache.samza.job.yarn.YarnJobFactory \
  --config yarn.package.path=file://${basedir}/target/${project.artifactId}-${pom.version}-dist.tar.gz \
  --config job.config.loader.class=org.apache.samza.config.loader.PropertiesConfigLoader \
  --config job.config.loader.properties.path=/__package/config/wikipedia-feed.properties
...
- Since the deployment id is part of the config, it is already exposed to external tools, which can query it through the SamzaAppMasterServlet.
Cons:

Public Interfaces
...
JobConfig.java (language: java)
...

Implementation and Test Plan
Since config loading and building will now be done by the ClusterBasedJobCoordinator (JC), the JC checks whether this config was injected through one of the options mentioned above; if not, it generates a UUID for the app.

Compatibility, Deprecation, and Migration Plan
...

Rejected Alternatives
...
Introduce a new API in the ClusterResourceManager, to be implemented by each implementation layer:
public abstract class ClusterResourceManager {
  public abstract String getJobDeploymentId();
}

Pros:
Cons: ...
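
The fallback described under Implementation and Test Plan (use the injected job.deployment.id if present, otherwise generate a UUID) could be sketched in Java roughly as follows. This is only an illustration under stated assumptions: DeploymentIdResolver and its resolve method are hypothetical names that do not come from Samza, and a plain Map stands in for Samza's config object. The actual ClusterBasedJobCoordinator logic may differ.

```java
import java.util.Map;
import java.util.UUID;

// Hypothetical helper for illustration only; not part of the Samza code base.
final class DeploymentIdResolver {
  // Config key proposed in this document.
  static final String JOB_DEPLOYMENT_ID = "job.deployment.id";

  private DeploymentIdResolver() { }

  /**
   * Returns the deployment id injected through the config if one is present
   * and non-blank; otherwise generates a random UUID string.
   */
  static String resolve(Map<String, String> config) {
    String configured = config.get(JOB_DEPLOYMENT_ID);
    if (configured != null && !configured.trim().isEmpty()) {
      return configured.trim();
    }
    // Assumption: a random (version 4) UUID is an acceptable generated id.
    return UUID.randomUUID().toString();
  }
}
```

With this sketch, a job launched with --config job.deployment.id=wikipedia-stats would keep that value, while a job launched without the config would receive a freshly generated id on each resolution.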