cameronlee314 opened a new pull request #1529: URL: https://github.com/apache/samza/pull/1529
Feature: Adding a job coordinator which does not do resource management. For Samza on YARN, the `ClusterBasedJobCoordinator` does resource management. However, for Samza on Kubernetes, it is not necessary to have a job coordinator which does resource management, because Kubernetes controllers can take care of resource management. Changes: 1. Added a `StaticResourceJobCoordinator` which handles responsibilities like job model calculation, communicating job model to workers, monitoring input streams which may cause the job model to change, and startpoint fanout. 2. Added new abstraction layer `CoordinatorCommunication` for communication between job coordinator and workers. Before, the only communication option was an HTTP endpoint. The new abstraction layer allows us to start decoupling the coordination from the HTTP endpoint. This PR doesn't expose an option to plug in a custom communication layer yet, but there is an interface to start working off of. Testing: 1. Added unit tests 2. Tested a Samza job using this new coordinator in Kubernetes API changes (all changes are backwards compatible): 1. Set the config `job.coordinator.use.static.resource.job.coordinator` to `true` in order to use the new coordinator. 2. Set the config `job.coordinator.restart.signal.factory` to define how to restart the Samza job when an input stream changes which will change the job model. This plug-in is dependent on where the Samza job is running (e.g. Kubernetes). Currently, there is only a no-op implementation of this restart signal. Note: This PR reuses components like `JobModelHelper`, `JobCoordinatorMetadataManager`, `StreamPartitionCountMonitorFactory`, and `StartpointManager` across `ClusterBasedJobCoordinator` and `StaticResourceJobCoordinator`. I considered consolidating `ClusterBasedJobCoordinator` and `StaticResourceJobCoordinator` even further to share code which encapsulates usage of multiple of the above components, but it's not yet clear how to fit the new and old flows together cleanly. There might be further divergence between the coordinators as we iterate on on `StaticResourceJobCoordinator`, so I felt it would be easier to leave them more decoupled for now. Also, keeping them decoupled reduces risk that a change will impact the existing `ClusterBasedJobCoordinator`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
