| ... JIRA: SAMZA-TBD Released:

Problem Motivation Proposed Changes Public Interfaces Implementation and Test Plan

Problem

Originally, Samza could run only in distributed mode, relying on YARN for cluster management. YARN was responsible for allocating resources to each physical process and for coordinating between them. Recently, Samza was released in embedded mode, which enables you to use it as a library. Currently, the coordination services in the embedded version of Samza are written using ZooKeeper. The dependency on ZooKeeper increases our customers' reliance on that infrastructure and does not help with modularity or componentization. ZooKeeper is also tedious to maintain. The goal of this proposal is to write the same coordination primitives using services provided in Microsoft Azure, in order to make the coordination service pluggable.

Motivation

With the 0.13.0 release, Samza introduced a flexible deployment model which enables you to run Samza in containerized environments, with resource managers other than YARN, or in the cloud with the proper coordination primitives. It also enables you to run Samza as a library within your application. Introducing a coordination service in Azure will help identify issues with the current JobCoordinator design and validate the functionality that embedded Samza claims to provide. If incorporated with the EventHub connector for Brooklyn, it will give us an end-to-end system running in Azure, giving teams at Microsoft more motivation to adopt it, since they will be able to deploy Samza jobs easily in their existing systems. Additionally, we get all the advantages of moving to the cloud.

Proposed Changes
- Implement the AzureJobCoordinator on top of the current JobCoordinator interface. This includes implementing the storage component in Azure Storage and the notification component in one of (Operations Manager, Application Insights, Azure Monitor, Notification Hubs), because these components are not yet pluggable in the current API.
- Implement the Latch and Leader functionality with lease blobs in Azure. These are pluggable components.
- Implement the checkpointing mechanism with Azure Table Storage. ??
- Integrate all of this with the EventHubSystemProducer and EventHubSystemConsumer.
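To make the lease-based leader idea above more concrete, here is a minimal sketch in Java. It does not use the Azure SDK or the Samza API: `InMemoryLease` is a hypothetical in-memory stand-in that only mimics the general acquire/renew/release behaviour of a blob lease, and `LeaseLeaderElector`, its method names, and the explicit `nowMillis` parameter (used to keep the example deterministic) are all assumptions made for illustration.

```java
import java.util.Optional;
import java.util.UUID;

/** Hypothetical in-memory stand-in for a blob lease: one holder at a time until expiry. */
class InMemoryLease {
    private String holderLeaseId;   // null when nobody holds the lease
    private long expiresAtMillis;

    /** Returns a new lease id if the lease is free or expired, otherwise empty. */
    synchronized Optional<String> acquire(long durationMillis, long nowMillis) {
        if (holderLeaseId != null && nowMillis < expiresAtMillis) {
            return Optional.empty();
        }
        holderLeaseId = UUID.randomUUID().toString();
        expiresAtMillis = nowMillis + durationMillis;
        return Optional.of(holderLeaseId);
    }

    /** Extends the lease only if leaseId is still the current, unexpired holder. */
    synchronized boolean renew(String leaseId, long durationMillis, long nowMillis) {
        if (leaseId.equals(holderLeaseId) && nowMillis < expiresAtMillis) {
            expiresAtMillis = nowMillis + durationMillis;
            return true;
        }
        return false;
    }

    /** Frees the lease if leaseId is the current holder; otherwise does nothing. */
    synchronized void release(String leaseId) {
        if (leaseId.equals(holderLeaseId)) {
            holderLeaseId = null;
        }
    }
}

/** Hypothetical elector: whoever holds the lease is the leader. */
class LeaseLeaderElector {
    private final InMemoryLease lease;
    private final long leaseDurationMillis;
    private String leaseId; // non-null while this processor believes it is the leader

    LeaseLeaderElector(InMemoryLease lease, long leaseDurationMillis) {
        this.lease = lease;
        this.leaseDurationMillis = leaseDurationMillis;
    }

    /** Renews an existing lease or tries to acquire a fresh one; true means "leader". */
    boolean tryBecomeLeader(long nowMillis) {
        if (leaseId != null && lease.renew(leaseId, leaseDurationMillis, nowMillis)) {
            return true;
        }
        leaseId = lease.acquire(leaseDurationMillis, nowMillis).orElse(null);
        return leaseId != null;
    }

    /** Gives up leadership so another processor can acquire the lease immediately. */
    void resign() {
        if (leaseId != null) {
            lease.release(leaseId);
            leaseId = null;
        }
    }
}
```

In this sketch a leader that stops renewing simply loses leadership once the lease expires, which is the property that makes a lease attractive for failure detection. A real implementation would also have to handle clock differences and network errors, which are out of scope here.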
Public Interfaces

The following interfaces will be implemented for Azure:
- JobCoordinator
- Latch
- LeaderElection
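As an illustration of what a latch over shared state might look like, the sketch below uses a polled counter. It is not the Samza `Latch` interface and makes no claim about its signatures: the class name `SharedCounterLatch`, its methods, and the polling approach are assumptions, and an `AtomicInteger` stands in for a counter that a real implementation would keep in shared storage such as a blob.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical latch: participants count down a shared counter, waiters poll it. */
class SharedCounterLatch {
    // Stand-in for a counter held in shared storage (an assumption for this sketch).
    private final AtomicInteger remaining;
    private final long pollIntervalMillis;

    SharedCounterLatch(int participants, long pollIntervalMillis) {
        this.remaining = new AtomicInteger(participants);
        this.pollIntervalMillis = pollIntervalMillis;
    }

    /** Records that one participant has arrived; never drops below zero. */
    void countDown() {
        remaining.updateAndGet(n -> n > 0 ? n - 1 : 0);
    }

    /** Blocks until the counter reaches zero, or throws once the timeout elapses. */
    void await(long timeout, TimeUnit unit) throws TimeoutException, InterruptedException {
        long deadlineNanos = System.nanoTime() + unit.toNanos(timeout);
        while (remaining.get() > 0) {
            if (System.nanoTime() >= deadlineNanos) {
                throw new TimeoutException("latch not released within timeout");
            }
            Thread.sleep(pollIntervalMillis);
        }
    }

    int getRemaining() {
        return remaining.get();
    }
}
```

Polling is used here only because it is the simplest approach. Whether the real implementation polls or is notified of changes depends on the notification component chosen above.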
Implementation and Test Plan
- Implement the AzureJobCoordinator, LeaderElection and Latch functionality
- Add metrics to monitor the new features
- Implement the necessary unit tests and integration tests for the added functionality
Compatibility, Deprecation, and Migration Plan

The changes made in this proposal will be backward compatible. To switch, the client only needs to set job.coordinator.factory to org.apache.samza.azure.AzureJobCoordinatorFactory in the config file.

Rejected Alternatives |