vjagadish1989 commented on issue #938: SAMZA-1531: Support run.id in standalone for batch processing.
URL: https://github.com/apache/samza/pull/938#issuecomment-485597657

@bharathkk @lakshmi-manasa-g A quick summary of next steps:

The current implementation is way too complex for the problem at hand. Part of the complexity stems from:
(i) adding interfaces to coordinationUtils that are too specific to this feature;
(ii) combining coordination and state management in the same interface, which leads to pathologies like `DistributedLockWithState`.

Instead, we should compose new features using these building blocks:
1. State-management: an interface for k-v reads and writes across processors. This should be `MetadataStore`.
2. Cluster-membership: an interface for querying the membership of the cluster and getting notified when participants change.
3. Coordination: an interface for doing an atomic operation across processors, i.e., a distributed lock.

With the above goal, we will:
- Rip apart state-management from the existing `DistributedLockWithState` interface and make it a `DistributedLock`.
- Move all state-management logic to use the `MetadataStore`.
- Add support to the `DistributedLock` interface to provide notifications during disconnects/lock revocations.
- Use `DistributedLock` consistently wherever we require distributed atomic operations in the code-base.
- Nuke the `DistributedReadWriteLock` interface and its corresponding implementation. In its current form, its semantics are hard to reason about, and there appears to be no use-case that can't be solved with the `DistributedLock`.
- Add a new interface to coordinationUtils to query cluster-membership and get notified of ongoing changes.
- Combine these building blocks and generate run.id at each processor by performing the following actions within the critical section:
  * Register as a participant and use the cluster-membership interface to query membership.
  * If it's the first processor in the fleet, generate and write run.id to the metadata-store.
  * If not, read run.id from the metadata-store.
- As a simpler first solution, any lock revocation due to a Zk-error can shut down the process. We can explore fancier ways of handling re-registration later.
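
The run.id steps above could be sketched roughly as follows. This is only a minimal, single-JVM illustration, not Samza code: the `DistributedLock`, `ClusterMembership`, and `KeyValueStore` interfaces below are hypothetical, simplified stand-ins for the three building blocks (the real `MetadataStore` and any eventual lock or membership interfaces may look quite different), the in-memory classes replace what would be Zk-backed implementations, and the `"run.id"` key and the UUID format are assumptions.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

class RunIdSketch {
  static final String RUN_ID_KEY = "run.id"; // assumed key name

  /** Hypothetical coordination primitive: a plain lock with no state attached. */
  interface DistributedLock {
    void lock();
    void unlock();
  }

  /** Hypothetical cluster-membership interface (change notifications omitted). */
  interface ClusterMembership {
    void register(String processorId);
    Set<String> members();
  }

  /** Hypothetical, simplified k-v view standing in for a metadata store. */
  interface KeyValueStore {
    String get(String key);
    void put(String key, String value);
  }

  // In-memory stand-ins so the sketch runs inside a single JVM.
  static class LocalLock implements DistributedLock {
    private final ReentrantLock delegate = new ReentrantLock();
    public void lock() { delegate.lock(); }
    public void unlock() { delegate.unlock(); }
  }

  static class LocalMembership implements ClusterMembership {
    private final Set<String> ids = ConcurrentHashMap.newKeySet();
    public void register(String processorId) { ids.add(processorId); }
    public Set<String> members() { return new HashSet<>(ids); }
  }

  static class LocalStore implements KeyValueStore {
    private final Map<String, String> data = new ConcurrentHashMap<>();
    public String get(String key) { return data.get(key); }
    public void put(String key, String value) { data.put(key, value); }
  }

  /** Runs the three steps inside the critical section and returns the run.id. */
  static String resolveRunId(String processorId, DistributedLock lock,
                             ClusterMembership membership, KeyValueStore store) {
    lock.lock();
    try {
      // Step 1: register as a participant, then query membership.
      membership.register(processorId);
      if (membership.members().size() == 1) {
        // Step 2: first processor in the fleet generates and writes run.id.
        String runId = UUID.randomUUID().toString();
        store.put(RUN_ID_KEY, runId);
        return runId;
      }
      // Step 3: every later processor reads the existing run.id.
      return store.get(RUN_ID_KEY);
    } finally {
      lock.unlock();
    }
  }

  public static void main(String[] args) {
    DistributedLock lock = new LocalLock();
    ClusterMembership membership = new LocalMembership();
    KeyValueStore store = new LocalStore();

    String first = resolveRunId("processor-1", lock, membership, store);
    String second = resolveRunId("processor-2", lock, membership, store);
    System.out.println("same run.id: " + first.equals(second));
  }
}
```

Keeping the lock free of state is the point of the sketch: the lock only guards the critical section, while membership and the stored run.id live behind their own interfaces. Lock revocation and re-registration are deliberately left out here.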
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
