Hey, as you may have already heard, I used an RPC sync service which I wrote on my own. It works, but it may not be as good as ZooKeeper. My idea: we can make an "AbstractBSPPeer" class which has the following methods: abstract enterBarrier(); abstract leaveBarrier(); abstract getAllPeerNames();
These are obviously things that belong to our specific synchronization daemon. Now we could extend it with a ZooKeeperBSPPeer, which implements the ZooKeeper way of barrier sync, and an RPC one. Or, to push it even further, take up Edward's idea of a common synchronization service which abstracts the use of ZooKeeper or an RPC service. My goal with the RPC service is to keep our code simple and build a low-overhead service which provides additional features, e.g. deregistering a task from a barrier. It would be great if we could benchmark them both to get an idea of which is best in terms of performance and reliability. So I would be +1 for Edward's idea. Maybe you can clarify this a bit, @Edward. [1] Edward's idea would help us share common code between YARN and the normal infrastructure. [1] My thoughts: we need some kind of factory which launches a specific sync daemon, based on a given configuration. It would be great if you could share your opinion :) Thanks!
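To make the proposal a bit more concrete, here is a rough sketch of how the abstract peer and a configuration-driven factory could fit together. Only the class name AbstractBSPPeer, the name ZooKeeperBSPPeer and the three method names come from the message above. Everything else is an assumption for illustration: the RPCBSPPeer and SyncPeerFactory names, the "bsp.sync.impl" key, the "zookeeper" default, the return types, and the use of a plain Map instead of a real configuration object. The barrier bodies are left as stubs, and the factory only selects a peer implementation; it does not launch a daemon.

```java
import java.util.List;
import java.util.Map;

// Sketch only: the three method names are from the proposal, the signatures are assumed.
abstract class AbstractBSPPeer {
    abstract void enterBarrier() throws Exception;
    abstract void leaveBarrier() throws Exception;
    abstract List<String> getAllPeerNames();
}

// Stub for the ZooKeeper-backed variant; the real barrier logic would go here.
class ZooKeeperBSPPeer extends AbstractBSPPeer {
    private final List<String> peers;
    ZooKeeperBSPPeer(List<String> peers) { this.peers = peers; }
    @Override void enterBarrier() { /* ZooKeeper barrier entry, omitted */ }
    @Override void leaveBarrier() { /* ZooKeeper barrier exit, omitted */ }
    @Override List<String> getAllPeerNames() { return peers; }
}

// Stub for the RPC-backed variant (hypothetical class name).
class RPCBSPPeer extends AbstractBSPPeer {
    private final List<String> peers;
    RPCBSPPeer(List<String> peers) { this.peers = peers; }
    @Override void enterBarrier() { /* RPC call to the sync service, omitted */ }
    @Override void leaveBarrier() { /* RPC call to the sync service, omitted */ }
    @Override List<String> getAllPeerNames() { return peers; }
}

// Hypothetical factory: picks the implementation from a configuration value.
final class SyncPeerFactory {
    static final String SYNC_IMPL_KEY = "bsp.sync.impl"; // assumed key name

    private SyncPeerFactory() {}

    static AbstractBSPPeer create(Map<String, String> conf, List<String> peers) {
        // Falling back to "zookeeper" is an assumption, not a decided default.
        String impl = conf.getOrDefault(SYNC_IMPL_KEY, "zookeeper");
        switch (impl) {
            case "zookeeper":
                return new ZooKeeperBSPPeer(peers);
            case "rpc":
                return new RPCBSPPeer(peers);
            default:
                throw new IllegalArgumentException("Unknown sync implementation: " + impl);
        }
    }
}
```

With something like this, the calling code would only ever see AbstractBSPPeer, so the YARN and non-YARN paths could share it and we could benchmark both backends by flipping one configuration value.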
