Regarding scalability and consistency requirements: yes and no. The clustering is based on the fact that all the nodes in the same cluster are aware of each other, and the design requires that the list of available nodes is the same on each and every node under normal evolution of the data. The actual service distribution can happen on top of this feature without compromising capacity, provided all the nodes follow exactly the same strategy.
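
Just to illustrate what "exactly the same strategy" could look like, here is a rough sketch (the function name, role labels and the 10% split are only an illustration of the example below, not something fixed by the design):

    # Rough sketch: every node runs the same function over the identical,
    # shared node list, so all nodes agree on the role split without any
    # extra coordination.
    def pick_role(my_id, node_list):
        nodes = sorted(node_list)               # same deterministic order everywhere
        if my_id not in nodes:
            return "do-not-participate"         # not in the list -> failure mode
        idx = nodes.index(my_id)
        first_tenth = max(2, len(nodes) // 10)  # management node + the cache nodes
        if idx == 0:
            return "management"                 # the node at the top of the list
        if idx < first_tenth:
            return "cache"                      # the rest of the first 10%
        return "web"                            # everyone else, roughly 90%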
Example: let's assume we want to create some distributed cache that would occupy 10% of our nodes, with management on 1 node and the web service on the remaining 90% (-1) nodes. Let's assume that all the nodes get exactly the same list of available ones. If a node does not find itself in the list, that means failure, and the node does not participate in the distribution (this can happen during some network failures, as communication can work one way only). So the node that finds itself at the top of the list assumes that it will run the management service, the nodes belonging to the first 10% (except the first one) will start the cache, probably communicating their own data to each other directly, and the rest will start the normal web service.

The weak point is that the central service must be capable of serving all the nodes (probably hundreds), but this does not seem to be a big issue. The protocol for the central service is not too complex. It must handle new node registration, node removal/expiration, errors related to various network problems, and the propagation of node list changes (which could naturally be combined with the reachability response); a rough sketch of such a message set follows at the end of this note.

Regarding homogeneous configurations: the possible conflict can be resolved if you split the whole service into two separately deployable components. One component is responsible for maintaining the list of nodes for the second component. Then, if they are both deployed at the same time, you have an accurately homogeneous environment. If the application node has only one of them, then the environment presupposes a dedicated central service (which could be a cluster by itself, or just a single node with external failover).

In real life, for large deployments, the homogeneous configuration is quite an abstraction. For example, it takes around a year to develop a new telco product and start selling it. During the product lifetime the hardware vendors may discontinue the available platforms quite a few times (Intel, for example, releases new CPU models twice a year and discontinues old ones at the same time). So the real system will end up in an environment where it has to run on servers with different CPU speeds, different amounts of memory and, probably, different architectures. As the customers really want to protect their investment, you need a very good reason for upgrading the hardware / unifying the architecture.
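
As promised above, a rough sketch of the kind of message set the central service would need; the message names and fields are purely illustrative, not part of any existing protocol:

    # Rough sketch of the central-service protocol. The node list version
    # piggybacks on the reachability (heartbeat) response, so list
    # propagation needs no extra round trip.
    from dataclasses import dataclass, field

    @dataclass
    class Register:            # a new node announces itself
        node_id: str
        address: str

    @dataclass
    class Deregister:          # graceful removal; expiration covers the crash case
        node_id: str

    @dataclass
    class Heartbeat:           # periodic reachability probe from each node
        node_id: str
        known_list_version: int

    @dataclass
    class HeartbeatAck:        # reachability response, carrying list changes
        list_version: int
        node_list: list = field(default_factory=list)  # node ids; empty if unchanged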
-valeri