Tzafrir Cohen said: "Yes, but if you bring clustering into the game, then suddenly cheaper hardware can become more reliable.
The author also forgets that the guests need patching as well. Having all of them as guests on a mainframe, or as separate machines in a farm, is not all that different in that respect, because remote-management tools are good enough for the basic tasks. And you can still load the new software onto one unused computer in the farm, start it, and then swap out the bad computer you want to retire. This requires some more hardware, but the hardware is much cheaper anyway. A bigger problem is that there are simply more machines to patch. This is the basic issue: machines are not patched because their admins (or admin replacements) don't bother. Administering a system is not a task that requires a special admin (who should be aware of patching)."

I would point out that clustering makes hardware more available, not more reliable. Things actually fail more often because there is more to fail, but the user sees the cluster as available during the failures. There are some problems with availability clustering as it is usually done, causing the cluster to have lower (often significantly lower) availability than it is designed to have.

The first problem is that as utilization rises on a cluster, the redundancy in the cluster drops, and unfortunately so does the reliability of the components. Systems whose load is growing start losing availability from day one as the workload grows. Most folks add hardware when they need it for capacity, not when they need it for availability. Furthermore, when running with one or more redundant servers down, the probability of failure of the remaining servers increases due to stress brought on by higher utilization. Because of this, n+1 availability is usually not a good enough design point.

The second problem is that utilization must be kept quite low to maintain redundancy, unless throughput grows linearly with load.
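To put rough numbers on the redundancy problem: here is a minimal sketch, assuming servers fail independently with identical availability (a binomial model; all figures are hypothetical and not from the original post):

```python
from math import comb

def cluster_availability(n_total, n_needed, a_server):
    # P(at least n_needed of n_total independent servers are up),
    # where each server is up with probability a_server.
    return sum(comb(n_total, k) * a_server**k * (1 - a_server)**(n_total - k)
               for k in range(n_needed, n_total + 1))

# Hypothetical 5-server cluster, each server 99% available.
# While the load needs only 4 servers (n+1 redundancy), the cluster
# beats a single server; once growth consumes the spare, it does worse.
print(cluster_availability(5, 4, 0.99))  # ~0.99902
print(cluster_availability(5, 5, 0.99))  # 0.99**5, ~0.95099
```

Note that this simple model still understates the loss, since the surviving servers' failure probability rises under the extra load, as described above.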
If the throughput "tails off" or "saturates" as utilization goes up, the utilization required to maintain redundancy is lower than intuitively expected. Most people don't have a clue about how their workload saturates on a cluster, let alone at what utilization they lose the redundancy required to get the availability they desire. Furthermore, n+2 availability is met at lower utilizations than n+1 availability.

The third problem is that failover time is often long enough to count as a measurable outage, particularly when a database or shared state is involved. As far as I know, the IBM Parallel Sysplex with data sharing and redundant coupling facilities is the only system that can avoid a measurable outage on a failover. In today's multi-tiered systems the availabilities of the tiers are multiplied, so that the availability of the whole solution is somewhat less than the availability of the weakest tier.

Finally, the Linux on z solution has an advantage in patching, in that multiple virtual machines can share the unpatched and patched versions. You only have to update the shared image once and then roll the boot/IPL of the VMs to point to the new version. In addition, the rolled-out virtual machine's capacity can be covered by letting the remaining machines have its resources, which are then reclaimed on restart. The hardware utilization stays relatively constant because workloads saturate zSeries machines less than other machines. This is because saturation comes from non-processor bottlenecks, and zSeries machines are more robust in their supply of other resources per CPU configured. As a result, a virtual cluster will see higher redundancy at any utilization and therefore will be more available than the equivalent physical cluster, EVEN IF THE CLUSTER HARDWARE WERE AS RELIABLE AS zSERIES, which it is not.

Joe Temple
[EMAIL PROTECTED]
845-435-6301  295/6301
cell 914-706-5211
home 845-338-8794
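P.S. The multiplied-tier point is easy to check with made-up numbers: if the tiers fail independently, end-to-end availability is the product of the tier availabilities, so the whole solution always lands below its weakest tier. (The tier names and figures below are illustrative only.)

```python
# Hypothetical three-tier solution; availabilities are illustrative.
tiers = {"web": 0.999, "app": 0.999, "db": 0.9995}

solution = 1.0
for name, a in tiers.items():
    solution *= a  # independent tiers multiply

print(round(solution, 5))  # ~0.9975, below the weakest tier (0.999)
```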
