John Summerfield writes "If the application stays up, it's more reliable...
... I'm sure that's actually true in IBM mainframes too.

I read recently about a new "disk" drive from IBM, I guess in many respects
a successor to RAID.

A disk failed? Leave it there, swap in a spare.

Zero maintenance because failed components are swapped out of service,
spares swapped in.

Are the individual drives especially reliable? No. Is the storage device
especially reliable? Yes. Does anyone care about the fine distinction?
No."

If the application stays up, it's more available.  If it were more reliable
you would not have to take any action (i.e., spend money) to repair the
failed part.  That is where the distinction lies, and to the extent that
repairs cost time and money, people care.  This is what drives the idea of
autonomic computing: the more the systems can repair themselves, the more
availability turns into reliability.  Clusters are a long way from
autonomic today.  My main points were that clusters in and of themselves do
not make reliability irrelevant, that many clusters have less availability
than we might think at first blush, and that they all lose availability as
the load on them grows.  And yes, this is true for all systems, including
zSeries.  But zSeries does have advantages that come from virtualization,
the "balanced" machine structure, the built-in autonomic features, and the
high reliability of the hardware.
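The way redundancy (and with it availability) erodes as load grows can be
made concrete with a little arithmetic.  This is an illustrative sketch,
not from the original post; the server counts and the 99% per-server
availability figure are made-up example numbers, and it assumes
independent failures:

```python
# Availability of an m-server cluster that needs at least k servers up
# to carry the current load, each server up with probability p.
from math import comb

def cluster_availability(m, k, p):
    """P(at least k of m servers are up), assuming independent failures."""
    return sum(comb(m, i) * p**i * (1 - p)**(m - i) for i in range(k, m + 1))

# Four servers, each 99% available.  While the load still fits on three
# servers the cluster has n+1 redundancy; once growth requires all four,
# any single failure is an outage and availability drops to p**4.
light_load = cluster_availability(4, 3, 0.99)   # n+1 redundancy intact
heavy_load = cluster_availability(4, 4, 0.99)   # redundancy consumed by load
```

With these example numbers the cluster goes from roughly "three nines" to
well under "two nines" purely because the workload grew, without any
hardware change, which is the point about losing availability from day one.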

Joe Temple
[EMAIL PROTECTED]
845-435-6301  295/6301   cell 914-706-5211 home 845-338-8794



John Summerfield <[EMAIL PROTECTED]afe.com.au>
Sent by: Linux on 390 Port <[EMAIL PROTECTED]EDU>
03/19/2003 06:34 AM
Please respond to Linux on 390 Port

To:       [EMAIL PROTECTED]
cc:
Subject:  Re: Interesting perspective

On Tue, 18 Mar 2003, Joseph Temple wrote:

> I would point out that clustering makes hardware more available, not more
> reliable.

If the application stays up, it's more reliable.

> The things actually fail more often because there is  more to
> fail,

I'm sure that's actually true in IBM mainframes too.

I read recently about a new "disk" drive from IBM, I guess in many respects
a successor to RAID.

A disk failed? Leave it there, swap in a spare.

Zero maintenance because failed components are swapped out of service,
spares swapped in.

Are the individual drives especially reliable? No. Is the storage device
especially reliable? Yes. Does anyone care about the fine distinction?
No.


> but the user sees the cluster as available during the failures.
> There are some problems with availability clustering as it is usually
> done, causing the cluster to have lower (often significantly lower)
> availability than it is designed to have.  The first problem is that as
> the utilization rises on a cluster the redundancy in the cluster drops;
> unfortunately, so does the reliability of the components.  Systems whose
> load is growing start losing availability from day one as the workload
> grows.  Most folks add hardware when they need it for capacity, not when
> they need it for availability.  Next, when running with one or more
> redundant servers down, the probability of failure of the remaining
> servers increases due to stress brought on by higher utilization.
> Because of this, n+1 availability is not usually a good enough design
> point.  The second problem is that utilization must be kept quite low
> to maintain redundancy unless the utilization grows linearly with load.
> If the throughput "tails off" or "saturates" as the utilization goes
> up, the utilization required to maintain redundancy is lower than
> intuitively expected.  Most people don't have a clue about how their
> workload saturates on a cluster, let alone at what utilization they
> lose the redundancy required to get the availability they desire.
> Furthermore, n+2 availability is met at lower utilizations than n+1
> availability.  The third problem is that failover time is often long
> enough to count as a measurable outage, particularly when a database or
> shared state is involved.  As far as I know the IBM Parallel Sysplex
> with data sharing and redundant coupling facilities is the only system
> that can avoid a measurable outage on a failover.  In today's multiple
> tiered systems the availability of the tiers is multiplied, so that the
> availability of the whole solution is somewhat less than the
> availability of the weakest tier.
>
> Finally, the Linux on z solution has an advantage in patching in that
> multiple virtual machines can share the unpatched and patched versions.
> You only have to update the shared image once and then roll the
> boot/IPL of the VMs to point to the new version.  In addition, the
> virtual machines' redundant capacity can be handled by letting the
> remaining machines have the resources of the VM that was rolled out,
> which are then reclaimed on restart.  The hardware utilization stays
> relatively constant because workloads saturate zSeries machines less
> than other machines.  This is because saturation comes from
> non-processor bottlenecks and the zSeries machines are more robust in
> supply of other resources per CPU configured.  As a result a virtual
> cluster will see higher redundancy at any utilization and therefore
> will be more available than the equivalent cluster, EVEN IF THE CLUSTER
> HARDWARE were AS RELIABLE AS zSERIES, which it is not.
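The tier-multiplication point in the quote above can be checked with simple
arithmetic.  The availability figures here are made-up illustration, not
numbers from the post:

```python
# In a multi-tier solution the tiers' availabilities multiply, so the
# whole solution is less available than its weakest single tier.
web_tier = 0.999
app_tier = 0.995   # weakest tier in this example
db_tier  = 0.999

solution = web_tier * app_tier * db_tier
# solution comes out just over 0.993, below the weakest tier's 0.995
```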

There are many possible solutions and compromises, depending on need,
budget and the cost of failure.

At one extreme you might want failover to another site, far distant, to
cover against disasters such as floods, earthquakes, fires or even a
lightning strike, and you want no visible outage.

Linux can do that, using software RAID (mirroring) across a network.
Whether you want to do it with a monster Z or something more modest, say
a low-end IA32 server, depends on budget and the cost of downtime.

I'm sure IBM has some tricks that make the Z do it better, at a price of
course.
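One way to sketch that on stock Linux is to export a block device from the
remote site over the network and mirror onto it with md RAID-1.  This is an
illustrative setup, not from the original post; the host name, port, and
device names are hypothetical, and the commands need root:

```shell
# On the remote (disaster-recovery) site: export a spare partition with
# the Network Block Device server.  "2000" is an example port and
# /dev/sdb1 a hypothetical spare partition.
nbd-server 2000 /dev/sdb1

# On the primary site: attach the remote device as a local block device.
nbd-client dr-site.example.com 2000 /dev/nbd0

# Build a RAID-1 mirror from the local disk and the remote one.  Writes
# now go to both sites, so the DR copy stays current (subject to network
# latency and write-ordering caveats).
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/nbd0
```

Synchronous mirroring like this gets slow as the sites get farther apart;
tools such as drbd take a similar approach with more control over how
tightly the two copies are kept in sync.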

We've discussed Google here before: would anyone notice if a few Google
servers went missing for a while?  Seems to me, probably not, and
according to some in that discussion, it runs on low-cost hardware.







--


Cheers
John.

Join the "Linux Support by Small Businesses" list at
http://mail.computerdatasafe.com.au/mailman/listinfo/lssb
