On Wed, Jul 3, 2013 at 2:59 PM, Maglinger, Paul <[email protected]> wrote:
> They may think the customer doesn’t notice, but I’m willing to bet the
> average customer calls their ISP when there’s a problem with NetFlix and by
> the time they get done rebooting their cable modem and router the service is
> back up.  “Average customer” then either blames their equipment or their
> ISP.

  It helps to understand the environment Netflix is running.  They're
running almost everything in Amazon's cloud service, where they
basically can't depend on any given node or cluster not disappearing
without warning.  So, rather than try to find a cloud host that's
bullet-proof *and* affordable, they design their software architecture
to withstand the inevitable cloud failures.  Chaos Monkey just helps
to keep the developers from assuming the host won't fail.  But that
only addresses host platform failures.  There are plenty of other
problem domains it doesn't touch.  In particular, software that does
the wrong thing (rather than just becoming unavailable) can and does
still cause outages.
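
  To make the idea concrete, here is a toy sketch of the pattern.  It
is not Netflix's actual code or Chaos Monkey's real interface; the
names Node, chaos_monkey, and call_with_failover are made up for
illustration.  Something kills a random node without warning, and the
caller is written to expect that.

```python
import random


class Node:
    """A hypothetical service instance that can vanish without warning."""

    def __init__(self, name):
        self.name = name
        self.alive = True

    def handle(self, request):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"


def chaos_monkey(nodes, rng):
    """Kill one randomly chosen live node; return it, or None if none are live."""
    live = [n for n in nodes if n.alive]
    if not live:
        return None
    victim = rng.choice(live)
    victim.alive = False
    return victim


def call_with_failover(nodes, request):
    """Try each node in turn; a dead node is skipped rather than fatal."""
    for node in nodes:
        try:
            return node.handle(request)
        except ConnectionError:
            continue
    raise RuntimeError("all nodes are down")


if __name__ == "__main__":
    rng = random.Random()
    nodes = [Node(f"node-{i}") for i in range(3)]
    victim = chaos_monkey(nodes, rng)
    print("killed:", victim.name)
    print(call_with_failover(nodes, "GET /some/title"))
```

  Note what the failover loop does not catch: a node that stays up but
returns the wrong answer goes straight through to the caller.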

-- Ben

