Hi Jochen,
I used to be in charge of deploying and managing international points of
presence and content distribution complexes for AOL Time Warner (in Brazil,
Western Europe, Japan, China, Australia, et al.), which were basically smaller
versions of the colossal data centers that AOL maintained in Northern Virginia.
Before Google came along, those NoVA datacenters were the largest in the world.
Based on that experience I can confirm your intuitions about the diversity of
fault tolerance and adaptation mechanisms in such systems, at every level of
"resolution" in the physical/operational hierarchy. To give just a few
examples:
Device level:
-- With very few exceptions (e.g., very high-speed routing systems), all
software processes are built on or from open-source components.
--> provides for greater flexibility, integration, and conservation of
programming know-how
-- Outside of the same few exceptions, all software runs on top of very
inexpensive, compact commodity hardware.
--> enables boxes to be repurposed and/or replaced relatively quickly
and easily
Every device embodies some self-diagnostic capability.
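To make the device-level idea concrete, a self-diagnostic of this kind can be sketched roughly as follows. This is a hypothetical illustration, not AOL's actual tooling; the probe names and thresholds are invented:

```python
import os
import shutil

# Hypothetical device self-diagnostic: each probe returns (name, ok, detail).
# A real device would also probe link state, daemon liveness, temperature, etc.

def check_disk(path="/", max_used=0.90):
    # Flag the filesystem as unhealthy above 90% utilization (invented threshold).
    usage = shutil.disk_usage(path)
    used = usage.used / usage.total
    return ("disk", used <= max_used, f"{used:.0%} used")

def check_load(max_load=8.0):
    # Flag the box as unhealthy if the 1-minute load average is too high (Unix only).
    load1, _, _ = os.getloadavg()
    return ("load", load1 <= max_load, f"1-min load {load1:.2f}")

def self_diagnose(probes):
    """Run all probes; report overall health plus per-probe detail."""
    results = [probe() for probe in probes]
    healthy = all(ok for _, ok, _ in results)
    return healthy, results

healthy, results = self_diagnose([check_disk, check_load])
for name, ok, detail in results:
    print(f"{name}: {'OK' if ok else 'FAIL'} ({detail})")
```

A monitoring system would run such a loop periodically and escalate persistent failures to the cluster level.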
Functional level:
-- All infrastructure is organized into functional "clusters," with each
cluster encompassing a relatively complete, self-contained set of elements
sufficient to support a fixed quantity of users/service delivery requirements,
the size of which is defined based on environmental considerations (e.g., the
cost of staff/travel time to deploy, anticipated size/footprint and power
availability in commercial data centers, etc.).
--> "clusterization" simplifies and standardizes growth, change
management, and other adaptive requirements
--> also "canalizes" architectural risks and requirements, and
simplifies remote management
--> cluster-based process encapsulation also provides for reduced
vulnerability to "foreign infections," as both lateral and hierarchical
interactions between clusters are highly constrained and closely monitored
Every cluster also embodies some independent, higher-level self-diagnostic and
self-correction capabilities.
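A cluster's higher-level self-correction can be caricatured as a supervision pass that restarts members whose health checks fail repeatedly. Everything here (member names, the failure threshold, the restart action) is invented for illustration:

```python
# Hypothetical cluster watchdog: count consecutive health-check failures per
# member and restart a member once it crosses a threshold.

class Member:
    def __init__(self, name):
        self.name = name
        self.failures = 0   # consecutive failed health checks
        self.restarts = 0   # corrective actions taken so far

    def restart(self):
        self.restarts += 1
        self.failures = 0   # restart clears the failure streak

def supervise(members, health, max_failures=3):
    """One supervision pass: tally failures, restart persistent offenders."""
    actions = []
    for m in members:
        if health(m):
            m.failures = 0
        else:
            m.failures += 1
            if m.failures >= max_failures:
                m.restart()
                actions.append(("restart", m.name))
    return actions

cluster = [Member("web-1"), Member("db-1")]
for _ in range(3):  # three consecutive passes in which db-1 fails its check
    actions = supervise(cluster, health=lambda m: m.name != "db-1")
print(actions)  # db-1 is restarted on the third pass
```

Requiring several consecutive failures before acting is the key design choice: it keeps the cluster from thrashing on transient glitches while still bounding how long a genuinely sick member can degrade service.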
Global/geographic level:
-- The physical/topological organization of functional clusters may be highly
concentrated or widely distributed, or more often embody a mix of strategies,
to better match the diverse environmental opportunities and constraints that
are characteristic of different geographic-economic-legal "target markets."
--> Topologically proximate distributed clusters are designed to fall
back onto each other, so if one fails or has to be temporarily decommissioned
for maintenance, the service that it provides is sustained without interruption
by another cluster located elsewhere (e.g., a nearby city or country).
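That fallback behavior can be sketched as a routing table ordered by proximity. The cluster names, markets, and health flags below are invented; a real deployment would implement this with DNS/anycast and live health probes rather than an in-memory table:

```python
# Hypothetical geographic failover: route each market to its nearest healthy
# cluster, falling back to the next-closest when one is down for maintenance.

CLUSTERS = {"tokyo": True, "sydney": True, "london": True}  # name -> healthy?

PROXIMITY = {  # per-market fallback order, nearest cluster first (invented)
    "japan": ["tokyo", "sydney", "london"],
    "australia": ["sydney", "tokyo", "london"],
}

def route(market):
    """Return the nearest healthy cluster for a market, or None if all are down."""
    for cluster in PROXIMITY[market]:
        if CLUSTERS[cluster]:
            return cluster
    return None

print(route("japan"))      # tokyo serves Japan normally
CLUSTERS["tokyo"] = False  # take tokyo offline for maintenance
print(route("japan"))      # traffic falls back to sydney, without interruption
```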
In the end, however, the designers who define the form or ontology that
clusterization takes, who select the environmental parameters to optimize
for, and who set the overall goals of the system itself tend to be highly
localized and concentrated. As a result, sensory feedback along the external
edges of the more remote parts of the system (e.g., suggestions from
international partners, recommendations from the international operations
manager) is often drowned out by sensory inputs that are received in the
immediate proximity of that decision-making core (e.g., budgeting constraints
and priorities, demands from investors, "strategic vision" of senior
leadership, etc.).
It was an amazing job while it lasted ;-)
Regards,
Tom Vest
On Feb 26, 2010, at 3:39 AM, Jochen Fromm wrote:
> I mean the former: their computer systems and especially their huge
> data centers (here is a map of all Google data centers:
> http://bit.ly/3i4UDw ).
> If you have so many computers, you must have some form of monitoring
> system, and ideally you also have some form of self-configuring and
> self-healing system which repairs and optimizes itself. As you said, both
> companies surely have redundancy features in their networks to achieve
> fault-tolerance and robustness.
>
> If I remember correctly, some of your early papers were about
> resourceful systems and fault tolerance; are they available somewhere?
>
> -J.
>
> ----- Original Message ----- From: Russ Abbott
> To: The Friday Morning Applied Complexity Coffee Group
> Sent: Thursday, February 25, 2010 11:42 PM
> Subject: Re: [FRIAM] Hello, FRIAM
>
>
> Jochen,
>
> You said that "Google or Amazon ... have self-healing, self-monitoring and
> self-configuring systems." Would you elaborate on what you mean? Do you mean
> their computer systems or Google and Amazon as corporations? If the former,
> I'm sure they have redundancy features in their computing networks -- just
> as the Internet itself has. What else are you thinking of?
>
>
> -- Russ Abbott
> _____________________________________________
> Professor, Computer Science
> California State University, Los Angeles
> Cell phone: 310-621-3805
> o Check out my blog at http://russabbott.blogspot.com/
>
>
>
> ============================================================
> FRIAM Applied Complexity Group listserv
> Meets Fridays 9a-11:30 at cafe at St. John's College
> lectures, archives, unsubscribe, maps at http://www.friam.org
============================================================
FRIAM Applied Complexity Group listserv
Meets Fridays 9a-11:30 at cafe at St. John's College
lectures, archives, unsubscribe, maps at http://www.friam.org