Hi Jochen, 

I used to be in charge of deploying and managing international points of 
presence and content distribution complexes for AOL Time Warner (in Brazil, 
Western Europe, Japan, China, Australia, et al.), which were basically smaller 
versions of the colossal data centers that AOL maintained in Northern Virginia. 
Before Google came along, those NoVA datacenters were the largest in the world. 
Based on that experience I can confirm your intuitions about the diversity of 
fault-tolerance and adaptation mechanisms in such systems, at least at every 
level of "resolution" within the physical/operational domain. To give just a 
few examples:

Device level:
-- With very few exceptions (e.g., very high-speed routing systems), all 
software processes are built on open-source components.
        --> provides greater flexibility, easier integration, and conservation 
of programming know-how
-- Outside of the same few exceptions, all software runs on top of very 
inexpensive, compact commodity hardware.
        --> enables boxes to be repurposed and/or replaced relatively quickly 
and easily

Every device embodies some self-diagnostic capability.

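The kind of device-level self-check mentioned above can be sketched roughly as 
follows; the metric names and thresholds here are purely illustrative stand-ins, 
not the actual checks any of those boxes ran:

```python
# Minimal sketch of a device-level self-diagnostic: each box periodically
# evaluates its own vital signs against fixed thresholds and reports a
# status that an external monitoring system can poll.
# The metrics and limits below are hypothetical examples.

THRESHOLDS = {
    "disk_used_pct": 90.0,   # alarm if disk is more than 90% full
    "load_avg_1m":   8.0,    # alarm if 1-minute load average exceeds 8
    "mem_used_pct":  95.0,   # alarm if memory is more than 95% used
}

def self_diagnose(metrics):
    """Compare observed metrics to thresholds; return (healthy, alarms)."""
    alarms = [name for name, limit in THRESHOLDS.items()
              if metrics.get(name, 0.0) > limit]
    return (len(alarms) == 0, alarms)

# Example: a box under heavy load but otherwise fine.
healthy, alarms = self_diagnose(
    {"disk_used_pct": 42.0, "load_avg_1m": 11.5, "mem_used_pct": 60.0})
print(healthy, alarms)  # False ['load_avg_1m']
```
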
Functional level:
-- All infrastructure is organized into functional "clusters," with each 
cluster encompassing a relatively complete, self-contained set of elements 
sufficient to support a fixed quantity of users/service delivery requirements, 
the size of which is defined based on environmental considerations (e.g., the 
cost of staff/travel time to deploy, anticipated size/footprint and power 
availability in commercial data centers, etc.).
        --> "clusterization" simplifies and standardizes growth, change 
management, and other adaptive requirements
        --> also "canalizes" architectural risks and requirements, and 
simplifies remote management
        --> cluster-based process encapsulation also provides for reduced 
vulnerability to "foreign infections," as both lateral and hierarchical 
interactions between clusters are highly constrained and closely monitored

Every cluster also embodies some independent, higher level self-diagnostic and 
self-correction capabilities.

Global/geographic level:
-- The physical/topological organization of functional clusters may be highly 
concentrated or widely distributed, or more often embody a mix of strategies, 
to better match the diverse environmental opportunities and constraints that 
are characteristic of different geographic-economic-legal "target markets."
        --> Topologically proximate distributed clusters are designed to fall 
back onto each other, so if one fails or has to be temporarily decommissioned 
for maintenance, the service that it provides is sustained without interruption 
by another cluster located elsewhere (e.g., a nearby city or country).
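
The fallback behavior just described can be sketched as an ordered chain of 
backup clusters, with traffic routed to the first one that is up; the city 
names and availability flags here are made up for the example:

```python
# Sketch of inter-cluster fallback: each cluster has an ordered list of
# topologically proximate backups, and requests for a given primary are
# served by the first cluster in its chain that is currently in service.
# The cluster names below are invented for illustration.

FALLBACK_CHAIN = {
    "frankfurt": ["amsterdam", "london"],
    "amsterdam": ["frankfurt", "london"],
}

def route(primary, up):
    """Return the cluster that should serve traffic aimed at `primary`,
    given `up`, the set of clusters currently in service."""
    for candidate in [primary] + FALLBACK_CHAIN.get(primary, []):
        if candidate in up:
            return candidate
    raise RuntimeError("no cluster available for %s" % primary)

# Frankfurt decommissioned for maintenance: traffic shifts to Amsterdam.
print(route("frankfurt", {"amsterdam", "london"}))  # amsterdam
```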

In the end, however, the designers who define the form or ontology that 
clusterization takes, who select the environmental parameters to optimize for, 
and who set the overall goals of the system itself, tend to be highly localized 
and concentrated. As a result, sensory feedback along the external edges of the 
more remote parts of the system (e.g., suggestions from international partners, 
recommendations from the international operations manager) is often drowned 
out by sensory inputs received in the immediate proximity of that decision-
making core (e.g., budgeting constraints and priorities, demands from 
investors, the "strategic vision" of senior leadership, etc.).

It was an amazing job while it lasted ;-)

Regards, 

Tom Vest

On Feb 26, 2010, at 3:39 AM, Jochen Fromm wrote:

> I mean the former, their computer systems and esp. their huge
> data centers ( here is a map of all Google data centers: http://bit.ly/3i4UDw 
> ).
> If you have so many computers, you must have some form of monitoring
> system, and ideally you have also some form of self-configuring and
> self-healing system which repairs and optimizes itself. As you said, both
> companies surely have redundancy features in their networks to achieve
> fault-tolerance and robustness.
> 
> If I remember it correctly, some of your early papers were about
> resourceful systems and fault tolerance, are they available somewhere?
> 
> -J.
> 
> ----- Original Message ----- From: Russ Abbott
> To: The Friday Morning Applied Complexity Coffee Group
> Sent: Thursday, February 25, 2010 11:42 PM
> Subject: Re: [FRIAM] Hello, FRIAM
> 
> 
> Jochen,
> 
> You said that "Google or Amazon ... have self-healing, self-monitoring and
> self-configuring systems." Would you elaborate on what you mean. Do you mean
> their computer systems or Google and Amazon as corporations?  If the former,
> I'm sure they have redundancy features in their computing networks -- just
> as the Internet itself has. What else are you thinking of?
> 
> 
> -- Russ Abbott
> _____________________________________________
> Professor, Computer Science
> California State University, Los Angeles
> Cell phone: 310-621-3805
> o Check out my blog at http://russabbott.blogspot.com/
> 
> 
> 
> ============================================================
> FRIAM Applied Complexity Group listserv
> Meets Fridays 9a-11:30 at cafe at St. John's College
> lectures, archives, unsubscribe, maps at http://www.friam.org
> 
> 

