Thought this might be of interest for FWers.

Wired 14.10, October 2006: The Information Factories

The desktop is dead. Welcome to the Internet cloud, where massive facilities
across the globe will store all the data you'll ever use. George Gilder on the
dawning of the petabyte age.

The drive up Interstate 84, through the verdant amphitheatrical
sweep of the Columbia River Gorge to the quaint Oregon town of The Dalles,
seems a trek into an alluring American past. You pass ancient basalt bluffs
riven by luminous waterfalls, glimpsed through a filigree of Douglas firs. You
see signs leading to museums of native Americana full of feathery and leathery
tribal relics. There are farms and fisheries, vineyards arrayed on hillsides,
eagles and ospreys riding the winds. On the horizon, just a half hour's drive
away, stands the radiant, snowcapped peak of Mount Hood, site of 11 glaciers,
source of half a dozen rivers, and home of four-season skiing. "I could
live here," I say to myself with a backward glance down the highway toward
urban Portland, a sylvan dream of the billboarded corridor that connects
Silicon Valley and San Francisco. Then, as the road comes to an end, the gray ruin of an
abandoned aluminum plant rises from a barren hillside. Its gothic gantries and
cavernous smelters stand empty and forlorn, a poignant warning of the
evanescence of industrial power. But industry has returned to The Dalles, albeit industry
with a decidedly postindustrial flavor. For it's here that Google has chosen to
build its new 30‑acre campus, the base for a server farm of unprecedented
proportion. Although the evergreen mazes, mountain majesties, and always-on
skiing surely play a role, two amenities in particular make this the perfect
site for a next-gen data center. One is a fiber-optic hub linked to Harbour
Pointe, Washington, the coastal landing base of PC-1, a fiber-optic artery
built to handle 640 Gbps that connects Asia to the US. A glassy extension cord
snakes through all the town's major buildings, tapping into the greater
Internet through NoaNet, a node of the experimental Internet2. The other
attraction is The Dalles Dam and its 1.8‑gigawatt power station. The
half-mile-long dam is a crucial source of cheap electrical power – once
essential to aluminum smelting, now a strategic resource in the next phase in
the digital revolution. Indeed, Google and other Silicon Valley titans are
looking to the Columbia River to supply ceaseless cycles of electricity at
about a fifth of what they would cost in the San Francisco Bay Area. Why? To
feed the ravenous appetite of a new breed of computer. Moore's law has a corollary that bears the name of Gordon
Bell, the legendary engineer behind Digital Equipment's VAX line of advanced
computers and now a principal researcher at Microsoft. According to Bell's law,
every decade a new class of computer emerges from a hundredfold drop in the
price of processing power. As we approach a billionth of a cent per byte of
storage, and pennies per gigabit per second of bandwidth, what kind of machine
labors to be born? How will we feed it? How will it be tamed? And how soon will it, in its inevitable turn, become a
dinosaur? One characteristic of this new machine is clear. It arises
from a world measured in the prefix giga, but its operating
environment is the petascale. We're all petaphiles now, plugged into a world of
petabytes, petaops, petaflops. Mouthing the prefix peta (signifying
numbers of the magnitude 10 to the 15th power, a million billion) and the Latin
verb petere (to search), we are doubly petacentric in our
peregrinations through the hypertrophic network cloud. Just last century – you remember it well, across the chasm
of the crash – the PC was king. The mainframe was deposed and deceased. The
desktop was the data center. Larry Page and Sergey Brin were nonprofit googoos
babbling about searching their 150-gigabyte index of the Internet. When I
wanted to electrify crowds with my uncanny sense of futurity, I would talk
terascale (10 to the 12th power), describing a Web with an unimaginably
enormous total of 15 terabytes of content. Yawn. Today Google rules a total database of hundreds of
petabytes, swelled every 24 hours by terabytes of Gmails, MySpace pages, and
dancing-doggy videos – a relentless march of daily deltas, each larger than the
whole Web of a decade ago. To make sense of it all, Page and Brin – with
Microsoft, Yahoo, and Barry "QVC" Diller's Ask.com hot on their heels
– are frantically taking the computer-on-a-chip and multiplying it, in
massively parallel arrays, into a computer-on-a-planet. The data centers these companies are building began as exercises
in making the planet's ever-growing data pile searchable. Now, turbocharged
with billions in Madison Avenue mad money for targeted advertisements, they're
morphing into general-purpose computing platforms, vastly more powerful than
any built before. All those PCs are still there, but they have less and less to
do, as Google and the others take on more and more of the duties once delegated
to the CPU. Optical networks, which move data over vast distances without
degradation, allow computing to migrate to wherever power is cheapest. Thus,
the new computing architecture scales across Earth's surface. Ironically, this
emerging architecture is interlinked by the very technology that was supposed
to be Big Computing's downfall: the Internet. In the PC era, the winners were companies that dominated the
microcosm of the silicon chip. The new age of petacomputing will be ruled by
the masters of the remote data center – those who optimally manage processing
power, electricity, bandwidth, storage, and location. They will leverage the
Net to provide not only search, but also the panoply of applications formerly
housed on the desktop. For the moment, at least, the dawning era favors scale
in hardware rather than software applications, and centralized operations management
rather than operating systems at the network's edge. The burden of playing
catch-up in this new game may be what prompted Bill Gates to hand over
technical leadership at Microsoft to Craig Mundie, a supercomputer expert, and
Ray Ozzie, who made his name in network-based enterprise software with Lotus
and Groove Networks. Having clambered well up the petascale slope, Google has a
privileged view of the future it is building – a perspective it's
understandably reticent to share. Proud of their front end of public search and
advertising algorithms, the G-men hide their hardware coup behind an aw-shucks,
bought-it-at-Fry's facade. They resist the notion that their advantage springs
chiefly from mastering the intricate dynamics of a newly recentralized
computing architecture. This modesty may be disingenuous, of course, but amid
the perpetual onrush of technological innovation, it may well be the soul of
wisdom. After all, the advantage might turn out to be short-lived. Back in 1993, in a midnight email to me from his
office at Sun Microsystems, CTO Eric Schmidt envisioned the future: "When
the network becomes as fast as the processor, the computer hollows out and
spreads across the network." His then-employer publicized this notion in a
compact phrase: The network is the computer. But Sun's hardware honchos failed
to absorb Schmidt's CEO-in-the-making punch line. In which direction would the
profits from that transformation flow? "Not to the companies making the
fastest processors or best operating systems," he prophesied, "but to
the companies with the best networks and the best search and sort
algorithms." Schmidt wasn't just talking. He left Sun and, after a stint
as CEO of Novell, joined Google, where he found himself engulfed by the future
he had predicted. While competitors like Excite, Inktomi, and Yahoo were
building out their networks with SPARCstations and IBM mainframes, Google
designed and manufactured its own servers from commodity components made by
Intel and Seagate. In a 2005 technical article, operations chief Urs Hölzle
explained why. The price of high-end processors "goes up nonlinearly with
performance," he observed. Connecting innumerable cheap processors in
parallel offered at least a theoretical chance for a scalable system, in which
bang for the buck didn't erode as the system grew. Today, Schmidt's insight has been vindicated, and he's often
seen on Google's Mountain View, California, campus wearing his comp-sci PhD's
goofy dimpled grin. The smile has grown toothier since he announced the plant
in The Dalles, a manifestation of what he trumpets as "some of the best
computer science ever performed." When it's finished, the project will
spread tens of thousands of servers across a few giant structures. By building
its own infrastructure rather than relying on commercial data centers, Schmidt
told analysts in May, Google gets "tremendous competitive advantage." The facility in The Dalles is only the latest and most
advanced of about two dozen Google data centers, which stretch from Silicon
Valley to Dublin. All told, it's a staggering collection of hardware, whose
constituent servers number 450,000, according to the lowest estimate. The extended Googleplex comprises an estimated 200 petabytes of hard disk storage – enough
to copy the Net's entire sprawling cornucopia dozens of times – and four
petabytes of RAM. To handle the current load of 100 million queries a day, its collective input-output
bandwidth must be in the neighborhood of 3 petabits per second. Of course, these numbers are educated guesses. One of the
unstated rules of the new arms race is that all information is strategic. Even
the once-voluble Chairman Eric now hides behind PR walls. I had to battle
hordes of polite but steadfast flacks to reach him, but he finally replied to
my queries with a cordial email. Cloud computing, he confirmed, has indeed
succeeded the old high-performance staples: mainframes and client-server, both
of which require local-area networks. This is very much last year's news.
"In this architecture, the data is mostly resident on servers 'somewhere
on the Internet' and the application runs on both the 'cloud servers' and the
user's browser. When you use Google Gmail, Maps, Yahoo's services, many of
eBay's services, you are using this architecture." He added: "The
consequence of this 'architectural shift' is the return of massive data
centers." This change is as momentous as the industrial-age shift
from craft production to mass manufacture, from individual workers in separate shops
turning out finished products step by step to massive factories that break up
production into thousands of parts and perform them simultaneously. No single
computer could update millions of auctions in real time, as eBay does, and no
one machine could track thousands of stock portfolios made up of offerings on
all the world's exchanges, as Yahoo does. And those are, at most, terascale
tasks. Page and Brin understood that with clever software, scores of thousands
of cheap computers working in parallel could perform petascale tasks – like
searching everything Yahoo, eBay, Amazon.com, and anyone else could shovel onto
the Net. Google appears to have attained one of the holy grails of computer
science: a scalable massively parallel architecture that can readily accommodate
diverse software. Google's core activity remains Web search. Having built a
petascale search machine, though, the question naturally arose: What else could
it do? Google's answer: just about anything. Thus the company's expanding
portfolio of Web services: delivering ads (AdSense, AdWords), maps (Google
Maps), videos (Google Video), scheduling (Google Calendar), transactions
(Google Checkout), email (Gmail), and productivity software (Writely). The
other heavyweights have followed suit. Google's success stems from more than foresight, ingenuity,
and chutzpah. In every era, the winning companies are those that waste what is
abundant – as signaled by precipitously declining prices – in order to save
what is scarce. Google
has been profligate with the surfeits of data storage and backbone bandwidth.
Conversely, it has been parsimonious with that most precious of resources,
users' patience. The recent explosion of hard disk storage capacity makes
Moore's law look like a cockroach race. In 1991, a 100-megabyte drive cost
$500, and a 50-megahertz Intel 486 processor cost about the same. In 2006, $500
buys a 750-gigabyte drive or a 3-gigahertz processor. Over 15 years, that's an
advance of 7,500 times for the hard drive and 60 times for the processor. By
this crude metric, the cost-effectiveness of hard drives grew 125 times faster
than that of processors. But the miraculous advance of disk storage concealed a
problem: The larger and denser the individual disks, the longer it takes to
scan them for information. "The little arm reading the disks can't move
fast enough to handle the onrush of seeks," explains Josh Coates, a
32-year-old storage entrepreneur who founded Berkeley Data Systems. "The
whole world stops." The solution is to deploy huge amounts of random access
memory. By the byte, RAM is some 100 times more costly than disk storage.
Engineers normally conserve it obsessively, using all kinds of tricks to fool
processors into treating disk drives as though they were RAM. But Google understands that the most
precious resource is not money but time. Search users, it turns out, are sorely
impatient. Research shows that they're satisfied with results delivered within
a twentieth of a second. RAM can be accessed some 10,000 times faster than
disks. So, measured by access time, RAM is 100 times cheaper than disk storage. But it's not enough to reach users quickly. Google needs to
reach them wherever they are. This requires access to the Net backbone, the
long-haul fiber-optic lines that encircle the globe. In the last decade, the
speed of backbone traffic has accelerated from 45 Mbps to roughly a terabit per
second. That's a rise of more than 20,000 times. Google interconnects its
hundreds of thousands of processors with gigabit Ethernet lines. Placing
gigantic data centers near major fiber-optic nodes is well worth the
expense. Wasting what is abundant to conserve what is scarce, the G-men have
become the supreme entrepreneurs of the new millennium. However, past
performance does not guarantee future returns. As large as the current Google
database is, even bigger shocks are coming. An avalanche of digital video
measured in exabytes (10 to the 18th power, or 1,000 petabytes) is hurtling
down from the mountainsides of panicked Big Media and bubbling up from the
YouTubian depths. The massively parallel, prodigally wasteful petascale
computer has its work cut out for it. Continued at http://wired.com/wired/archive/14.10/cloudware.html
_______________________________________________ Futurework mailing list [email protected] http://fes.uwaterloo.ca/mailman/listinfo/futurework
