Thought this might be of interest for FWers.

Wired 14.10, October 2006: The Information Factories

The desktop is dead. Welcome to the Internet cloud, where massive facilities
across the globe will store all the data you'll ever use. George Gilder on the
dawning of the petabyte age.

The drive up Interstate 84, through the verdant amphitheatrical
sweep of the Columbia River Gorge to the quaint Oregon town of The Dalles,
seems a trek into an alluring American past. You pass ancient basalt bluffs
riven by luminous waterfalls, glimpsed through a filigree of Douglas firs. You
see signs leading to museums of native Americana full of feathery and leathery
tribal relics. There are farms and fisheries, vineyards arrayed on hillsides,
eagles and ospreys riding the winds. On the horizon, just a half hour's drive
away, stands the radiant, snowcapped peak of Mount Hood, site of 11 glaciers,
source of half a dozen rivers, and home of four-season skiing. "I could
live here," I say to myself with a backward glance down the highway toward
urban Portland, a sylvan dream of the billboarded corridor that connects
Silicon Valley and San Francisco. Then, as the road comes to an end, the gray ruin of an
abandoned aluminum plant rises from a barren hillside. Its gothic gantries and
cavernous smelters stand empty and forlorn, a poignant warning of the
evanescence of industrial power. But industry has returned to The Dalles, albeit industry
with a decidedly postindustrial flavor. For it's here that Google has chosen to
build its new 30‑acre campus, the base for a server farm of unprecedented
proportion. Although the evergreen mazes, mountain majesties, and always-on
skiing surely play a role, two amenities in particular make this the perfect
site for a next-gen data center. One is a fiber-optic hub linked to Harbour
Pointe, Washington, the coastal landing base of PC-1, a fiber-optic artery
built to handle 640 Gbps that connects Asia to the US. A glassy extension cord
snakes through all the town's major buildings, tapping into the greater
Internet through NoaNet, a node of the experimental Internet2. The other
attraction is The Dalles Dam and its 1.8‑gigawatt power station. The
half-mile-long dam is a crucial source of cheap electrical power – once
essential to aluminum smelting, now a strategic resource in the next phase in
the digital revolution. Indeed, Google and other Silicon Valley titans are
looking to the Columbia River to supply ceaseless cycles of electricity at
about a fifth of what they would cost in the San Francisco Bay Area. Why? To
feed the ravenous appetite of a new breed of computer. Moore's law has a corollary that bears the name of Gordon
Bell, the legendary engineer behind Digital Equipment's VAX line of advanced
computers and now a principal researcher at Microsoft. According to Bell's law,
every decade a new class of computer emerges from a hundredfold drop in the
price of processing power. As we approach a billionth of a cent per byte of
storage, and pennies per gigabit per second of bandwidth, what kind of machine
labors to be born? How will we feed it? How will it be tamed? And how soon will it, in its inevitable turn, become a
dinosaur? One characteristic of this new machine is clear. It arises
from a world measured in the prefix giga, but its operating
environment is the petascale. We're all petaphiles now, plugged into a world of
petabytes, petaops, petaflops. Mouthing the prefix peta (signifying
numbers of the magnitude 10 to the 15th power, a million billion) and the Latin
verb petere (to search), we are doubly petacentric in our
peregrinations through the hypertrophic network cloud. Just last century – you remember it well, across the chasm
of the crash – the PC was king. The mainframe was deposed and deceased. The
desktop was the data center. Larry Page and Sergey Brin were nonprofit googoos
babbling about searching their 150-gigabyte index of the Internet. When I
wanted to electrify crowds with my uncanny sense of futurity, I would talk
terascale (10 to the 12th power), describing a Web with an unimaginably
enormous total of 15 terabytes of content. Yawn. Today Google rules a total database of hundreds of
petabytes, swelled every 24 hours by terabytes of Gmails, MySpace pages, and
dancing-doggy videos – a relentless march of daily deltas, each larger than the
whole Web of a decade ago. To make sense of it all, Page and Brin – with
Microsoft, Yahoo, and Barry "QVC" Diller's Ask.com hot on their heels
– are frantically taking the computer-on-a-chip and multiplying it, in
massively parallel arrays, into a computer-on-a-planet. The data centers these companies are building began as exercises
in making the planet's ever-growing data pile searchable. Now, turbocharged
with billions in Madison Avenue mad money for targeted advertisements, they're
morphing into general-purpose computing platforms, vastly more powerful than
any built before. All those PCs are still there, but they have less and less to
do, as Google and the others take on more and more of the duties once delegated
to the CPU. Optical networks, which move data over vast distances without
degradation, allow computing to migrate to wherever power is cheapest. Thus,
the new computing architecture scales across Earth's surface. Ironically, this
emerging architecture is interlinked by the very technology that was supposed
to be Big Computing's downfall: the Internet. In the PC era, the winners were companies that dominated the
microcosm of the silicon chip. The new age of petacomputing will be ruled by
the masters of the remote data center – those who optimally manage processing
power, electricity, bandwidth, storage, and location. They will leverage the
Net to provide not only search, but also the panoply of applications formerly
housed on the desktop. For the moment, at least, the dawning era favors scale
in hardware rather than software applications, and centralized operations management
rather than operating systems at the network's edge. The burden of playing
catch-up in this new game may be what prompted Bill Gates to hand over
technical leadership at Microsoft to Craig Mundie, a supercomputer expert, and
Ray Ozzie, who made his name in network-based enterprise software with Lotus
and Groove Networks. Having clambered well up the petascale slope, Google has a
privileged view of the future it is building – a perspective it's
understandably reticent to share. Proud of their front end of public search and
advertising algorithms, the G-men hide their hardware coup behind an aw-shucks,
bought-it-at-Fry's facade. They resist the notion that their advantage springs
chiefly from mastering the intricate dynamics of a newly recentralized
computing architecture. This modesty may be disingenuous, of course, but amid
the perpetual onrush of technological innovation, it may well be the soul of
wisdom. After all, the advantage might turn out to be short-lived. Back in 1993, in a midnight email to me from his
office at Sun Microsystems, CTO Eric Schmidt envisioned the future: "When
the network becomes as fast as the processor, the computer hollows out and
spreads across the network." His then-employer publicized this notion in a
compact phrase: The network is the computer. But Sun's hardware honchos failed
to absorb Schmidt's CEO-in-the-making punch line. In which direction would the
profits from that transformation flow? "Not to the companies making the
fastest processors or best operating systems," he prophesied, "but to
the companies with the best networks and the best search and sort
algorithms." Schmidt wasn't just talking. He left Sun and, after a stint
as CEO of Novell, joined Google, where he found himself engulfed by the future
he had predicted. While competitors like Excite, Inktomi, and Yahoo were
building out their networks with SPARCstations and IBM mainframes, Google
designed and manufactured its own servers from commodity components made by
Intel and Seagate. In a 2005 technical article, operations chief Urs Hölzle
explained why. The price of high-end processors "goes up nonlinearly with
performance," he observed. Connecting innumerable cheap processors in
parallel offered at least a theoretical chance for a scalable system, in which
bang for the buck didn't erode as the system grew. Today, Schmidt's insight has been vindicated, and he's often
seen on Google's Mountain View, California, campus wearing his comp-sci PhD's
goofy dimpled grin. The smile has grown toothier since he announced the plant
in The Dalles, a manifestation of what he trumpets as "some of the best
computer science ever performed." When it's finished, the project will
spread tens of thousands of servers across a few giant structures. By building
its own infrastructure rather than relying on commercial data centers, Schmidt
told analysts in May, Google gets "tremendous competitive advantage." The facility in The Dalles is only the latest and most
advanced of about two dozen Google data centers, which stretch from Silicon
Valley to Dublin. All told, it's a staggering collection of hardware, whose
constituent servers number 450,000, according to the lowest estimate. The extended Googleplex comprises an estimated 200 petabytes of hard disk storage – enough
to copy the Net's entire sprawling cornucopia dozens of times – and four
petabytes of RAM. To handle the current load of 100 million queries a day, its collective input-output
bandwidth must be in the neighborhood of 3 petabits per second. Of course, these numbers are educated guesses. One of the
unstated rules of the new arms race is that all information is strategic. Even
the once-voluble Chairman Eric now hides behind PR walls. I had to battle
hordes of polite but steadfast flacks to reach him, but he finally replied to
my queries with a cordial email. Cloud computing, he confirmed, has indeed
succeeded the old high-performance staples: mainframes and client-server, both
of which require local-area networks. This is very much last year's news.
"In this architecture, the data is mostly resident on servers 'somewhere
on the Internet' and the application runs on both the 'cloud servers' and the
user's browser. When you use Google Gmail, Maps, Yahoo's services, many of
eBay's services, you are using this architecture." He added: "The
consequence of this 'architectural shift' is the return of massive data
centers." This change is as momentous as the industrial-age shift
from craft production to mass manufacture, from individual workers in separate shops
turning out finished products step by step to massive factories that break up
production into thousands of parts and perform them simultaneously. No single
computer could update millions of auctions in real time, as eBay does, and no
one machine could track thousands of stock portfolios made up of offerings on
all the world's exchanges, as Yahoo does. And those are, at most, terascale
tasks. Page and Brin understood that with clever software, scores of thousands
of cheap computers working in parallel could perform petascale tasks – like
searching everything Yahoo, eBay, Amazon.com, and anyone else could shovel onto
the Net. Google appears to have attained one of the holy grails of computer
science: a scalable massively parallel architecture that can readily accommodate
diverse software. Google's core activity remains Web search. Having built a
petascale search machine, though, the question naturally arose: What else could
it do? Google's answer: just about anything. Thus the company's expanding
portfolio of Web services: delivering ads (AdSense, AdWords), maps (Google
Maps), videos (Google Video), scheduling (Google Calendar), transactions
(Google Checkout), email (Gmail), and productivity software (Writely). The
other heavyweights have followed suit. Google's success stems from more than foresight, ingenuity,
and chutzpah. In every era, the winning companies are those that waste what is
abundant – as signaled by precipitously declining prices – in order to save
what is scarce. Google
has been profligate with the surfeits of data storage and backbone bandwidth.
Conversely, it has been parsimonious with that most precious of resources,
users' patience. The recent explosion of hard disk storage capacity makes
Moore's law look like a cockroach race. In 1991, a 100-megabyte drive cost
$500, and a 50-megahertz Intel 486 processor cost about the same. In 2006, $500
buys a 750-gigabyte drive or a 3-gigahertz processor. Over 15 years, that's an
advance of 7,500 times for the hard drive and 60 times for the processor. By
this crude metric, the cost-effectiveness of hard drives grew 125 times faster
than that of processors. But the miraculous advance of disk storage concealed a
problem: The larger and denser the individual disks, the longer it takes to
scan them for information. "The little arm reading the disks can't move
fast enough to handle the onrush of seeks," explains Josh Coates, a
32-year-old storage entrepreneur who founded Berkeley Data Systems. "The
whole world stops." The solution is to deploy huge amounts of random access
memory. By the byte, RAM is some 100 times more costly than disk storage.
Engineers normally conserve it obsessively, using all kinds of tricks to fool
processors into treating disk drives as though they were RAM. But Google understands that the most
precious resource is not money but time. Search users, it turns out, are sorely
impatient. Research shows that they're satisfied with results delivered within
a twentieth of a second. RAM can be accessed some 10,000 times faster than
disks. So, measured by access time, RAM is 100 times cheaper than disk storage. But it's not enough to reach users quickly. Google needs to
reach them wherever they are. This requires access to the Net backbone, the
long-haul fiber-optic lines that encircle the globe. In the last decade, the
speed of backbone traffic has accelerated from 45 Mbps to roughly a terabit per
second. That's a rise of more than 20,000 times. Google interconnects its
hundreds of thousands of processors with gigabit Ethernet lines. Placing
gigantic data centers near major fiber-optic nodes is well worth the
expense. Wasting what is abundant to conserve what is scarce, the G-men have
become the supreme entrepreneurs of the new millennium. However, past
performance does not guarantee future returns. As large as the current Google
database is, even bigger shocks are coming. An avalanche of digital video
measured in exabytes (10 to the 18th power, or 1,000 petabytes) is hurtling
down from the mountainsides of panicked Big Media and bubbling up from the
YouTubian depths. The massively parallel, prodigally wasteful petascale
computer has its work cut out for it. Continued at http://wired.com/wired/archive/14.10/cloudware.html
_______________________________________________ Futurework mailing list [email protected] http://fes.uwaterloo.ca/mailman/listinfo/futurework
