OT: validation on horizontal scalability approach

David Boyes Tue, 06 Apr 2004 05:20:12 -0700

Not directly on topic, but interesting validation of the horizontal
scalability approach to scaling infrastructure.

From: Suresh Ramasubramanian <suresh at hserus.net>
Subject: Interesting speculation on the tech behind gmail

http://blog.topix.net/archives/000016.html

April 04, 2004

The Secret Source of Google's Power

Much is being written about Gmail, Google's new free webmail system.
There's something deeper to learn about Google from this product than the
initial reaction to the product features, however. Ignore for a moment the
observations about Google leapfrogging their competitors with more user
value and a new feature or two. Or Google diversifying away from search
into other applications; they've been doing that for a while. Or the
privacy red herring.

No, the story is about seemingly incremental features that are actually
massively expensive for others to match, and the platform that Google is
building which makes it cheaper and easier for them to develop and run
web-scale applications than anyone else.

I've written before about Google's snippet service, which required that
they store the entire web in RAM. All so they could generate a slightly
better page excerpt than other search engines.

Google has taken the last 10 years of systems software research out of
university labs, and built their own proprietary, production-quality
system. What is this platform that Google is building? It's a distributed
computing platform that can manage web-scale datasets on 100,000-node
server clusters. It includes a petabyte, distributed, fault-tolerant
filesystem, distributed RPC code, probably network shared memory and
process migration. And a datacenter management system which lets a handful
of ops engineers effectively run 100,000 servers. Any of these projects
could be the sole focus of a startup.

Speculation: Gmail's Architecture and Economics

Let's make some guesses about how one might build a Gmail.

Hotmail has 60 million users. Gmail's design should be comparable, and
should scale to 100 million users. It will only have to support a couple of
million in the first year though.

The most obvious challenge is the storage. You can't lose people's email,
and you don't want to ever be down, so data has to be replicated. RAID is
no good; when a disk fails, a human needs to replace the bad disk, or there
is risk of data loss if more disks fail. One imagines the old ENIAC
technician running up and down the aisles of Google's data center with a
shopping cart full of spare disk drives instead of vacuum tubes. RAID also
requires more expensive hardware -- at least the hot swap drive trays. And
RAID doesn't handle high availability at the server level anyway.

No. Google has 100,000 servers. [nytimes] If a server/disk dies, they leave
it dead in the rack, to be reclaimed/replaced later. Hardware failures need
to be instantly routed around by software.
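
As a rough illustration of "routing around" a failure in software, here is
a minimal Python sketch. It is not Google's code; the function name, the
server names, and the liveness table are all hypothetical, and it assumes
some other component already tracks which servers are alive.

```python
from typing import Dict, List, Optional


def pick_live_replica(replicas: List[str], is_alive: Dict[str, bool]) -> Optional[str]:
    """Return the first replica whose server is marked alive, or None.

    `replicas` lists the (hypothetical) servers holding a copy of the data;
    `is_alive` is an assumed health table maintained elsewhere. Servers
    missing from the table are treated as dead.
    """
    for server in replicas:
        if is_alive.get(server, False):
            return server
    return None
```

A dead server simply stays in the list and is skipped on every read; nobody
has to touch the rack for the request to succeed.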

Google has built their own distributed, fault-tolerant, petabyte
filesystem, the Google Filesystem. This is ideal for the job. Say GFS
replicates user email in three places; if a disk or a server dies, GFS can
automatically make a new copy from one of the remaining two. Compress the
email for a 3:1 storage win, then store each user's email in three
locations, and the raw storage needed is approximately equal to the size of
the user's uncompressed mail.
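
The arithmetic behind that claim can be sketched in a few lines of Python.
The 3:1 compression ratio and the three replicas come from the guess above;
the function name is made up for illustration.

```python
def raw_storage_bytes(mail_bytes: float,
                      compression_ratio: float = 3.0,
                      replicas: int = 3) -> float:
    """Raw disk consumed by one user's mail after compression and replication.

    compression_ratio=3.0 and replicas=3 mirror the guesses in the text;
    they are assumptions, not measured figures.
    """
    compressed = mail_bytes / compression_ratio
    return compressed * replicas
```

With both values at 3 the two factors cancel, so replication is effectively
paid for by compression. A weaker compression ratio, or a fourth replica,
would push raw storage above the user's mail size.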

The Gmail servers wouldn't be top-heavy with lots of disk. They need the
CPU for indexing and page view serving anyway. No fancy RAID card or
hot-swap trays, just 1-2 disks per 1U server.

It's straightforward to spreadsheet out the economics of the service,
taking into account average storage per user, cost of the servers, and
monetization per user per year. Google apparently puts the operational cost
of storage at $2 per gigabyte. My napkin math comes up with numbers in the
same ballpark. I would assume the yearly monetized value of a webmail user
to be in the $1-10 range.
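
A stripped-down version of that spreadsheet might look like the Python
sketch below. The $2 per gigabyte figure and the $1-10 revenue range are
from the paragraph above; treating the $2 as a yearly cost and the 0.25 GB
average mailbox in the usage lines are assumptions made only for
illustration.

```python
def yearly_margin_per_user(avg_gb_per_user: float,
                           revenue_per_user: float,
                           cost_per_gb: float = 2.0) -> float:
    """Yearly revenue minus storage cost for one user, in dollars.

    cost_per_gb defaults to the $2/GB operational figure quoted in the
    text; applying it per year is an assumption. Server, bandwidth, and
    staff costs are ignored in this sketch.
    """
    storage_cost = avg_gb_per_user * cost_per_gb
    return revenue_per_user - storage_cost


if __name__ == "__main__":
    # Hypothetical average mailbox of 0.25 GB, at both ends of the
    # $1-10 revenue range.
    for revenue in (1.0, 10.0):
        print(revenue, yearly_margin_per_user(0.25, revenue))
```

Under these assumptions the service loses money once the average mailbox
costs more to store than a user earns, which at $1 per user per year
happens at half a gigabyte.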

Cheap Hardware

Here's an anecdote to illustrate how far Google's cultural approach to
hardware cost departs from the norm, and what that means as a component
of their competitive advantage.

In a previous job I specified 40 moderately-priced servers to run a new
internet search site we were developing. The ops team overrode me; they
wanted 6 more expensive servers, since they said it would be easier to
manage 6 machines than 40.

What this does is raise the cost of a CPU second. We had engineers who
could imagine algorithms that would give marginally better search results,
but if the algorithm was 10 times slower than the current code, ops would
have to add 10X the number of machines to the datacenter. If you've already
got $20 million invested in a modest collection of Suns, going 10X to run
some fancier code is not an option.

Google has 100,000 servers.

Any sane ops person would rather go with a fancy $5000 server than a bare
$500 motherboard plus disks sitting exposed on a tray. But that's a 10X
difference in the cost of a CPU cycle. And this frees up the algorithm
designers to invent better stuff.
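
To make the 10X point concrete, here is a small Python sketch. The $5000
and $500 prices are from the paragraph above; the $1,000,000 budget, the
function names, and the assumption that both kinds of server deliver the
same CPU throughput are hypothetical simplifications.

```python
def servers_for_budget(budget_dollars: int, unit_cost_dollars: int) -> int:
    """Number of whole servers a fixed hardware budget can buy."""
    return budget_dollars // unit_cost_dollars


def max_affordable_slowdown(budget_dollars: int,
                            unit_cost_dollars: int,
                            servers_needed_now: int) -> float:
    """How many times slower an algorithm may be before the budget runs out.

    Assumes load scales linearly with algorithm cost and that every
    server, cheap or fancy, supplies the same number of CPU cycles.
    """
    return servers_for_budget(budget_dollars, unit_cost_dollars) / servers_needed_now


if __name__ == "__main__":
    budget = 1_000_000          # hypothetical budget
    current_need = 200          # hypothetical servers needed by today's code
    print(max_affordable_slowdown(budget, 5000, current_need))
    print(max_affordable_slowdown(budget, 500, current_need))
```

With the same budget, the shop buying $5000 servers has no headroom at all
in this example, while the shop buying $500 boards can run code ten times
more expensive.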

Without cheap CPU cycles, the coders won't even consider algorithms that
the Google guys are deploying. They're just too expensive to run.

Google doesn't deploy bare motherboards on exposed trays anymore; they're
on at least the fourth iteration of their cheap hardware platform. Google
now has an institutional competence in building and maintaining servers that
cost a lot less than the servers everyone else is using. And they do it
with fewer people.

Think of the little internal factory they must have to deploy servers, and
the level of automation needed to run that many boxes. Either network boot
or a production line to pre-install disk images. Servers that
self-configure on boot to determine their network config and load the
latest rev of the software they'll be running. Normal datacenter ops
practices don't scale to what Google has.

What are all those OS Researchers doing at Google?

Rob Pike has gone to Google. Yes, that Rob Pike -- the OS researcher, the
member of the original Unix team from Bell Labs. This guy isn't just some
labs hood ornament; he writes code, lots of it. Big chunks of whole new
operating systems like Plan 9.

Look at the depth of the research background of the Google employees in OS,
networking, and distributed systems. Compiler optimization. Thread
migration. Distributed shared memory.

I'm a sucker for cool OS research. Browsing papers from Google employees
about distributed systems, thread migration, network shared memory, and GFS
makes me feel like a kid in Tomorrowland wondering when we're going to
Mars. Wouldn't it be great, as an engineer, to have production versions of
all this great research?

Google engineers do!

Competitive Advantage

Google is a company that has built a single very large, custom computer.
It's running their own cluster operating system. They make their big
computer even bigger and faster each month, while lowering the cost of CPU
cycles. It's looking more like a general purpose platform than a cluster
optimized for a single application.

While competitors are targeting the individual applications Google has
deployed, Google is building a massive, general purpose computing platform
for web-scale programming.

This computer is running the world's top search engine, a social networking
service, a shopping price comparison engine, a new email service, and a
local search/yellow pages engine. What will they do next with the world's
biggest computer and most advanced operating system?

----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390
