kashani wrote:
A. Khattri wrote:
On Wed, 19 Apr 2006, kashani wrote:
I'm not sold on the Google approach.
Assuming someone were to build nine data servers, we're talking roughly
$3k per server (dual CPU, 4GB RAM, RAID 5 SATA), or $30k with shipping
and tax.
Actually, Google uses the cheapest hardware they can find, buys in bulk,
and assumes stuff will fail, so they plan accordingly. I very much doubt
they spend $3K per server...
Anyone trying to build the same thing with a purchase of fewer than 100
servers is not going to spend much less. Roughly (tallied in the sketch
below):
2 x 2GB RAM = $1000
2 x CPU = $500 or so
RAID card = $300
Drives = $100 each
1U chassis/MB/etc = $500
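To sanity-check that parts list, here's a quick tally; the four-drive
count and the ~8% shipping-and-tax figure are my assumptions for
illustration, not numbers from the post:

    # Rough per-server bill of materials from the thread. The drive count
    # (4, enough for a small RAID 5 set) and the 8% shipping/tax rate are
    # assumptions added for illustration.
    parts = {
        "2 x 2GB RAM": 1000,
        "2 x CPU": 500,
        "RAID card": 300,
        "drives (4 x $100)": 400,
        "1U chassis/MB/etc": 500,
    }

    per_server = sum(parts.values())   # $2700, call it ~$3k
    nine = 9 * per_server              # $24300
    nineteen = 19 * per_server         # $51300

    print(f"per server: ${per_server}")
    print(f"nine servers: ${nine} (~${round(nine * 1.08)} with ~8% shipping/tax)")
    print(f"nineteen servers: ${nineteen}")  # inside the $40-60k range cited below

That lands a bit under the quoted $30k for nine boxes (shipping and tax
evidently make up the rest), and nineteen servers at the same per-box
cost sits squarely in the $40-60k range mentioned further down.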
IIRC they drop the drives and shove everything into RAM, which is fine
when you have a limited data set and enough machines to hold it.
Originally Google did have a limited data set; it was only after the
infrastructure reached a critical size that they began Google Mail and
other large-storage things, and by then they'd had four years to work
out the operational kinks.
I think it unlikely that someone with a single storage set has the money
or time to pay a few PhDs to write a custom filesystem, buy a few
thousand servers (10TB at 4GB of RAM per box is roughly 2,500 machines),
and keep the datacenter monkeys needed to replace gear constantly...
oddly, Google has all of these in spades. And if you read the
whitepaper, the smallest Google data cluster is nineteen servers, aka
$40-60k for schleps like us, aka the cost of a SAN that uses less power
and burns fewer switch ports.
This infatuation with the Google stuff is useless when very few people
(i.e. none of us) have the in-house infrastructure or the available cash
to handle it. Unless, of course, someone has actually built their own
mini-Google and wants to tell us all about it, with nice numbers like
total cost, source code, transactions per second, cost per GB of user
data, throughput, and other data points.
kashani
Sorry, I'll have to pipe up about that... there's at least one guy here
using MogileFS, which is basically a Google-style approach :) I'm using
it as well.
There are a few nice things about distributing your files around:
- You don't necessarily need to dedicate machines to it. I have a
hundred and change diskless-boot webservers; I add two hard drives to a
box, fire up the mogstored daemon, and put it back into the webserver pool.
- You're overspeccing. Why would each box need 4GB of RAM? Whatever
NAS/SAN you buy will *not* have nearly that much cache. If you're going
to live without, live without :) A single dual-core chip or less could
work too.
- Something like MogileFS uses node-level redundancy (NAID? Someone
bothered giving it a name...), so there's no point in buying RAID cards;
see the sketch after this list. Use onboard SATA/PATA or the cheapest
cards you can buy that give decent throughput ($50 or less instead of $300).
- If it floats your boat, go ahead and get those 3U Supermicro cases
with 16 drive bays. Just use a couple of cheap 4+ port hot-swap-capable
SATA controllers instead of 2 x $500+ battery-backed RAID controllers.
Get a bunch of them, as slimmed down as possible. Hell, just fuse some
steel together, bolt an MB/PSU to it, and stack a ton of drives in front
of some huge fans. Save two thirds of the case cost.
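To make the node-level redundancy point concrete: instead of rebuilding
parity inside one box, the system keeps whole copies of each file on N
distinct machines (MogileFS exposes this as a per-class "mindevcount").
A toy Python sketch of the placement idea, not MogileFS's actual code:

    import random

    def place_replicas(hosts, mindevcount=2):
        """Pick `mindevcount` distinct hosts to hold full copies of one file.

        Toy illustration of node-level redundancy: losing an entire box
        (drives, controller, PSU, whatever) still leaves intact copies
        elsewhere, which is why per-box RAID cards buy you nothing here.
        """
        if mindevcount > len(hosts):
            raise ValueError("not enough hosts for the requested copy count")
        return random.sample(hosts, mindevcount)

    # place_replicas(["web01", "web02", "web03", "web04"], 2)
    # -> two distinct boxes, each holding a complete copy

The real replicator also has to re-copy files when a node dies, but
distinct-host placement is the core of why the RAID card is redundant.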
My case might not be enough like yours, but I can easily add terabytes
of space for *just* the cost of the drives (plus the relatively small
power overhead per box I add drives to). There are a few other services
too, though... Mogile needs a central DB (which you'd have two of,
right?) and tracker services; a rough sketch of how those fit together
follows below. Again, cheap is fine. Maybe you have a DB with some free
resources... I just can't imagine someone selling me a NAS/SAN that's
this scalable and this cheap.
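For a sense of how those pieces fit together: a client asks a tracker
where a key's replicas live, gets back plain HTTP URLs pointing at
mogstored boxes, and fetches directly from one of them. A rough Python
sketch of that read path; the raw socket use and the exact command/reply
format are from memory, so treat the details as assumptions and use a
real client library in practice:

    import socket
    from urllib.parse import quote, parse_qs

    def get_paths(tracker, domain, key):
        # Ask the tracker which storage nodes hold `key`. The command name
        # and "OK paths=N&path1=..." reply shape are my recollection of the
        # MogileFS tracker protocol; double-check before relying on them.
        host, port = tracker.split(":")
        with socket.create_connection((host, int(port))) as sock:
            sock.sendall(f"get_paths domain={quote(domain)}&key={quote(key)}\r\n".encode())
            reply = sock.makefile().readline().strip()
        status, _, args = reply.partition(" ")
        if status != "OK":
            raise RuntimeError(f"tracker error: {reply}")
        parsed = parse_qs(args)
        return [parsed[f"path{i}"][0] for i in range(1, int(parsed["paths"][0]) + 1)]

    # paths = get_paths("tracker01:7001", "mydomain", "somekey")
    # Each path is an ordinary HTTP URL on a storage box; fetch any one.

The point being that the trackers and DB only handle small metadata
lookups; the bulk bytes go straight between the client and the cheap
storage nodes.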
have fun,
-Dormando
--
[email protected] mailing list