Re: Mogile Deployment Layout: More Hosts or More Disks.

dormando Mon, 17 Sep 2007 23:58:32 -0700

Hmm :) Maybe my next documentation spree should be a mogilefs FAQ :)

Main question is, do we do more hosts with fewer disks each, or more disks per host?


I think the tradeoff here is pretty easy to spot:

As you spread out hosts:

- More local cache. mogstored relies on the OS's object cache to speed up hot files, though you mention the CDN should take care of that...

- More bandwidth to the devices.

- Lessens the impact of losing a host (you should have enough mogilefs hosts/devices that losing any one or two is something you don't have to care about!).

- More CPU, I guess. It's rare but possible to load up mogstored on CPU.

As you add more devices:

- Fewer hosts to manage

- Losing an individual disk in a machine shouldn't hurt anything. In my own setup I never bothered replacing dead disks in a host with multiple drives. I just marked them as dead and got more HDs on the next server order. Since you're somewhat more likely to lose a device than a whole host, this isn't so bad.


You have to keep in mind:

- How full are your devices actually going to get before they become too active to hold more files? 750G drives are nice, but usually I can't even fill a 250G drive before it gets hosed with IO.

- The impact of losing a whole host with many 750G drives with many (millions of?) files. It could take a long time for the reaper and replicators to deal with this as they work in small batches of files. Then again, it won't matter as much as you grow (and especially if you can quickly deal with dead hosts).
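
To put a rough number on the "long time" above, here is a back-of-the-envelope sketch. The function name, the file count and the files-per-second rate are all made up for illustration; measure your own replication rate before trusting any of it.

```python
def rereplication_hours(files_on_host: int, files_per_second: float) -> float:
    """Hours needed to re-replicate every file that lived on a dead host.

    files_per_second is an assumed sustained rate for the whole cluster,
    not a figure taken from MogileFS itself.
    """
    if files_per_second <= 0:
        raise ValueError("files_per_second must be positive")
    return files_on_host / files_per_second / 3600.0


# Hypothetical numbers: 5 million files on the lost host, 50 files/s sustained.
hours = rereplication_hours(5_000_000, 50.0)
print(f"{hours:.1f} hours")
```

Halving the number of files per host halves the time you spend running with fewer copies than you wanted.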

So on a really busy service, I'd have tons of 64-bit hosts with extra RAM. On something with more streaming involved, you have to understand your dataset well to understand which way to go. Think about the average size/access type of your files, as well as how often they're added or replaced in the system.

Just remember to think of spindles more than disk size. Unless your dataset is very idle you won't end up filling the disk, and the more devices you have the more you can parallelize your batch operations :)
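
One way to sanity check the spindles-versus-size point is to count devices twice, once by capacity and once by IO, and take the larger answer. This is only a sketch: the function name and every number in it (dataset size, peak IOPS, IOPS per disk) are assumptions you would replace with your own measurements.

```python
import math


def disks_needed(dataset_gb: float, disk_gb: float,
                 peak_iops: float, iops_per_disk: float) -> tuple[int, int, int]:
    """Return (disks by capacity, disks by IO, disks you actually need)."""
    by_capacity = math.ceil(dataset_gb / disk_gb)
    by_io = math.ceil(peak_iops / iops_per_disk)
    # Whichever constraint asks for more spindles wins.
    return by_capacity, by_io, max(by_capacity, by_io)


# Hypothetical workload: 6 TB stored (replicas included) on 750G drives,
# 3000 IOPS at peak, and roughly 100 IOPS per spindle.
by_capacity, by_io, needed = disks_needed(6000, 750, 3000, 100)
print(by_capacity, by_io, needed)
```

With numbers like these the IO count dominates, so most of each big drive would sit empty.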

As a side note, any real reason not to run the trackers on the storage
nodes?

I did it. Worked okay. Most of my storage nodes didn't have trackers, but some did. The only issue is the trackers can get CPU heavy, which could interact with other things on your box.

also, anyone have any pros/cons on running mysql master/slave
with InnoDB on DRBD versus running lets say mysql cluster?

MySQL Cluster's probably not the greatest fit for the mogilefs database. The dataset can be relatively small, but I don't think it's quite small enough. Although honestly I only say that because I have limited experience with cluster. My mogilefs DBs have been happy if they have enough RAM for InnoDB to properly cache things...

DRBD should work okay. I've also done master:master with auto_increment_offset, but that might scare the bejesus out of some folks on the list. I like being able to optimize my tables though :)
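
For anyone wondering why the master:master trick doesn't collide on IDs: each master gets the same auto_increment_increment and a different auto_increment_offset, so they hand out interleaved values. Below is a small simulation of that arithmetic, not MySQL code. The helper names are made up, and it assumes the offset is no larger than the increment.

```python
def next_auto_increment(current_max: int, increment: int, offset: int) -> int:
    """Smallest value in offset, offset+increment, ... that is > current_max."""
    if not 1 <= offset <= increment:
        raise ValueError("this sketch assumes 1 <= offset <= increment")
    if current_max < offset:
        return offset
    steps = (current_max - offset) // increment + 1
    return offset + steps * increment


def generate(n: int, increment: int, offset: int) -> list[int]:
    """First n IDs a master with these settings would hand out."""
    ids, current = [], 0
    for _ in range(n):
        current = next_auto_increment(current, increment, offset)
        ids.append(current)
    return ids


# Two masters: same increment (2), different offsets (1 and 2).
master_a = generate(5, increment=2, offset=1)
master_b = generate(5, increment=2, offset=2)
print(master_a)  # odd IDs
print(master_b)  # even IDs
```

The two lists never overlap, which is the whole point: both masters can take inserts without fighting over the same primary key.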


-Dormando
