On 9/18/07, dormando <[EMAIL PROTECTED]> wrote:
> Hmm :) Maybe my next documentation spree should be a mogilefs FAQ :)
Yep, that would be awesome :)

> > Main question is, do we do more hosts per disk, or more disks per host.
>
> I think the tradeoff here is pretty easy to spot:
>
> As you spread out hosts:
>
> - More local cache. mogstored relies on the OS's object cache to speed up
>   hot files, though you mention the CDN should take care of that...
> - More bandwidth to the devices.
> - Lessens the impact of losing a host (you should have enough mogilefs
>   hosts/devices that losing any one or two is something you don't have to
>   care about!).
> - More CPU, I guess. It's rare but possible to load up mogstored on CPU.
>
> As you add more devices:
>
> - Fewer hosts to manage.
> - Losing an individual disk in a machine shouldn't hurt anything. In my
>   own setup I never bothered replacing dead disks in a host with multiple
>   drives; I just marked them as dead and got more HDs on the next server
>   order. Since you're somewhat more likely to lose a device than a whole
>   host, this isn't so bad.

Agreed.

> You have to keep in mind:
>
> - How full are your devices actually going to get before they become too
>   active to hold more files? 750G drives are nice, but usually I can't
>   even fill a 250G drive before it gets hosed with IO.
> - The impact of losing a whole host with many 750G drives holding many
>   (millions of?) files. It could take a long time for the reaper and
>   replicators to deal with this, as they work in small batches of files.
>   Then again, it won't matter as much as you grow (especially if you
>   can quickly deal with dead hosts).
>
> So on a really busy service, I'd have tons of 64-bit hosts with extra
> RAM. On something with more streaming involved, you have to understand
> your dataset well to know which way to go. Think about the average
> size/access type of your files, as well as how often they're added or
> replaced in the system.
>
> Just remember to think of spindles more than disk size. Unless your
> dataset is very idle you won't end up filling the disk, and the more
> devices you have the more you can parallelize your batch operations :)

And this is where the issue comes in. The usage patterns of our system
will change over time. As I said, we envision this as more of an archival
system with light file serving, although that may well change. Our growth
in images is about 7GB/day, with uncompressed text at around 1-2GB/day,
and we expect this to double every 3-6 months. That's of course in
addition to our main data load, which will be about 3TB of images and
less than a TB of text (since MogileFS will be doing compression).

> > As a side note, any real reason not to run the trackers on the storage
> > nodes?
>
> I did it. Worked okay. Most of my storage nodes didn't have trackers,
> but some did. The only issue is the trackers can get CPU heavy, which
> could interact with other things on your box.

I figured that was one way to alleviate the CPU load: spread the trackers
across the nodes and run as many as possible.

> > Also, anyone have any pros/cons on running MySQL master/slave with
> > InnoDB on DRBD versus running, say, MySQL Cluster?
>
> MySQL Cluster's probably not the greatest fit for the mogilefs database.
> The dataset can be relatively small, but I don't think it's quite small
> enough. Although honestly I only say that because I have limited
> experience with Cluster. My mogilefs DBs have been happy if they have
> enough RAM for InnoDB to properly cache things...
>
> DRBD should work okay. I've also done master:master with
> auto_increment_offset, but that might scare the bejesus out of some
> folks on the list. I like being able to optimize my tables though :)

The master-master solution sounds pretty interesting, and it's something
I'd probably want to implement if it works well enough. Would there be
huge contention issues? Do you just create a VIP and round-robin between
the masters?

> -Dormando
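For the archives: the auto_increment scheme mentioned above boils down to a couple of lines in each node's my.cnf. This is only a sketch of the general technique, not Dormando's actual setup; the server_ids and values are illustrative.

```ini
# Node 1 my.cnf (sketch -- values are illustrative, not a known-good setup)
[mysqld]
server_id                = 1
log_bin                  = mysql-bin
auto_increment_increment = 2   # step by the number of masters
auto_increment_offset    = 1   # node 1 hands out ids 1, 3, 5, ...

# Node 2 my.cnf
[mysqld]
server_id                = 2
log_bin                  = mysql-bin
auto_increment_increment = 2
auto_increment_offset    = 2   # node 2 hands out ids 2, 4, 6, ...
```

The interleaved id ranges mean both masters can accept inserts without ever generating the same AUTO_INCREMENT value, which is what makes active/active writes survivable.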
