On Thu, Sep 27, 2007 at 04:07:21PM -0700, David Rees wrote:
> On 9/27/07, Dan Pritts <[EMAIL PROTECTED]> wrote:
> > So I've been of the opinion (not backed up by experimental data) that
> > a concatenation (what linux md driver calls LINEAR; similar effects can
> > be realized with LVM) of two RAID1's would be better for BackupPC than
> > a RAID10.
> >
> > My rationale for this is that in RAID10, the disks are generally
> > seeking to the same spot, unless you have a write that doesn't span
> > across multiple raid stripes. This certainly happens, but i suspect
> > most writes span multiple stripes.
> >
> > i guess this really depends on the RAID stripe size, bigger would be better.
>
> Looking at my average file size on one of my backuppc servers, it
> appears to be about 50KB. With a typical stripe size being 64KB, that
> would seem to indicate that your average write will fit on one stripe,
> so that may hurt your argument.
I'm not sure why I wrote that; the thing I typically think about with
backuppc is all the seeking required by its extensive use of hard links,
and that is what I'm trying to minimize. All the hard links backuppc
creates are pretty much guaranteed to be tiny writes, and if I'm right,
those are a huge portion of the I/O load.

What follows is pretty much stream-of-consciousness and I don't have the
time to edit it; I've spent too much time on this already. Sorry for
that, but perhaps you'll find it interesting.

On the other hand... what ARE the odds of a 50KB file fitting within a
single 64KB stripe? It would have to *start* within the first 14KB of
the stripe. The filesystem block size comes into play here: with a
typical 4KB block size, only 4 of the 16 possible starting blocks in the
stripe leave enough room, and the starting block is probably close to
random (without thinking too hard), so it happens roughly 25% of the
time. In other words, only about a quarter of those 50KB files would fit
in a single stripe.
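As a sanity check on that 25%, here's a quick back-of-the-envelope
script. The 64KB stripe, 4KB blocks, and uniformly random block-aligned
start position are all just my assumptions:

    # Toy check: odds that a file fits within one stripe, assuming the
    # start position is block-aligned and uniformly random in the stripe.
    import math

    def fit_probability(file_kb, stripe_kb=64, block_kb=4):
        blocks_per_stripe = stripe_kb // block_kb        # 16 blocks per stripe
        file_blocks = math.ceil(file_kb / block_kb)      # 13 blocks for 50KB
        good_starts = max(blocks_per_stripe - file_blocks + 1, 0)
        return good_starts / blocks_per_stripe

    print(fit_probability(50))   # 0.25 -> about a quarter fit in one stripe

A real allocator isn't uniform, of course, but it's close enough for
hand-waving.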
Hmm, how many disk ops are required for a single file write? Off the top
of my head, after reading a bit about good old ext2:

- directory modification to create the file, plus the journal write of
  that directory mod (how many actual journal writes?)
- modify the inode to note blocks in use, owner, timestamps, etc. (for a
  big file you might have indirect blocks as well)
- modify the used-block data structure
- the actual write of the data blocks; assume they're contiguous
- modify the superblock's free-block count (a separate operation from the
  used-block structure in ext2; not sure about other filesystems)

Little seeking is required between the inode, the block structures, and
the data blocks, but I bet this group is likely to span multiple stripes.
If you're lucky the directory is close by and falls into this category
too. These are nearby, related writes; slower than a single write but
probably very fast on command-queuing disks (not sure how much slower on
non-command-queuing PATA disks, but perhaps significantly). Since this
group, while close together, is pretty much guaranteed to span multiple
stripes, let's consider it one big write.

The superblock free-block count and the journal writes are probably
pretty small and probably each fit within a stripe, *but* they likely
involve a significant seek away from the data area. So let's consider
those two small writes.

So an average file write *probably* involves one big write and two small
writes. In the case of a hard link to an existing file, you have two
small writes: one to the directory and one to the inode to update the
link count. These are likely nowhere near one another, so in RAID10 they
probably go to different disks about half the time. In a concatenation,
on a reasonably full filesystem, I bet it's similar. (There's a toy model
of the concat-vs-RAID10 question sketched at the end of this mail.)

I'm sure there is something inaccurate in what I wrote above. What I come
up with at the end of all this is that, as you say below, it's awfully
complicated, with lots of big and lots of small writes, and neither of us
has probably considered all of the implications.

> Additionally, if we look at the big picture where we are writing out a
> bunch of files, these are pretty much guaranteed to be scattered all
> over the disk with your typical filesystem.

Definitely, on a full disk. In which case concatenation is as good as
RAID10.

> Even a fresh filesystem
> will scatter new files over the disk to help avoid fragmentation
> issues should you decide to grow the file later.

Didn't know that. Is it typically truly randomly/evenly spaced? Or will
it tend to start out at the front and work its way toward the back,
leaving spaces in the middle? Any quick articles you can suggest (not
that I don't believe you, just interested in more info)?

> Now throw in the number of backups you are doing and you end up with
> too many variables to consider before assuming that a linear array
> will outperform a striped array.

So, you don't know either :)

> For random reads all over the disk, the performance should be similar
> for small files but large file reads should be up to twice as fast.
> Throw in multiple readers and the difference will narrow.

With a single reader/writer, you're right. I was assuming that there are
multiple processes doing I/O, and I'm willing to trade raw byte
throughput for IOPS.

> > > Stay away from RAID5 unless you have a good
> > > controller with a battery backed cache.
> >
> > Even then, performance won't be great, especially on random small writes
> > (look up the "RAID5 write hole" and "read-modify-write" to understand why).
>
> But wait, I thought you said that the average write under backuppc
> load would be larger than a stripe? So which is it? ;-)

Well...

1) Regardless of how many small disk writes there are, we can agree there
are a significant number. ANY small write will be significantly slower
with RAID5, because the read-modify-write cycle needs a read followed by
a write on the affected disks, plus a parity computation, instead of a
single write, so it can take up to twice as long. You either have to wait
for a full rotation (~8.3ms on a 7200rpm drive), or wait for the drive to
seek back to that spot if you're queueing commands (average seek is
typically 4.5ms, although my naive guess is that it's shorter in practice
on a busy disk with command queuing).

Plus, as we think about it, pretty much EVERY filesystem write includes a
small write. Here are the scenarios:

- everything fits within a single stripe, in which case it's a
  partial-stripe write;
- it doesn't fit, with a mid-stripe beginning and end and possibly full
  stripes in the middle, in which case there are two partial-stripe
  writes;
- very occasionally you might somehow end up with exactly a full-stripe
  write;
- very occasionally you might land on a full-stripe boundary or two in a
  multi-stripe write.

So: LOTS of partial-stripe writes. In fact, if I'm thinking correctly,
nearly everything up to 2x the full-stripe size involves one or two
partial-stripe writes. And remember that the specified stripe (chunk)
size is per-disk, so a 64K stripe size on a 6-disk RAID5 gives you
5 data chunks x 64K = 320K of what I've called a "full stripe". (A rough
sketch of this bookkeeping is below, before my sig.)

2) With RAID5 you are guaranteed that the disk heads HAVE to move in
unison for ALL writes; there is no way for them to seek to different
parts of the disk and do independent small writes. So the benefits you
and I are discussing for small random I/Os are lost.

So I think I'm pretty much on track with this one. Not saying it won't
work for someone's particular application; it certainly does give you a
lot more disk space for your money, at the expense of IOPS.

This has been informative to think through, and I'd love to hear your
thoughts on the rest of what I've written, poking holes especially.
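Here's the partial-vs-full-stripe bookkeeping I mean, as a rough sketch.
This is my own toy model, assuming a 6-disk RAID5 with a 64KB chunk; a
real controller or md will have its own details:

    # Toy model of RAID5 full- vs partial-stripe writes.
    # Assumed geometry: 6 disks, 64KB chunk, so 5 data chunks = 320KB
    # of data per full stripe.
    CHUNK = 64 * 1024
    DISKS = 6
    FULL_STRIPE = CHUNK * (DISKS - 1)    # 320KB of data per stripe

    def stripe_writes(offset, length):
        """Count (full, partial) stripe writes for one contiguous write."""
        end = offset + length
        full = partial = 0
        stripe = offset // FULL_STRIPE
        while stripe * FULL_STRIPE < end:
            s_start = stripe * FULL_STRIPE
            s_end = s_start + FULL_STRIPE
            if offset <= s_start and end >= s_end:
                full += 1      # whole stripe covered: write data + parity, no reads
            else:
                partial += 1   # partial stripe: read-modify-write needed
            stripe += 1
        return full, partial

    print(stripe_writes(0, 50 * 1024))           # (0, 1): one read-modify-write
    print(stripe_writes(10 * 1024, 700 * 1024))  # (1, 2): partials at both ends

Every one of those partial stripes pays the extra read and parity work
described above.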
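And to make the concat-vs-RAID10 hand-waving from earlier a bit more
concrete, here's an equally rough toy model. It only asks which mirror
pair a single write touches; the 4-disk layout, 64KB chunk, and 500GB
pair size are my assumptions, and real md layouts have details this
ignores:

    # Toy layout: 4 disks as two RAID1 pairs.  RAID10 stripes 64KB chunks
    # across the pairs; LINEAR puts the first half of the array on pair 0
    # and the second half on pair 1.
    CHUNK = 64 * 1024
    PAIR_BYTES = 500 * 10**9            # pretend each mirror pair holds 500GB

    def pairs_touched_raid10(offset, length):
        first = offset // CHUNK
        last = (offset + length - 1) // CHUNK
        return {c % 2 for c in range(first, last + 1)}

    def pairs_touched_linear(offset, length):
        first = offset // PAIR_BYTES
        last = (offset + length - 1) // PAIR_BYTES
        return set(range(first, last + 1))

    # A 50KB write that happens to straddle a chunk boundary:
    off = 3 * CHUNK - 10 * 1024
    print(pairs_touched_raid10(off, 50 * 1024))  # {0, 1}: both pairs have to seek
    print(pairs_touched_linear(off, 50 * 1024))  # {0}: the other pair stays free

With the concat, the idle pair's heads are free to go service one of
those little hard-link writes somewhere else; with RAID10, any write that
crosses a chunk boundary drags all four spindles along.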
tnx

danno
--
Dan Pritts, System Administrator
Internet2
office: +1-734-352-4953 | mobile: +1-734-834-7224

Dive into the next wave of innovation:
Register for the Internet2 Member Meeting
http://events.internet2.edu/2007/fall-mm/