On Thu, Sep 27, 2007 at 04:07:21PM -0700, David Rees wrote:
> On 9/27/07, Dan Pritts <[EMAIL PROTECTED]> wrote:
> > So I've been of the opinion (not backed up by experimental data) that
> > a concatenation (what linux md driver calls LINEAR; similar effects can
> > be realized with LVM) of two RAID1's would be better for BackupPC than
> > a RAID10.
> >
> > My rationale for this is that in RAID10, the disks are generally
> > seeking to the same spot, unless you have a write that doesn't span
> > across multiple raid stripes.  This certainly happens, but i suspect
> > most writes span multiple stripes.
> >
> > i guess this really depends on the RAID stripe size, bigger would be better.
> 
> Looking at my average file size on one of my backuppc servers, it
> appears to be about 50KB. With a typical stripe size being 64KB, that
> would seem to indicate that your average write will fit on one stripe,
> so that may hurt your argument.

I'm not sure why I wrote that; the thing I typically think about with
backuppc is all the seeking required by its extensive use of hard links,
and I'm trying to minimize that.

Certainly all the hard links that backuppc creates are pretty much
guaranteed to be tiny writes, and if I'm right, those are a huge portion
of the I/O load.


What follows is pretty much stream-of-consciousness and I don't have the
time to edit it; I've spent too much time on this already.  Sorry about
that, but perhaps you'll find it interesting.

On the other hand...

What ARE the odds of your 50KB file fitting in a 64KB stripe?  It would
have to *start* within the first 14KB of the stripe, so it happens
roughly 25% of the time if the start location is random.

The block size of your filesystem probably comes into play here, but
with a typical 4KB block size there are enough possible start offsets
within that 64KB stripe that the distribution is probably close to random
(without thinking too hard about it).  So only about a quarter of those
50KB files would fit in a single stripe.
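
Here's a quick Python sketch of that back-of-the-envelope estimate: the
odds that a 50KB file fits entirely inside one 64KB stripe, assuming the
file starts on a 4KB block boundary chosen uniformly at random within the
stripe (the sizes are just the example numbers from this thread):

STRIPE_KB = 64
BLOCK_KB = 4
FILE_KB = 50

# possible block-aligned start offsets within the stripe: 0, 4, ..., 60
starts = range(0, STRIPE_KB, BLOCK_KB)
# offsets where the whole file still ends inside the stripe
fits = [s for s in starts if s + FILE_KB <= STRIPE_KB]

print(f"{len(fits)} of {len(starts)} start offsets fit => "
      f"{len(fits) / len(starts):.0%} of 50KB files stay in one stripe")
# prints: 4 of 16 start offsets fit => 25% of 50KB files stay in one stripe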


Hmm, how many disk ops are required for a single file write?  Off the
top of my head, and after reading a bit about good old ext2 (plus ext3's
journal):

 - directory modification to create the file, and journal write of
   the directory mod (how many actual journal writes?)

 - modify the inode to note blocks in use, owner, timestamp, etc. (for
   a big file, you might have indirect blocks also)

 - modify the used-block data structure (the block bitmap)

 - actual write of the data blocks; assume contiguous

 - modify the superblock's free-block count (a separate operation from the
   used-block data structure in ext2; not sure about other filesystems)


Little seeking is required among the inode, block structures, & data blocks,
but I bet this group is likely to span multiple stripes.  If you're lucky
the directory is close by & falls into this category too.

These are nearby, related writes; slower than a single write, but
probably very fast on command-queuing disks.  (I'm not sure how much
slower on non-command-queuing PATA disks, but perhaps significantly.)

This group of writes, while close together, is pretty much guaranteed
to span multiple stripes.  So let's consider it one big write.


The superblock free-block count & journal writes are probably pretty
small; each probably fits within a stripe.  *But* they probably involve
a significant seek away from the data area.  So let's consider these
two small writes.


So an average file write *probably* involves one big write and two
small writes.


In the case of a hard link to an existing file, you have two small writes:
one to the directory and one to the inode to update the link count.
These are likely nowhere near one another, so in RAID10 they probably go
to different disks about half the time.  On a concatenation with a
reasonably full filesystem, I bet the odds are similar.
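
For what it's worth, here's a very rough Python sketch comparing the two
cases above, per operation.  All of the numbers are assumptions made up
for illustration (7200rpm disk, ~4.5ms average seek, ~60MB/s sequential
transfer, small writes assumed to be one 4KB block), not measurements:

SEEK_MS = 4.5                            # assumed average seek time
HALF_ROTATION_MS = 0.5 * 60_000 / 7200   # ~4.2ms average rotational latency
XFER_MB_PER_S = 60                       # assumed sequential transfer rate

def write_ms(kb):
    """One discrete write: seek, rotational latency, then transfer."""
    return SEEK_MS + HALF_ROTATION_MS + (kb / 1024) / XFER_MB_PER_S * 1000

# New 50KB file: one big (mostly contiguous) write, plus two small writes
# (journal, superblock) in other parts of the disk.
new_file_ms = write_ms(50) + 2 * write_ms(4)

# Hard link: two small writes (directory entry, inode link count).
hard_link_ms = 2 * write_ms(4)

print(f"new 50KB file: ~{new_file_ms:.0f} ms, hard link: ~{hard_link_ms:.0f} ms")
# roughly: new 50KB file: ~27 ms, hard link: ~17 ms

The point being that a hard link, despite writing almost no data, costs
nearly as much disk time as the whole file write, because it's all seeks.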


I'm sure that there is something inaccurate in what I wrote above.

So at the end of this, what I come up with is that, as you say below,
it's awfully complicated, with lots of big writes and lots of small ones,
and neither of us has probably considered all of the implications.


> Additionally, if we look at the big picture where we are writing out a
> bunch of files, these are pretty much guaranteed to be scattered all
> over the disk with your typical filesystem. 

Definitely, on a full disk.  In which case a concatenation is as good
as RAID10.

> Even a fresh filesystem
> will scatter new files over the disk to help avoid fragmentation
> issues should you decide to grow the file later.

Didn't know that.  Is it typically truly randomly/evenly spaced?  Or will
it tend to start out at the front and work its way toward the back, leaving
spaces in the middle?  Any quick articles you can suggest (not that I
don't believe you, just interested in more info)?

> Now throw in the number of backups you are doing and you end up with
> too many variables to consider before assuming that a linear array
> will outperform a striped array.


So, you don't know either :)

> For random reads all over the disk, the performance should be similar
> for small files but large file reads should be up to twice as fast.
> Throw in multiple readers and the difference will narrow.

With a single reader/writer, you're right.  I was assuming that there are
multiple processes doing I/O, and I'm willing to trade raw byte throughput
for IOPS.

> > > Stay away from RAID5 unless you have a good
> > > controller with a battery backed cache.
> >
> > Even then, performance won't be great, especially on random small writes
> > (look up the "RAID5 write hole" and "read-modify-write" to understand why).
> 
> But wait, I thought you said that the average write under backuppc
> load would be larger than a stripe? So which is it? ;-)


Well...

1 Regardless of how many small disk writes there are, we can agree there are
  a significant number.  ANY small write will be significantly slower with
  RAID5, because the read-modify-write cycle requires a read and then a
  write (plus a parity computation) where a plain write needs a single
  disk operation, so up to twice as much time.

  You either have to wait for a full rotation (~8.3ms on a 7200rpm
  drive), or wait for the drive to seek back to that spot if you're
  queuing commands (average seek is typically 4.5ms, although my naive
  guess is that it's shorter in practice on a busy disk with command
  queuing).

  Plus, as we think about it, pretty much EVERY filesystem write includes
  a small write.  Here are the scenarios:

      everything fits within a single stripe, in which case it's a 
      partial-stripe write.
 
      it doesn't all fit, with a mid-stripe beginning and end and possibly
      full stripes in the middle, in which case there are two partial-stripe
      writes.

      very occasionally you might somehow end up with exactly a full stripe
      write.

      very occasionally you might end up with a full-stripe boundary or
      two on a multi-stripe write.

  So, LOTS of partial-stripe writes.  In fact, if I'm thinking correctly,
  nearly every write up to 2x the stripe size involves one or two
  partial-stripe writes.  And remember that the specified stripe size
  is per-disk, so a 64K stripe size on a 6-disk RAID5 gives you 320K
  of what I've called a "full stripe" (rough numbers are sketched after
  point 2 below).



2 With RAID5 you are guaranteed that your disk heads HAVE to move in
  unison for ALL writes; there is no way for them to seek to different
  parts of the disk and do independent small writes.  So the benefits
  you and I are discussing for small random I/Os are lost.
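
For concreteness, here's a quick Python sketch of the rough numbers behind
points 1 and 2, using the example figures from this thread (7200rpm drives,
64KB per-disk stripe, 6-disk RAID5).  These are illustrative assumptions,
not measurements:

RPM = 7200
rotation_ms = 60_000 / RPM                           # time for one full rotation
print(f"full rotation: {rotation_ms:.1f} ms")        # ~8.3 ms

DISKS = 6
STRIPE_PER_DISK_KB = 64
full_stripe_kb = (DISKS - 1) * STRIPE_PER_DISK_KB    # one disk's worth is parity
print(f"full stripe (data): {full_stripe_kb} KB")    # 320 KB

# Any write that doesn't begin and end exactly on full-stripe boundaries
# forces at least one read-modify-write, and a typical small file is
# nowhere near a full-stripe write.
avg_file_kb = 50
print(f"a {avg_file_kb}KB write is {avg_file_kb / full_stripe_kb:.0%} of a full stripe")
# prints: a 50KB write is 16% of a full stripe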

So I think I'm pretty much on track with this one.  I'm not saying it won't
work for someone's particular application; it certainly does give you a lot
more disk space for your money, at the expense of IOPS.


Thinking through this has been informative, and I'd love to hear your
thoughts on the rest of what I've written, poking holes especially.

tnx

danno
--
Dan Pritts, System Administrator
Internet2
office: +1-734-352-4953 | mobile: +1-734-834-7224

Dive into the next wave of innovation:
Register for the Internet2 Member Meeting
http://events.internet2.edu/2007/fall-mm/
