On Thu, 15 Jul 2010 11:00:03 -0600 Bond Masuda <[email protected]> wrote:
> Thanks for the suggestions. Yes, we are aware of those other
> parameters, but we now know the bottleneck is in the MD RAID-1 layer.
> This is RHEL 5.5 w/ latest updated kernel (don't have the version with
> me right now).
>
> We've tried all schedulers, a variety of read-ahead buffers, etc. The
> only thing that has allowed us to break the 200MB/s seq. write limit
> is when we get rid of the MD RAID-1 layer.
>
> Even if we don't use the file system (XFS in this case), if we build
> the MD RAID-1 with a missing half, and then add the 2nd half to allow
> it to re-sync, the fastest the re-sync will go (with all else pretty
> much idle) is about 200MB/s. So, this is the MD RAID-1 layer doing its
> own block copying with no LVM2 or XFS or anything else involved.
>
> -Bond
>
> On Thu, 2010-07-15 at 10:34 -0500, Paul M. Dyer wrote:
> >
> > We're seeing a performance bottleneck of about 200MBytes/sec
> > sequential writes when testing with iozone. We were expecting, with
> > 7x effective spindles on the RAID-10, to get about ~350MBytes/sec
> > sustained writes for sequential access.
> >
> > After trying out several combinations of things, we found that if we
> > remove the Linux MD software RAID layer, and just put LVM2 on top of
> > the /dev/sdc (the vdisk as presented by the PERC 6/E RAID-10), we
> > get about 340MBytes/sec sequential writes. If we put XFS directly
> > on top of /dev/sdc1, we get about the same 340MBytes/sec. So, we
> > can get our anticipated performance of about 350MB/s only when we
> > don't use the MD RAID-1.
> >
> > Since both MD1000s are connected via separate PERC 6/E cards, we
> > didn't think the MD RAID-1 would cause a >40% performance loss...
> >
> > We even tried to degrade the MD RAID-1 and see if writing only to
> > one of the mirrors would improve performance. It did NOT... still
> > 200MB/s. It almost seems like the Linux MD layer has a performance
> > cap at around 200MB/s.
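[For reference, the degraded-then-resync procedure described above maps onto mdadm roughly as follows. This is a sketch, not the poster's exact commands; device names are illustrative.]

```shell
# Build the RAID-1 with one half "missing", then add the second half
# later and let md copy the blocks across on its own:
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdc1 missing
mdadm --manage /dev/md0 --add /dev/sdd1

# The resync rate md reports here is the ~200MB/s ceiling in question:
cat /proc/mdstat
```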
Call me crazy, but I'm guessing you never let it finish syncing in the
first place. When syncing, md caps its own throughput at some arbitrary
amount the code deems sensible. It sounds like it never un-capped, which
suggests you never let the sync process complete before benchmarking it.
The fact that both numbers are so remarkably similar is further evidence
of that. Depending on the size of the dataset, it could take a while to
finish.

I could swear I recall there being some parameter which allows you to
create a mirror but skip the sync step. If you do that, then basically
you're swearing on the storage bible that both volumes were zeroed
before the mirror creation, and that the first thing you did after
creating the volume was mkfs. Personally I've never tried it, and
searching the documentation, I can't find it now, so perhaps that was
just a bad dream I had.

Cheers,
a

_______________________________________________
Linux-PowerEdge mailing list
[email protected]
https://lists.us.dell.com/mailman/listinfo/linux-poweredge
Please read the FAQ at http://lists.us.dell.com/faq
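[For readers hitting the same wall: the resync throttle the reply describes is real and tunable, and the skip-the-sync parameter the poster half-remembers is most likely mdadm's --assume-clean flag. The sketch below assumes a RHEL 5-era kernel with the stock defaults; numbers shown are the usual kernel defaults.]

```shell
# md throttles resync to a configurable ceiling. The default
# speed_limit_max of 200000 KiB/s (~195 MiB/s) lines up with the
# ~200MB/s resync rate observed in this thread.
cat /proc/sys/dev/raid/speed_limit_min   # typically 1000 (KiB/s)
cat /proc/sys/dev/raid/speed_limit_max   # typically 200000 (KiB/s)

# Raise the ceiling while a resync is running, then watch progress:
echo 500000 > /proc/sys/dev/raid/speed_limit_max
cat /proc/mdstat

# The "create a mirror but skip the sync" option is --assume-clean.
# Only safe if both members really are identical (e.g. freshly zeroed),
# since md will trust the mirror without copying anything:
mdadm --create /dev/md0 --level=1 --raid-devices=2 \
      --assume-clean /dev/sdc1 /dev/sdd1
```

Note that the resync throttle only governs resync traffic, so it is consistent with the reply's theory: a never-finished sync keeps competing with (and pacing) the benchmark.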
