Re: LVM performance (was: Re: RAID5 to RAID6 reshape?)
On Tue, Feb 19, 2008 at 01:52:21PM -0600, Jon Nelson wrote:
> On Feb 19, 2008 1:41 PM, Oliver Martin [EMAIL PROTECTED] wrote:
> > Janek Kozicki schrieb:
> > $ hdparm -t /dev/md0
> > /dev/md0: Timing buffered disk reads: 148 MB in 3.01 seconds = 49.13 MB/sec
> > $ hdparm -t /dev/dm-0
> > /dev/dm-0: Timing buffered disk reads: 116 MB in 3.04 seconds = 38.20 MB/sec
>
> I'm getting better performance on a LV than on the underlying MD:
>
> # hdparm -t /dev/md0
> /dev/md0: Timing buffered disk reads: 408 MB in 3.01 seconds = 135.63 MB/sec
> # hdparm -t /dev/raid/multimedia
> /dev/raid/multimedia: Timing buffered disk reads: 434 MB in 3.01 seconds = 144.04 MB/sec

As people keep trying to point out on many lists and in many docs: hdparm is *not* a benchmark tool. So its numbers, while interesting, should not be regarded as a valid comparison. Just my opinion.

regards,
iustin
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
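As a slightly more controlled spot check than `hdparm -t`, a plain dd read is a common sketch. This is an assumption-laden example, not a real benchmark either: on a real system you would point SRC at the device under test (e.g. /dev/md0) and add `iflag=direct`; here a scratch file stands in so the commands are self-contained.

```shell
# Rough sequential-read sample. On a real system use e.g. SRC=/dev/md0
# and add iflag=direct so the page cache doesn't serve the data
# (direct I/O may be refused on some filesystems; drop it then).
SRC=/tmp/seqread.$$
dd if=/dev/zero of="$SRC" bs=1M count=32 2>/dev/null
# GNU dd prints the throughput on its last status line
dd if="$SRC" of=/dev/null bs=1M 2>&1 | tail -n 1
rm -f "$SRC"
```

Repeat a few runs and take the median; a single 3-second read, whether from hdparm or dd, tells you very little.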
Re: Any inexpensive hardware recommendations for PCI interface cards?
On Fri, Feb 08, 2008 at 08:54:55AM -0500, Justin Piszcz wrote:
> The promise tx4 pci works great and supports sata/300+ncq/etc $60-$70.

Wait, I have used the tx4 pci up until ~2.6.22 and AFAIK it didn't support NCQ. Are you sure the current driver supports NCQ? I might then revive that card :)

thanks,
iustin
Re: Any inexpensive hardware recommendations for PCI interface cards?
On Fri, Feb 08, 2008 at 02:24:15PM -0500, Justin Piszcz wrote:
> On Fri, 8 Feb 2008, Iustin Pop wrote:
> > On Fri, Feb 08, 2008 at 08:54:55AM -0500, Justin Piszcz wrote:
> > > The promise tx4 pci works great and supports sata/300+ncq/etc $60-$70.
> >
> > Wait, I have used the tx4 pci up until ~2.6.22 and AFAIK it didn't
> > support NCQ. Are you sure the current driver supports NCQ? I might
> > then revive that card :)
>
> Whoa nice catch, I meant the Promise 300 TX4 which now retails for
> $59.99 w/free ship.
> http://www.newegg.com/Product/Product.aspx?Item=N82E16816102062 :)

Actually, I meant exactly the Promise 300 TX4 (the board is in my hand: the chip says PDC40718). The HW supports NCQ, but the Linux sata_promise driver didn't support NCQ when I tested it. Can someone confirm whether it does NCQ today (2.6.24)?

iustin
Re: recommendations for stripe/chunk size
On Thu, Feb 07, 2008 at 01:31:16AM +0100, Keld Jørn Simonsen wrote:
> Anyway, why does a SATA-II drive not deliver something like 300 MB/s?

Wait, are you talking about a *single* drive? In that case, it seems you are confusing the interface speed (300 MB/s) with the mechanical read speed (80 MB/s). If you are asking why a single drive is limited to 80 MB/s, I guess it's a problem of mechanics. Even with NCQ or big readahead settings, ~80-100 MB/s is the highest I've seen on 7200 RPM drives.

And no, the drive does not wait for the CPU to process the current data before reading the next chunk; drives have a built-in read-ahead mechanism.

Honestly, I have 10x as many problems with the low random I/O throughput than with the (high, IMHO) sequential I/O speed.

regards,
iustin
Re: One Large md or Many Smaller md for Better Peformance?
On Tue, Jan 22, 2008 at 05:34:14AM -0600, Moshe Yudkowsky wrote:
> Carlos Carvalho wrote:
> > I use reiser3 and xfs. reiser3 is very good with many small files. A
> > simple test shows interactively perceptible results: removing large
> > files is faster with xfs, removing large directories (ex. the kernel
> > tree) is faster with reiser3.
>
> My current main concern about XFS and reiser3 is writebacks. The
> default mode for ext3 is journal, which in case of power failure is
> more robust than the writeback modes of XFS, reiser3, or JFS -- or so
> I'm given to understand. On the other hand, I have a UPS and it should
> shut down gracefully regardless if there's a power failure. I wonder
> if I'm being too cautious?

I'm not sure what your actual worry is. It's not like XFS loses *committed* data on power failure. It may lose data that was never required to go to disk via fsync()/fdatasync()/sync. What does lose data on power failure is the unprotected write cache of the hard drive.

If you have properly-behaved applications, then they know when to do an fsync; and if XFS returns success on fsync and your Linux is properly configured (no write-back caches on drives that are not backed by NVRAM, etc.), then you won't lose data.

regards,
iustin
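To make the "properly-behaved applications" point concrete: from a script you can get the same effect as an application calling fsync() by using dd's conv=fsync, which fsync()s the output file before dd exits. A minimal sketch (the paths are just examples):

```shell
# Write a file and force its data to stable storage before dd returns;
# conv=fsync makes dd call fsync() on the output file at the end, so a
# power failure *after* this command completes cannot lose the data --
# assuming no unprotected write cache sits below the filesystem.
printf 'important data\n' > /tmp/src.$$
dd if=/tmp/src.$$ of=/tmp/dst.$$ conv=fsync 2>/dev/null
cat /tmp/dst.$$
rm -f /tmp/src.$$ /tmp/dst.$$
```

Data that was only written to the page cache, without such a sync, is exactly what any journaling filesystem is allowed to lose on power failure.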
Re: One Large md or Many Smaller md for Better Peformance?
On Sun, Jan 20, 2008 at 02:24:46PM -0600, Moshe Yudkowsky wrote:
> Question: with the same number of physical drives, do I get better
> performance with one large md-based drive, or do I get better
> performance if I have several smaller md-based drives?

No expert here, but my opinion:
- the md code works better if there is only one array per physical drive, because it keeps statistics per array (like last accessed sector, etc.), and if you combine two arrays on the same drive these statistics are not exactly true anymore
- simply separating 'application work areas' into different filesystems is IMHO enough; no need to separate the raid arrays too
- if you download torrents, fragmentation is a real problem, so use a filesystem that knows how to preallocate space (XFS and maybe ext4; for XFS, use xfs_io to set a bigger extent size for where you download)

regards,
iustin
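For the XFS hint above, the knob is the extent size hint set via xfs_io. A sketch, with an example size and a hypothetical directory path (run it on the directory you download into; files created under it inherit the hint):

```
# set a 64 MiB extent size hint on the download directory; XFS will
# then allocate space for growing files in 64 MiB extents, which
# reduces fragmentation for slowly-growing files such as torrents
xfs_io -c 'extsize 64m' /data/torrents
```

Check the result with `xfs_io -c extsize /data/torrents`; this only affects files created after the hint is set.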
Re: help diagnosing bad disk
On Wed, Dec 19, 2007 at 01:18:21PM -0500, Jon Sabo wrote:
> So I was trying to copy over some Indiana Jones wav files and it
> wasn't going my way. I noticed that my software raid device showed:
>
> /dev/md1 on / type ext3 (rw,errors=remount-ro)
>
> Is this saying that it was remounted, read only because it found a
> problem with the md1 meta device? That's what it looks like it's
> saying but I can still write to /.

FYI, it means that the filesystem is currently mounted rw, and that if there are errors, it will be remounted read-only (as opposed to panicking).

regards,
iustin
Re: Time to deprecate old RAID formats?
On Sat, Oct 20, 2007 at 10:52:39AM -0400, John Stoffel wrote:
> Michael> Well, I strongly, completely disagree. You described a
> Michael> real-world situation, and that's unfortunate, BUT: for at
> Michael> least raid1, there ARE cases, pretty valid ones, when one
> Michael> NEEDS to mount the filesystem without bringing up raid.
> Michael> Raid1 allows that.
>
> Please describe one such case please.

Boot from a raid1 array, such that everything - including the partition table itself - is mirrored.

iustin
Re: Time to deprecate old RAID formats?
On Fri, Oct 19, 2007 at 02:39:47PM -0400, John Stoffel wrote:
> And if putting the superblock at the end is problematic, why is it the
> default? Shouldn't version 1.1 be the default?

In my opinion, having the superblock *only* at the end (e.g. the 0.90 format) is the best option. It allows one to mount the disk separately (in the case of RAID 1), if the MD superblock is corrupt or if you just want to get easily at the raw data.

As for the people who complained exactly because of this feature: LVM has two mechanisms to protect against accessing PVs on the raw disks (the option to ignore raid components, and the filter - I always set filters when using LVM on top of MD).

regards,
iustin
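The filter mentioned above lives in the devices section of lvm.conf. A sketch of the kind of filter I mean (the patterns are an example for the common "LVM on top of MD only" setup; adjust to your device layout):

```
# /etc/lvm/lvm.conf, devices section: accept PVs only on MD arrays and
# reject everything else, so LVM never scans the raw component disks
# and can't mistake a mirror half for a standalone PV
filter = [ "a|^/dev/md|", "r|.*|" ]
```

The first matching pattern wins, so the accept rule for /dev/md* must come before the catch-all reject.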
Re: [PATCH] Expose the degraded status of an assembled array through sysfs
On Mon, Sep 10, 2007 at 06:51:14PM +0200, Iustin Pop wrote:
> The 'degraded' attribute is useful to quickly determine if the array
> is degraded, instead of parsing 'mdadm -D' output or relying on the
> other techniques (number of working devices against number of defined
> devices, etc.). The md code already keeps track of this attribute, so
> it's useful to export it.
>
> Signed-off-by: Iustin Pop [EMAIL PROTECTED]
> ---
> Note: I sent this back in January and people agreed it was a good
> idea. However, it has not been picked up. So here I resend it again.

Ping? Neil, could you spare a few moments to look at this? (and sorry for bothering you)

Patch is against 2.6.23-rc5

Thanks,
Iustin Pop

 drivers/md/md.c | 7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index f883b7e..3e3ad71 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2842,6 +2842,12 @@ sync_max_store(mddev_t *mddev, const char *buf, size_t len)
 static struct md_sysfs_entry md_sync_max =
 __ATTR(sync_speed_max, S_IRUGO|S_IWUSR, sync_max_show, sync_max_store);
 
+static ssize_t
+degraded_show(mddev_t *mddev, char *page)
+{
+	return sprintf(page, "%i\n", mddev->degraded);
+}
+static struct md_sysfs_entry md_degraded = __ATTR_RO(degraded);
 
 static ssize_t
 sync_speed_show(mddev_t *mddev, char *page)
@@ -2985,6 +2991,7 @@ static struct attribute *md_redundancy_attrs[] = {
 	md_suspend_lo.attr,
 	md_suspend_hi.attr,
 	md_bitmap.attr,
+	md_degraded.attr,
 	NULL,
 };
 static struct attribute_group md_redundancy_group = {
--
1.5.3.1
Re: Speaking of network disks (was: Re: syncing remote homes.)
On Sat, Sep 22, 2007 at 10:28:44AM -0700, Mr. James W. Laferriere wrote:
> Hello Bill & all,
> Bill Davidsen [EMAIL PROTECTED], Sat, 22 Sep 2007 09:41:40 -0400, wrote:
> > My only advice is to try and quantify the data volume and look at
> > nbd vs. iSCSI to provide the mirror if you go that way.
>
> You mentioned nbd as a transport for disk to remote disk. My question
> is: have you OR anyone else tried using drbd(*) as a method to
> replicate disk data across networks?

I have used it only in local networks, but it works very well. It's much, much better than md + nbd, for example, because it was designed with the network in mind - so it deals gracefully with transient network errors and such. And the current version (8.x) is also more flexible than the previous versions.

I'd recommend you give it a try.

regards,
iustin
Re: MD RAID1 performance very different from non-RAID partition
On Sat, Sep 15, 2007 at 12:28:07AM -0500, Jordan Russell wrote:
> (Kernel: 2.6.18, x86_64)
>
> Is it normal for an MD RAID1 partition with 1 active disk to perform
> differently from a non-RAID partition?
>
> md0 : active raid1 sda2[0]
>       8193024 blocks [2/1] [U_]
>
> I'm building a search engine database onto this partition. All of the
> source data is cached into memory already (i.e., only writes should be
> hitting the disk). If I mount the partition as /dev/md0, building the
> database consistently takes 18 minutes. If I stop /dev/md0 and mount
> the partition as /dev/sda2, building the database consistently takes
> 31 minutes. Why the difference?

Maybe it's because md doesn't support barriers whereas the disks support them? In this case some filesystems, for example XFS, will work faster on raid1 because they can't force flushes to disk using barriers. Just a guess...

regards,
iustin
Re: MD RAID1 performance very different from non-RAID partition
On Sat, Sep 15, 2007 at 02:18:19PM +0200, Goswin von Brederlow wrote:
> Shouldn't it be the other way around? With a barrier the filesystem
> can enforce an order on the data written and can then continue writing
> data to the cache. More data is queued up for write. Without barriers
> the filesystem should do a sync at that point and have to wait for the
> write to fully finish. So less is put into cache.

I don't know in general, but XFS will simply not issue any sync at all if the block device doesn't support barriers. It's the sysadmin's job to either ensure you have barriers or turn off the write cache on the disk (see the XFS FAQ, for example).

However, I never saw such behaviour from MD (i.e. claiming the write has completed while the disk underneath is still receiving data to write from Linux), so I'm not sure this is what happens here. In my experience, MD acknowledges a write only when it has been pushed to the drive (write cache enabled or not) and there is no buffer between MD and the drive.

regards,
iustin
Re: reducing the number of disks a RAID1 expects
On Sun, Sep 09, 2007 at 09:31:54PM -1000, J. David Beutel wrote:
> [EMAIL PROTECTED] ~]# mdadm --grow /dev/md5 -n2
> mdadm: Cannot set device size/shape for /dev/md5: Device or resource busy
>
> mdadm - v1.6.0 - 4 June 2004
> Linux 2.6.12-1.1381_FC3 #1 Fri Oct 21 03:46:55 EDT 2005 i686 athlon i386 GNU/Linux

I'm not sure that such an old kernel supports reshaping an array. The mdadm version should not be a problem, as that message is probably generated by the kernel. I'd recommend trying to boot with a newer kernel, even if only for the duration of the reshape.

regards,
iustin
[PATCH] Expose the degraded status of an assembled array through sysfs
The 'degraded' attribute is useful to quickly determine if the array is degraded, instead of parsing 'mdadm -D' output or relying on the other techniques (number of working devices against number of defined devices, etc.). The md code already keeps track of this attribute, so it's useful to export it.

Signed-off-by: Iustin Pop [EMAIL PROTECTED]
---
Note: I sent this back in January and people agreed it was a good idea. However, it has not been picked up. So here I resend it again.

Patch is against 2.6.23-rc5

Thanks,
Iustin Pop

 drivers/md/md.c | 7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index f883b7e..3e3ad71 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2842,6 +2842,12 @@ sync_max_store(mddev_t *mddev, const char *buf, size_t len)
 static struct md_sysfs_entry md_sync_max =
 __ATTR(sync_speed_max, S_IRUGO|S_IWUSR, sync_max_show, sync_max_store);
 
+static ssize_t
+degraded_show(mddev_t *mddev, char *page)
+{
+	return sprintf(page, "%i\n", mddev->degraded);
+}
+static struct md_sysfs_entry md_degraded = __ATTR_RO(degraded);
 
 static ssize_t
 sync_speed_show(mddev_t *mddev, char *page)
@@ -2985,6 +2991,7 @@ static struct attribute *md_redundancy_attrs[] = {
 	md_suspend_lo.attr,
 	md_suspend_hi.attr,
 	md_bitmap.attr,
+	md_degraded.attr,
 	NULL,
 };
 static struct attribute_group md_redundancy_group = {
--
1.5.3.1
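With the patch applied, checking degradation from a script becomes a one-line sysfs read. A sketch: `check_degraded` is a hypothetical helper name, and on a real system you would feed it the value of /sys/block/md0/md/degraded; literal values are used here so the script is self-contained.

```shell
#!/bin/sh
# Interpret the value of /sys/block/<dev>/md/degraded (the attribute
# added by this patch): 0 means fully redundant, >0 is the number of
# missing devices.
check_degraded() {
    if [ "$1" -gt 0 ]; then
        echo "array degraded: $1 missing device(s)"
    else
        echo "array clean"
    fi
}

# on a real system: check_degraded "$(cat /sys/block/md0/md/degraded)"
check_degraded 0    # prints: array clean
check_degraded 1    # prints: array degraded: 1 missing device(s)
```

This is exactly the kind of monitoring one-liner that otherwise needs 'mdadm -D' parsing.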
[PATCH] Explain the read-balancing algorithm for RAID1 better in md.4
There are many questions on the mailing list about the RAID1 read performance profile. This patch adds a new paragraph to the RAID1 section in md.4 that details what kind of speed-up one should expect from RAID1.

Signed-off-by: Iustin Pop [EMAIL PROTECTED]
---
this patch is against the git tree of mdadm.

 md.4 | 7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/md.4 b/md.4
index cf423cb..db39aba 100644
--- a/md.4
+++ b/md.4
@@ -168,6 +168,13 @@
 All devices in a RAID1 array should be the same size. If they are not,
 then only the amount of space available on the smallest device is used
 (any extra space on other devices is wasted).
 
+Note that the read balancing done by the driver does not make the RAID1
+performance profile be the same as for RAID0; a single stream of
+sequential input will not be accelerated (e.g. a single dd), but
+multiple sequential streams or a random workload will use more than one
+spindle. In theory, having an N-disk RAID1 will allow N sequential
+threads to read from all disks.
+
 .SS RAID4
 A RAID4 array is like a RAID0 array with an extra device for storing
--
1.5.3.1
Re: Using my Mirror disk to boot up.
On Wed, Aug 29, 2007 at 08:25:59PM -0700, chee wrote:
> Hi,
>
> This is my filesystem:
>
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/md0              9.7G  6.6G  2.7G  72% /
> none                  189M     0  189M   0% /dev/shm
> /dev/md2              103G   98G  289M 100% /home
>
> and these are my mirror settings:
>
> Personalities : [raid1]
> md1 : active raid1 hda2[0] hdd2[1]
>       512000 blocks [2/2] [UU]
> md2 : active raid1 hdd3[1] hda3[0]
>       109298112 blocks [2/2] [UU]
> md0 : active raid1 hdd1[1] hda1[0]
>       10240128 blocks [2/2] [UU]
>
> The problem I am facing is that my mirror disk does not seem to boot
> up when I swap hard disks to test whether my mirror disk is working.
> The only thing I see is 'LI' on the monitor, and it hangs there.

The problem is (most likely) that your mirrors only cover the hd[ad][123] partitions, and not the whole disk. Thus, the MBR of hda is not synchronized to hdd. You can do two things here:
- fix your lilo.conf to correctly write to both hda and hdd (IIRC, you need the directive raid-extra-boot=mbr or raid-extra-boot=mbr-only, depending on how exactly you install lilo)
- change to a partitionable raid array instead of three arrays (one for each partition); that will also cover the MBR of the drive

regards,
iustin
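For the first option, the lilo.conf change is small. A sketch from memory (check the lilo man page for the exact directive on your version before relying on it):

```
# /etc/lilo.conf: boot from the raid1 array, and have lilo write a
# boot record to the MBR of every disk in the array, so that either
# disk can boot on its own if the other one dies
boot=/dev/md0
raid-extra-boot=mbr-only
```

After editing, re-run lilo and watch its output: it should report writing boot records to both hda and hdd.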
Re: [RFD] Layering: Use-Case Composers (was: DRBD - what is it, anyways? [compare with e.g. NBD + MD raid])
On Sun, Aug 12, 2007 at 07:03:44PM +0200, Jan Engelhardt wrote:
> On Aug 12 2007 09:39, [EMAIL PROTECTED] wrote:
> > now, I am not an expert on either option, but there are a couple of
> > things that I would question about the DRBD+MD option
> >
> > 1. when the remote machine is down, how does MD deal with it for
> > reads and writes?
>
> I suppose it kicks the drive and you'd have to re-add it by hand
> unless done by a cronjob.

From my tests, since NBD doesn't have a timeout option, MD hangs in the write to that mirror indefinitely, somewhat like when dealing with a broken IDE driver/chipset/disk.

> > 2. MD over local drive will alternate reads between mirrors (or so
> > I've been told), doing so over the network is wrong.
>
> Certainly. In which case you set write_mostly (or even write_only, not
> sure of its name) on the raid component that is nbd.
>
> > 3. when writing, will MD wait for the network I/O to get the data
> > saved on the backup before returning from the syscall? or can it
> > sync the data out lazily
>
> Can't answer this one - ask Neil :)

MD has the write-mostly/write-behind options - which help in this case, but only up to a certain amount.

In my experience DRBD wins hands-down over MD+NBD, because MD doesn't know (or handle) a component that never returns from a write, which is quite different from returning with an error. Furthermore, DRBD was designed to handle transient errors in the connection to the peer due to its network-oriented design, whereas MD is mostly designed for local or at least high-reliability disks (where 'disk' can be SAN, SCSI, etc.) and a failure is not normal for MD. Thus the need for manual reconnect in the MD case and the automated handling of reconnects in the case of DRBD.

I'm just a happy user of both MD over local disks and DRBD for networked raid.

regards,
iustin
Re: Customize the error emails of `mdadm --monitor`
On Wed, Jun 06, 2007 at 01:31:44PM +0200, Peter Rabbitson wrote:
> Peter Rabbitson wrote:
> > Hi,
> > Is there a way to list the _number_ in addition to the name of a
> > problematic component? The kernel trend to move all block devices
> > into the sdX namespace combined with the dynamic name allocation
> > renders messages like "/dev/sdc1 has problems" meaningless. It would
> > make remote server support so much easier, by allowing the
> > administrator to label drive trays Component0 Component1
> > Component2... etc, and be sure that the local tech support person
> > will not pull out the wrong drive from the system.
>
> Any takers? Or is it a RTFM question (in which case I certainly
> overlooked the relevant doc)?

If you use udev, have you looked in /dev/disk? I think it solves the problem you need by allowing one to see the disks either by id or by path. Making the reverse map is then trivial (for a reasonable number of disks).

regards,
iustin
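As a sketch of the reverse map I mean: /dev/disk/by-id contains udev-created symlinks whose names carry the drive's model and serial, each pointing at the kernel name. The loop below builds the "stable name -> kernel name" table; a scratch directory with a made-up link name stands in for the real /dev/disk/by-id so the example is self-contained.

```shell
#!/bin/sh
# Build a "stable name -> kernel name" table the way you would from
# /dev/disk/by-id. The link name below is fabricated; real udev links
# look like ata-<MODEL>_<SERIAL> and point at ../../sdX.
d=$(mktemp -d)
ln -s ../../sda "$d/ata-EXAMPLE_MODEL_SERIAL123"

for l in "$d"/*; do
    printf '%s -> %s\n' "$(basename "$l")" "$(basename "$(readlink "$l")")"
done
# prints: ata-EXAMPLE_MODEL_SERIAL123 -> sda

rm -r "$d"
```

On a real box, replace `$d` with /dev/disk/by-id and you get the label-to-sdX map to print and tape onto the drive trays.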
Re: Customize the error emails of `mdadm --monitor`
On Wed, Jun 06, 2007 at 02:23:31PM +0200, Peter Rabbitson wrote:
> Iustin Pop wrote:
> > On Wed, Jun 06, 2007 at 01:31:44PM +0200, Peter Rabbitson wrote:
> > > Is there a way to list the _number_ in addition to the name of a
> > > problematic component? [...]
> >
> > If you use udev, have you looked in /dev/disk? I think it solves the
> > problem you need by allowing one to see the disks either by id or by
> > path. Making the reverse map is then trivial (for a reasonable
> > number of disks).
>
> This would not work, as arrays are assembled by the kernel at boot
> time, at which point there is no udev or anything else for that matter
> other than /dev/sdX. And I am pretty sure my OS (debian) does not
> support udev in initrd as of yet.

Ah, I see. But then sysfs should help (I presume sysfs, being a standard kernel filesystem, can be mounted in the initrd). I think that most of the information the kernel has for the device is present in sysfs. At least a crude form of mapping to the real controller is available via the symlink /sys/block/sdN/device. I don't know if it really helps your case.

iustin
Re: Same UUID for every member of all array ?
On Thu, Apr 12, 2007 at 02:57:57PM +0200, Brice Figureau wrote:
> Now, I don't know why all the UUIDs are equal (my other machines are
> not affected).

I think at some point, either in sarge or in testing between sarge and etch, there was a version of mdadm which had this bug (all arrays had the same uuid). Yeah, it bit me too a little :)

> Is there a possibility to hot change the UUID of each array (and
> change the corresponding superblocks of each member) so that my next
> boot will work?

Did you read the manpage for mdadm (the version in etch)? It has a -U argument to assemble which does what you want.

regards,
iustin
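For reference, the assemble-time update looks roughly like this; the device names and uuid are examples, and the exact behaviour of --update=uuid should be checked against the mdadm manpage of your version:

```
# stop the array, then re-assemble it while rewriting the superblock
# uuid on every member; with --uuid the given value is written,
# without it mdadm picks a new random uuid
mdadm --stop /dev/md0
mdadm --assemble /dev/md0 --update=uuid \
      --uuid=00112233:44556677:8899aabb:ccddeeff /dev/sda1 /dev/sdb1
```

Do one array at a time and re-check with `mdadm -D /dev/md0` that the uuids now differ before rebooting.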
Re: raid1 does not seem faster
On Mon, Apr 09, 2007 at 06:53:26AM -0400, Justin Piszcz wrote:
> Using 2 threads made no difference either. It was not until I did 3
> simultaneous copies that I saw 110-130MB/s through vmstat 1; until
> then, it only used one drive, even with two cp's. How come it needs to
> be three or more?

Because, as I understand it, it's an optimisation, not a rule. Quoting from the manpage (md):

    Once initialised, each device in a RAID1 array contains exactly the
    same data. Changes are written to all devices in parallel. Data is
    read from any one device. The driver attempts to distribute read
    requests across all devices to maximise performance.

The key word here is "attempts". I looked a while ago over the source code, and IIRC it says that it tries to direct a read request to the drive whose head is closest to the requested sector, or if that's not possible, to a random drive. To me, this seems a good strategy, which optimises server-type workloads.

regards,
iustin
Re: raid1 does not seem faster
On Wed, Apr 04, 2007 at 07:11:50PM -0400, Bill Davidsen wrote:
> You are correct, but I think if an optimization were to be done, some
> balance between the read time, seek time, and read size could be done.
> Using more than one drive only makes sense when the read transfer time
> is significantly longer than the seek time. With an aggressive
> readahead set for the array that would happen regularly. It's
> possible, it just takes the time to do it, like many other nice
> things.

Maybe yes, but why optimise the single-reader case? raid1 can already read in parallel from the drives when multiple processes read from the raid1. Optimising the single reader can help in hdparm or other benchmark cases, but in real life I very often see the total throughput of a (two drive) raid1 being around two times the throughput of a single drive.

regards,
iustin
Re: raid1 does not seem faster
On Thu, Apr 05, 2007 at 04:11:35AM -0400, Justin Piszcz wrote:
> On Thu, 5 Apr 2007, Iustin Pop wrote:
> > On Wed, Apr 04, 2007 at 07:11:50PM -0400, Bill Davidsen wrote:
> > > You are correct, but I think if an optimization were to be done,
> > > [...]
> >
> > Maybe yes, but why optimise the single-reader case? raid1 can
> > already read in parallel from the drives when multiple processes
> > read from the raid1. Optimising the single reader can help in hdparm
> > or other benchmark cases, but in real life I very often see the
> > total throughput of a (two drive) raid1 being around two times the
> > throughput of a single drive.
>
> Really? I have copied a file from a SW RAID1 (5GB) and I only saw
> 60MB/s, not the 120MB/s the (RAID1) is capable of, to the destination
> (which can easily do 160MB/s sustained read/write).

Did you copy it multi-threaded? I said *multiple readers* show improved speed, and you said you copied *one* file. Try copying two files in parallel.

I'm doing, in two xterms, "cat file1 > /dev/null" and "cat file2 > /dev/null", and my raid1 shows ~110 MB/s, each drive doing about half. One file alone does only about 60 MB/s (this is over a PCI raid controller, so the max 110 MB/s is a PCI bus limitation).

Iustin
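The two-xterm experiment above can be put into one script. A sketch: scratch files are created so the script runs anywhere; for a real measurement, replace them with big files living on the raid1 and watch vmstat/iostat in another terminal while it runs.

```shell
#!/bin/sh
# Read two files in parallel - the same experiment as the two xterms.
# On a raid1, each drive should end up serving one reader, so the
# total throughput should approach twice the single-drive speed.
f1=/tmp/reader1.$$
f2=/tmp/reader2.$$
dd if=/dev/zero of="$f1" bs=1M count=8 2>/dev/null
dd if=/dev/zero of="$f2" bs=1M count=8 2>/dev/null

cat "$f1" > /dev/null &
cat "$f2" > /dev/null &
wait
echo "both readers done"

rm -f "$f1" "$f2"
```

Note that small freshly-written files like these will come from the page cache; the raid1 effect only shows up with files much larger than RAM, read cold.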
Re: Unexpectedly slow raid1 benchmark results.
On Sun, Mar 04, 2007 at 04:47:19AM -0800, Dan wrote:
> Just about the only stat in these tests that shows a marked
> improvement between one and two drives is Random Seeks (which makes
> sense). What doesn't make sense is that none of the Sequential Input
> numbers increase. Shouldn't I be seeing close to a 100% improvement?

This is not raid0! Since the drives are identical copies, you can't really optimise sequential input, as the input data is not striped... OTOH, if you have two processes doing sequential input, then yes, each drive should work close to or at the normal sequential input speed.

Regards,
Iustin
Re: [PATCH 2.6.20-rc6] md: expose uuid and degraded attributes in sysfs
On Sat, Jan 27, 2007 at 02:59:48AM +0100, Iustin Pop wrote:
> From: Iustin Pop [EMAIL PROTECTED]
>
> This patch exposes the uuid and the degraded status of an assembled
> array through sysfs.
> [...]

Sorry to ask, but this was my first patch and I'm not sure what the procedure is to get it considered for merging... I was under the impression that just sending it to this list is enough. What do I have to do?

Thanks,
Iustin
Re: [PATCH 2.6.20-rc6] md: expose uuid and degraded attributes in sysfs
On Sun, Feb 11, 2007 at 08:15:31AM +1100, Neil Brown wrote:
> Resending after a suitable pause (1-2 weeks) is never a bad idea.

Ok, noted, thanks.

> Exposing the UUID isn't - and if it were, it should be in
> md_default_attrs rather than md_redundancy_attrs. The UUID isn't an
> intrinsic aspect of the array. It is simply part of the metadata that
> is used to match up different devices from the same array.

I see. Unfortunately, for now it's the only method of (more or less) persistently identifying the array.

> I plan to add support for the 'DDF' metadata format (an 'industry
> standard') and that will be managed entirely in user-space. The kernel
> won't know the uuid at all.

I've briefly looked over the spec, but this seems a non-trivial change, away from the current md superblocks to ddf... But the virtual disk GUIDs seem nice. In the meantime, probably the solution you gave below is best.

> So any solution for easy access to uuids should be done in user-space.
> Maybe mdadm could create a link /dev/md/by-uuid/ -> /dev/whatever. ??

That sounds like a good idea. mdadm (or udev or another userspace solution) should work, given some safety measures against stale symlinks and such. It seems to me that, since it's now possible to assemble arrays without mdadm (by using sysfs), mdadm is not the best place to do it. Probably relying on udev is a better option; however, right now it seems that it gets only the block add events, and not the block remove ones.

Thanks,
Iustin
[PATCH 2.6.20-rc6] md: expose uuid and degraded attributes in sysfs
From: Iustin Pop [EMAIL PROTECTED]

This patch exposes the uuid and the degraded status of an assembled array through sysfs.

The uuid is useful in the case when multiple arrays exist on a system and userspace needs to identify them; currently, the only portable way that I know of is using 'mdadm -D' on each device until the desired uuid is found. Having the uuid visible in sysfs is much cleaner, IMHO. Note on the method used to format the uuid: I'm not sure if this is the best way; I've copied and transformed the one in print_sb.

The 'degraded' attribute is also useful to quickly determine if the array is degraded, instead of, again, parsing 'mdadm -D' output or relying on the other techniques (number of working devices against number of defined devices, etc.). The md code already keeps track of this attribute, so it's useful to export it.

Signed-off-by: Iustin Pop [EMAIL PROTECTED]
---
--- linux-2.6.20-rc6/drivers/md/md.c.orig	2007-01-27 02:31:11.496575360 +0100
+++ linux-2.6.20-rc6/drivers/md/md.c	2007-01-27 02:32:51.746741201 +0100
@@ -2856,6 +2856,22 @@
 static struct md_sysfs_entry md_suspend_hi =
 __ATTR(suspend_hi, S_IRUGO|S_IWUSR, suspend_hi_show, suspend_hi_store);
 
+static ssize_t
+uuid_show(mddev_t *mddev, char *page)
+{
+	__u32 *p = (__u32 *)mddev->uuid;
+	return sprintf(page, "%08x:%08x:%08x:%08x\n", p[0], p[1], p[2], p[3]);
+}
+static struct md_sysfs_entry md_uuid =
+__ATTR_RO(uuid);
+
+static ssize_t
+degraded_show(mddev_t *mddev, char *page)
+{
+	return sprintf(page, "%i\n", mddev->degraded);
+}
+static struct md_sysfs_entry md_degraded =
+__ATTR_RO(degraded);
 
 static struct attribute *md_default_attrs[] = {
 	md_level.attr,
@@ -2881,6 +2897,8 @@
 	md_suspend_lo.attr,
 	md_suspend_hi.attr,
 	md_bitmap.attr,
+	md_uuid.attr,
+	md_degraded.attr,
 	NULL,
 };
 static struct attribute_group md_redundancy_group = {