Re: [Iscsitarget-devel] Abort Task ?
Ming Zhang wrote: as Ross pointed out, many I/O patterns only have one outstanding I/O at any time, so there is only one work thread actively serving it; it cannot exploit the multiple cores here. Do you see 100% with nullio or fileio? With disk, most of the time should be spent in iowait and CPU utilization should not be high at all. With both nullio and fileio...
Time to deprecate old RAID formats?
So, is it time to start thinking about deprecating the old 0.9, 1.0 and 1.1 formats to just standardize on the 1.2 format? What are the issues surrounding this? It's certainly easy enough to change mdadm to default to the 1.2 format and to require a --force switch to allow use of the older formats. I keep seeing that we support these old formats, and it's never been clear to me why we have four different ones available. Why can't we start defining the canonical format for Linux RAID metadata? Thanks, John [EMAIL PROTECTED]
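(As a concrete illustration of what choosing a metadata format looks like today; device names below are placeholders, not from this thread. The version can be forced per invocation or made the site-wide default in mdadm.conf, as the man page excerpt quoted later in this thread notes.)

# Force a specific superblock format at creation time
mdadm --create /dev/md0 --metadata=1.1 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1

# Or set the default once in /etc/mdadm.conf
CREATE metadata=1.1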
Re: [Iscsitarget-devel] Abort Task ?
Ming Zhang wrote: On Fri, 2007-10-19 at 09:48 +0200, BERTRAND Joël wrote: Ross S. W. Walker wrote: BERTRAND Joël wrote: BERTRAND Joël wrote: I can format serveral times (mkfs.ext3) a 1.5 TB volume over iSCSI without any trouble. I can read and write on this virtual disk without any trouble. Now, I have configured ietd with : Lun 0 Sectors=1464725758,Type=nullio and I run on initiator side : Root gershwin:[/dev] dd if=/dev/zero of=/dev/sdj bs=8192 479482+0 records in 479482+0 records out 3927916544 bytes (3.9 GB) copied, 153.222 seconds, 25.6 MB/s Root gershwin:[/dev] dd if=/dev/zero of=/dev/sdj bs=8192 I'm waitinfor a crash. No one when I write these lines. I suspect an interaction between raid and iscsi. I simultanely run : Root gershwin:[/dev] dd if=/dev/zero of=/dev/sdj bs=8192 8397210+0 records in 8397210+0 records out 68789944320 bytes (69 GB) copied, 2732.55 seconds, 25.2 MB/s and Root gershwin:[~] dd if=/dev/sdj of=/dev/null bs=8192 739200+0 records in 739199+0 records out 6055518208 bytes (6.1 GB) copied, 447.178 seconds, 13.5 MB/s without any trouble. The speed can definitely be improved. Look at your network setup and use ping to try and get the network latency to a minimum. # ping -A -s 8192 172.16.24.140 --- 172.16.24.140 ping statistics --- 14058 packets transmitted, 14057 received, 0% packet loss, time 9988ms rtt min/avg/max/mdev = 0.234/0.268/2.084/0.041 ms, ipg/ewma 0.710/0.260 ms gershwin:[~] ping -A -s 8192 192.168.0.2 PING 192.168.0.2 (192.168.0.2) 8192(8220) bytes of data. 8200 bytes from 192.168.0.2: icmp_seq=1 ttl=64 time=0.693 ms 8200 bytes from 192.168.0.2: icmp_seq=2 ttl=64 time=0.595 ms 8200 bytes from 192.168.0.2: icmp_seq=3 ttl=64 time=0.583 ms 8200 bytes from 192.168.0.2: icmp_seq=4 ttl=64 time=0.589 ms 8200 bytes from 192.168.0.2: icmp_seq=5 ttl=64 time=0.580 ms 8200 bytes from 192.168.0.2: icmp_seq=6 ttl=64 time=0.594 ms 8200 bytes from 192.168.0.2: icmp_seq=7 ttl=64 time=0.580 ms 8200 bytes from 192.168.0.2: icmp_seq=8 ttl=64 time=0.592 ms 8200 bytes from 192.168.0.2: icmp_seq=9 ttl=64 time=0.589 ms 8200 bytes from 192.168.0.2: icmp_seq=10 ttl=64 time=0.571 ms 8200 bytes from 192.168.0.2: icmp_seq=11 ttl=64 time=0.588 ms 8200 bytes from 192.168.0.2: icmp_seq=12 ttl=64 time=0.580 ms 8200 bytes from 192.168.0.2: icmp_seq=13 ttl=64 time=0.587 ms --- 192.168.0.2 ping statistics --- 13 packets transmitted, 13 received, 0% packet loss, time 2400ms rtt min/avg/max/mdev = 0.571/0.593/0.693/0.044 ms, ipg/ewma 200.022/0.607 ms gershwin:[~] Both initiator and target are alone on a gigabit NIC (Tigon3). On target server, istd1 takes 100% of a CPU (and only one CPU, even my T1000 can simultaneous run 32 threads). I think the limitation comes from istd1. usually istdx will not take 100% cpu with 1G network, especially when using disk as back storage, some kind of profiling work might be helpful to tell what happened... forgot to ask, your sparc64 platform cpu spec. Root gershwin:[/mnt/solaris] cat /proc/cpuinfo cpu : UltraSparc T1 (Niagara) fpu : UltraSparc T1 integrated FPU prom: OBP 4.23.4 2006/08/04 20:45 type: sun4v ncpus probed: 24 ncpus active: 24 D$ parity tl1 : 0 I$ parity tl1 : 0 Both servers are built with 1 GHz T1 processors (6 cores, 24 threads). Regards, JKB - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
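(Illustrative follow-up to Ross's and Ming's point about a single outstanding I/O: if one stream is latency-bound, several concurrent streams against the same nullio test LUN should scale until the wire or one CPU saturates. Device name and offsets are examples only, and the target is the throwaway nullio LUN used above.)

# Four concurrent writers, each covering its own 1 GiB region of the test LUN
for off in 0 1 2 3; do
    dd if=/dev/zero of=/dev/sdj bs=8192 seek=$((off * 131072)) count=131072 &
done
wait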
Re: Time to deprecate old RAID formats?
On Fri, 19 Oct 2007, Doug Ledford wrote: On Fri, 2007-10-19 at 13:05 -0400, Justin Piszcz wrote: I'm sure an internal bitmap would. On RAID1 arrays, reads/writes are never split up by a chunk size for stripes. A 2MB read is a single read, whereas on a raid4/5/6 array, a 2MB read will end up hitting a series of stripes across all disks. That means that on raid1 arrays, total disk seeks roughly equal total reads/writes, whereas on a raid4/5/6, total disk seeks usually exceed total reads/writes. That in turn implies that in a raid1 setup, disk seek time is important to performance, but not necessarily paramount. For raid456, disk seek time is paramount because of how many more seeks that format uses. When you then use an internal bitmap, you are adding writes to every member of the raid456 array, which adds more seeks. The same is true for raid1, but since raid1 doesn't have the same level of dependency on seek rates that raid456 has, it doesn't show the same performance hit that raid456 does. Got it, so for RAID1 it would make sense if LILO supported it (the later versions of the md superblock) Lilo doesn't know anything about the superblock format; however, lilo expects the raid1 device to start at the beginning of the physical partition. In other words, format 1.0 would work with lilo. Did not work when I tried 1.x with LILO, switched back to 00.90.03 and it worked fine. (for those who use LILO) but for RAID4/5/6, keep the bitmaps away :) I still use an internal bitmap regardless ;-) To help mitigate the cost of seeks on raid456, you can specify a huge chunk size (like 256k to 2MB or somewhere in that range). As long as you can get 90%+ of your reads/writes to fall into the space of a single chunk, then you start performing more like a raid1 device without the extra seek overhead. Of course, this comes at the expense of peak throughput on the device. Let's say you were building a mondo movie server, where you were streaming out digital movie files. In that case, you very well may care more about throughput than seek performance since I suspect you wouldn't have many small, random reads. Then I would use a small chunk size, sacrifice the seek performance, and get the throughput bonus of parallel reads from the same stripe on multiple disks. On the other hand, if I was setting up a mail server then I would go with a large chunk size because the filesystem activities themselves are going to produce lots of random seeks, and you don't want your raid setup to make that problem worse. Plus, most mail doesn't come in or go out at any sort of massive streaming speed, so you don't need the parallel reads from multiple disks to perform well. It all depends on your particular use scenario. -- Doug Ledford [EMAIL PROTECTED] GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband
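(A minimal sketch of the chunk-size trade-off Doug describes; device names and sizes are placeholders. mdadm takes --chunk in kilobytes.)

# Large chunk (1 MB): most small/random I/Os stay inside one chunk, fewer seeks
mdadm --create /dev/md0 --level=5 --raid-devices=4 --chunk=1024 /dev/sd[b-e]1

# Small chunk (64 KB): large sequential reads are spread across all members
mdadm --create /dev/md1 --level=5 --raid-devices=4 --chunk=64 /dev/sd[f-i]1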
Re: [BUG] Raid5 trouble
Bill Davidsen wrote: Dan Williams wrote: On Fri, 2007-10-19 at 01:04 -0700, BERTRAND Joël wrote: I ran some dd's for 12 hours (read and write in nullio) between initiator and target without any disconnection. Thus the iSCSI code seems to be robust. Both initiator and target are alone on a single gigabit ethernet link (without any switch). I'm investigating... Can you reproduce on 2.6.22? Also, I do not think this is the cause of your failure, but you have CONFIG_DMA_ENGINE=y in your config. Setting this to 'n' will compile out the unneeded checks for offload engines in async_memcpy and async_xor. Given that offload engines are far less tested code, I think this is a very good thing to try! I'm trying without CONFIG_DMA_ENGINE=y. istd1 only uses 40% of one CPU when I rebuild my raid1 array. 1% of this array has now been resynchronized without any hang. Root gershwin:[/usr/scripts] cat /proc/mdstat Personalities : [raid1] [raid6] [raid5] [raid4] md7 : active raid1 sdi1[2] md_d0p1[0] 1464725632 blocks [2/1] [U_] [] recovery = 1.0% (15705536/1464725632) finish=1103.9min speed=21875K/sec Regards, JKB
Re: [BUG] Raid1/5 over iSCSI trouble
BERTRAND Joël wrote: Bill Davidsen wrote: Dan Williams wrote: On Fri, 2007-10-19 at 01:04 -0700, BERTRAND Joël wrote: I run for 12 hours some dd's (read and write in nullio) between initiator and target without any disconnection. Thus iSCSI code seems to be robust. Both initiator and target are alone on a single gigabit ethernet link (without any switch). I'm investigating... Can you reproduce on 2.6.22? Also, I do not think this is the cause of your failure, but you have CONFIG_DMA_ENGINE=y in your config. Setting this to 'n' will compile out the unneeded checks for offload engines in async_memcpy and async_xor. Given that offload engines are far less tested code, I think this is a very good thing to try! I'm trying wihtout CONFIG_DMA_ENGINE=y. istd1 only uses 40% of one CPU when I rebuild my raid1 array. 1% of this array was now resynchronized without any hang. Root gershwin:[/usr/scripts] cat /proc/mdstat Personalities : [raid1] [raid6] [raid5] [raid4] md7 : active raid1 sdi1[2] md_d0p1[0] 1464725632 blocks [2/1] [U_] [] recovery = 1.0% (15705536/1464725632) finish=1103.9min speed=21875K/sec Same result... connection2:0: iscsi: detected conn error (1011) session2: iscsi: session recovery timed out after 120 secs sd 4:0:0:0: scsi: Device offlined - not ready after error recovery sd 4:0:0:0: scsi: Device offlined - not ready after error recovery sd 4:0:0:0: scsi: Device offlined - not ready after error recovery sd 4:0:0:0: scsi: Device offlined - not ready after error recovery sd 4:0:0:0: scsi: Device offlined - not ready after error recovery sd 4:0:0:0: scsi: Device offlined - not ready after error recovery sd 4:0:0:0: scsi: Device offlined - not ready after error recovery Regards, JKB - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Time to deprecate old RAID formats?
On Fri, Oct 19, 2007 at 02:39:47PM -0400, John Stoffel wrote: And if putting the superblock at the end is problematic, why is it the default? Shouldn't version 1.1 be the default? In my opinion, having the superblock *only* at the end (e.g. the 0.90 format) is the best option. It allows one to mount the disk separately (in the case of RAID 1) if the MD superblock is corrupt, or if you just want to get at the raw data easily. As for the people who complained exactly because of this feature, LVM has two mechanisms to protect against accessing PVs on the raw disks (the ignore-raid-components option and the filter - I always set filters when using LVM on top of MD). regards, iustin
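(A sketch of the two LVM safeguards Iustin mentions, as they would appear in /etc/lvm/lvm.conf; the exact filter pattern is only an example and depends on local device naming.)

devices {
    # First mechanism: skip any block device that carries an md superblock
    md_component_detection = 1
    # Second mechanism: only scan md devices for physical volumes, reject raw disks
    filter = [ "a|^/dev/md|", "r|.*|" ]
}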
Re: [Iscsitarget-devel] [BUG] Raid1/5 over iSCSI trouble
On Fri, 2007-10-19 at 23:04 +0200, BERTRAND Joël wrote: BERTRAND Joël wrote: BERTRAND Joël wrote: Bill Davidsen wrote: Dan Williams wrote: On Fri, 2007-10-19 at 01:04 -0700, BERTRAND Joël wrote: I ran some dd's for 12 hours (read and write in nullio) between initiator and target without any disconnection. Thus the iSCSI code seems to be robust. Both initiator and target are alone on a single gigabit ethernet link (without any switch). I'm investigating... Can you reproduce on 2.6.22? Also, I do not think this is the cause of your failure, but you have CONFIG_DMA_ENGINE=y in your config. Setting this to 'n' will compile out the unneeded checks for offload engines in async_memcpy and async_xor. Given that offload engines are far less tested code, I think this is a very good thing to try! I'm trying without CONFIG_DMA_ENGINE=y. istd1 only uses 40% of one CPU when I rebuild my raid1 array. 1% of this array has now been resynchronized without any hang. Root gershwin:[/usr/scripts] cat /proc/mdstat Personalities : [raid1] [raid6] [raid5] [raid4] md7 : active raid1 sdi1[2] md_d0p1[0] 1464725632 blocks [2/1] [U_] [] recovery = 1.0% (15705536/1464725632) finish=1103.9min speed=21875K/sec Same result... connection2:0: iscsi: detected conn error (1011) session2: iscsi: session recovery timed out after 120 secs sd 4:0:0:0: scsi: Device offlined - not ready after error recovery sd 4:0:0:0: scsi: Device offlined - not ready after error recovery sd 4:0:0:0: scsi: Device offlined - not ready after error recovery sd 4:0:0:0: scsi: Device offlined - not ready after error recovery sd 4:0:0:0: scsi: Device offlined - not ready after error recovery sd 4:0:0:0: scsi: Device offlined - not ready after error recovery sd 4:0:0:0: scsi: Device offlined - not ready after error recovery Sorry for this last mail. I have found another problem, but I don't know if this bug comes from iscsi-target or raid5 itself. The iSCSI target is disconnected because the istd1 and md_d0_raid5 kernel threads each use 100% of a CPU! Tasks: 235 total, 6 running, 227 sleeping, 0 stopped, 2 zombie Cpu(s): 0.1%us, 12.5%sy, 0.0%ni, 87.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 4139032k total, 218424k used, 3920608k free, 10136k buffers Swap: 7815536k total, 0k used, 7815536k free, 64808k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 5824 root 15 -5 0 0 0 R 100 0.0 10:34.25 istd1 5599 root 15 -5 0 0 0 R 100 0.0 7:25.43 md_d0_raid5 I would rather use oprofile to check where the CPU cycles went. Regards, JKB -- Ming Zhang @#$%^ purging memory... (*!% http://blackmagic02881.wordpress.com/ http://www.linkedin.com/in/blackmagic02881
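(A minimal oprofile session of the sort Ming suggests; the command names follow the classic opcontrol interface and the vmlinux path is an assumption: an uncompressed kernel image with symbols is needed.)

# Point oprofile at the running kernel, then start sampling
opcontrol --vmlinux=/usr/src/linux/vmlinux
opcontrol --start

# ... reproduce the 100% CPU condition, then stop and report
opcontrol --stop
opreport --symbols --threshold=1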
RE: [Iscsitarget-devel] [BUG] Raid1/5 over iSCSI trouble
BERTRAND Joël wrote: Ross S. W. Walker wrote: BERTRAND Joël wrote: BERTRAND Joël wrote: Bill Davidsen wrote: Dan Williams wrote: On Fri, 2007-10-19 at 01:04 -0700, BERTRAND Joël wrote: I ran some dd's for 12 hours (read and write in nullio) between initiator and target without any disconnection. Thus the iSCSI code seems to be robust. Both initiator and target are alone on a single gigabit ethernet link (without any switch). I'm investigating... Can you reproduce on 2.6.22? Also, I do not think this is the cause of your failure, but you have CONFIG_DMA_ENGINE=y in your config. Setting this to 'n' will compile out the unneeded checks for offload engines in async_memcpy and async_xor. Given that offload engines are far less tested code, I think this is a very good thing to try! I'm trying without CONFIG_DMA_ENGINE=y. istd1 only uses 40% of one CPU when I rebuild my raid1 array. 1% of this array has now been resynchronized without any hang. Root gershwin:[/usr/scripts] cat /proc/mdstat Personalities : [raid1] [raid6] [raid5] [raid4] md7 : active raid1 sdi1[2] md_d0p1[0] 1464725632 blocks [2/1] [U_] [] recovery = 1.0% (15705536/1464725632) finish=1103.9min speed=21875K/sec Same result... connection2:0: iscsi: detected conn error (1011) session2: iscsi: session recovery timed out after 120 secs sd 4:0:0:0: scsi: Device offlined - not ready after error recovery sd 4:0:0:0: scsi: Device offlined - not ready after error recovery sd 4:0:0:0: scsi: Device offlined - not ready after error recovery sd 4:0:0:0: scsi: Device offlined - not ready after error recovery sd 4:0:0:0: scsi: Device offlined - not ready after error recovery sd 4:0:0:0: scsi: Device offlined - not ready after error recovery sd 4:0:0:0: scsi: Device offlined - not ready after error recovery I am unsure why you would want to set up an iSCSI RAID1, but before doing so I would try to verify that each independent iSCSI session is bulletproof. I use one and only one iSCSI session. The RAID1 array is built between a local volume and an iSCSI volume. Oh, in that case you will be much better served with DRBD, which would provide you with what you want without creating a Frankenstein setup... -Ross
Re: [BUG] Raid1/5 over iSCSI trouble
On Fri, 2007-10-19 at 14:04 -0700, BERTRAND Joël wrote: Sorry for this last mail. I have found another mistake, but I don't know if this bug comes from iscsi-target or raid5 itself. iSCSI target is disconnected because istd1 and md_d0_raid5 kernel threads use 100% of CPU each ! Tasks: 235 total, 6 running, 227 sleeping, 0 stopped, 2 zombie Cpu(s): 0.1%us, 12.5%sy, 0.0%ni, 87.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 4139032k total, 218424k used, 3920608k free,10136k buffers Swap: 7815536k total,0k used, 7815536k free,64808k cached PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND 5824 root 15 -5 000 R 100 0.0 10:34.25 istd1 5599 root 15 -5 000 R 100 0.0 7:25.43 md_d0_raid5 What is the output of: cat /proc/5824/wchan cat /proc/5599/wchan Thanks, Dan - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] Raid1/5 over iSCSI trouble
BERTRAND Joël wrote: Sorry for this last mail. I have found another mistake, but I don't know if this bug comes from iscsi-target or raid5 itself. iSCSI target is disconnected because istd1 and md_d0_raid5 kernel threads use 100% of CPU each ! Tasks: 235 total, 6 running, 227 sleeping, 0 stopped, 2 zombie Cpu(s): 0.1%us, 12.5%sy, 0.0%ni, 87.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 4139032k total, 218424k used, 3920608k free,10136k buffers Swap: 7815536k total,0k used, 7815536k free,64808k cached PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND 5824 root 15 -5 000 R 100 0.0 10:34.25 istd1 5599 root 15 -5 000 R 100 0.0 7:25.43 md_d0_raid5 Given that the summary shows 87.4% idle, something is not right. You might try another tool, like vmstat, to at least verify the way the CPU is being used. When you can't trust what your tools tell you it gets really hard to make decisions based on the data. -- bill davidsen [EMAIL PROTECTED] CTO TMR Associates, Inc Doing interesting things with small computers since 1979 - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
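(A couple of cross-checks in the spirit of Bill's suggestion; the thread names are taken from the top output above, everything else is generic.)

# System-wide view, one-second samples, to compare against top's idle figure
vmstat 1 10

# Per-thread CPU usage and the CPU each thread last ran on
ps -eLo pid,psr,pcpu,comm | grep -E 'istd|md_d0_raid5'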
Re: mdadm 2.6.x regression, fails creation of raid1 w/ v1.0 sb and internal bitmap
On 10/19/07, Neil Brown [EMAIL PROTECTED] wrote: On Friday October 19, [EMAIL PROTECTED] wrote: I'm using a stock 2.6.19.7 that I then backported various MD fixes to from 2.6.20 - 2.6.23... this kernel has worked great until I attempted v1.0 sb w/ bitmap=internal using mdadm 2.6.x. But would you like me to try a stock 2.6.22 or 2.6.23 kernel? Yes please. I'm suspecting the code in write_sb_page where it tests if the bitmap overlaps the data or metadata. The only way I can see you getting the exact error that you do get it for that to fail. That test was introduced in 2.6.22. Did you backport that? Any chance it got mucked up a bit? I believe you're referring to commit f0d76d70bc77b9b11256a3a23e98e80878be1578. That change actually made it into 2.6.23 AFAIK; but yes I actually did backport that fix (which depended on ab6085c795a71b6a21afe7469d30a365338add7a). If I back-out f0d76d70bc77b9b11256a3a23e98e80878be1578 I can create a raid1 w/ v1.0 sb and an internal bitmap. But clearly that is just because I removed the negative checks that you introduced ;) For me this begs the question: what else would f0d76d70bc77b9b11256a3a23e98e80878be1578 depend on that I missed? I included 505fa2c4a2f125a70951926dfb22b9cf273994f1 and ab6085c795a71b6a21afe7469d30a365338add7a too. *shrug*... Mike - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
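(When auditing a backport like this, it can help to look at what else touched the same code upstream; a sketch using the commit ID from the message above, where the file path and base tag are assumptions.)

# What the suspect commit changed
git show --stat f0d76d70bc77b9b11256a3a23e98e80878be1578

# Other upstream changes to the same file since the backport base, to spot
# prerequisites that may have been missed
git log --pretty=oneline v2.6.19..f0d76d70bc77b9b11256a3a23e98e80878be1578 -- drivers/md/bitmap.c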
Re: [BUG] Raid1/5 over iSCSI trouble
Bill Davidsen wrote: BERTRAND Joël wrote: Sorry for this last mail. I have found another mistake, but I don't know if this bug comes from iscsi-target or raid5 itself. iSCSI target is disconnected because istd1 and md_d0_raid5 kernel threads use 100% of CPU each ! Tasks: 235 total, 6 running, 227 sleeping, 0 stopped, 2 zombie Cpu(s): 0.1%us, 12.5%sy, 0.0%ni, 87.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 4139032k total, 218424k used, 3920608k free,10136k buffers Swap: 7815536k total,0k used, 7815536k free,64808k cached PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND 5824 root 15 -5 000 R 100 0.0 10:34.25 istd1 5599 root 15 -5 000 R 100 0.0 7:25.43 md_d0_raid5 Given that the summary shows 87.4% idle, something is not right. You might try another tool, like vmstat, to at least verify the way the CPU is being used. When you can't trust what your tools tell you it gets really hard to make decisions based on the data. ALSO: you have zombie processes. Looking at machines up for 45, 54, and 470 days, zombies are *not* something you just have to expect. Do you get these just about the same time things go to hell? Better you than me, I suspect there are still many ways to have a learning experience with iSCSI. Hope that and the summary confusion result in some useful data. -- bill davidsen [EMAIL PROTECTED] CTO TMR Associates, Inc Doing interesting things with small computers since 1979 - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Time to deprecate old RAID formats?
On Fri, 2007-10-19 at 23:23 +0200, Iustin Pop wrote: On Fri, Oct 19, 2007 at 02:39:47PM -0400, John Stoffel wrote: And if putting the superblock at the end is problematic, why is it the default? Shouldn't version 1.1 be the default? In my opinion, having the superblock *only* at the end (e.g. the 0.90 format) is the best option. It allows one to mount the disk separately (in case of RAID 1), if the MD superblock is corrupt or you just want to get easily at the raw data. Bad reasoning. It's the reason that the default is at the end of the device, but that was a bad decision made by Ingo long, long ago in a galaxy far, far away. The simple fact of the matter is there are only two type of raid devices for the purpose of this issue: those that fragment data (raid0/4/5/6/10) and those that don't (raid1, linear). For the purposes of this issue, there are only two states we care about: the raid array works or doesn't work. If the raid array works, then you *only* want the system to access the data via the raid array. If the raid array doesn't work, then for the fragmented case you *never* want the system to see any of the data from the raid array (such as an ext3 superblock) or a subsequent fsck could see a valid superblock and actually start a filesystem scan on the raw device, and end up hosing the filesystem beyond all repair after it hits the first chunk size break (although in practice this is usually a situation where fsck declares the filesystem so corrupt that it refuses to touch it, that's leaving an awful lot to chance, you really don't want fsck to *ever* see that superblock). If the raid array is raid1, then the raid array should *never* fail to start unless all disks are missing (in which case there is no raw device to access anyway). The very few failure types that will cause the raid array to not start automatically *and* still have an intact copy of the data usually happen when the raid array is perfectly healthy, in which case automatically finding a constituent device when the raid array failed to start is exactly the *wrong* thing to do (for instance, you enable SELinux on a machine and it hasn't been relabeled and the raid array fails to start because /dev/mdblah can't be created because of an SELinux denial...all the raid1 members are still there, but if you touch a single one of them, then you run the risk of creating silent data corruption). It really boils down to this: for any reason that a raid array might fail to start, you *never* want to touch the underlying data until someone has taken manual measures to figure out why it didn't start and corrected the problem. Putting the superblock in front of the data does not prevent manual measures (such as recreating superblocks) from getting at the data. But, putting superblocks at the end leaves the door open for accidental access via constituent devices when you *really* don't want that to happen. So, no, the default should *not* be at the end of the device. As to the people who complained exactly because of this feature, LVM has two mechanisms to protect from accessing PVs on the raw disks (the ignore raid components option and the filter - I always set filters when using LVM ontop of MD). regards, iustin -- Doug Ledford [EMAIL PROTECTED] GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband signature.asc Description: This is a digitally signed message part
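(One way to see the hazard Doug describes on a live system; the device name is a placeholder. With an end-of-device superblock the start of the member can still look like a bare filesystem, while with a 1.1 superblock it does not.)

# On a raid1 member with a 0.90/1.0 superblock, this may report the ext3
# filesystem inside the array rather than an md member
blkid /dev/sdb1

# The md view of the same member, including its superblock version
mdadm --examine /dev/sdb1 | head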
Re: [Iscsitarget-devel] [BUG] Raid1/5 over iSCSI trouble
[snip] I am unsure why you would want to set up an iSCSI RAID1, but before doing so I would try to verify that each independent iSCSI session is bulletproof. I use one and only one iSCSI session. The RAID1 array is built between a local volume and an iSCSI volume. So this problem doesn't happen when doing I/O with only the iSCSI session? Wouldn't it be better to do the RAID1 on the target machine? Then you don't need to mess around with the weird timing behavior of remote/local writing. If you want to have the disks on 2 different machines and have them mirrored, DRBD is the way to go. @Ross: He is trying to mirror his local drive with an iSCSI LUN. JKB -- Scott Kaelin Sitrof Technologies [EMAIL PROTECTED]
RE: [Iscsitarget-devel] [BUG] Raid1/5 over iSCSI trouble
BERTRAND Joël wrote: BERTRAND Joël wrote: BERTRAND Joël wrote: Bill Davidsen wrote: Dan Williams wrote: On Fri, 2007-10-19 at 01:04 -0700, BERTRAND Joël wrote: I run for 12 hours some dd's (read and write in nullio) between initiator and target without any disconnection. Thus iSCSI code seems to be robust. Both initiator and target are alone on a single gigabit ethernet link (without any switch). I'm investigating... Can you reproduce on 2.6.22? Also, I do not think this is the cause of your failure, but you have CONFIG_DMA_ENGINE=y in your config. Setting this to 'n' will compile out the unneeded checks for offload engines in async_memcpy and async_xor. Given that offload engines are far less tested code, I think this is a very good thing to try! I'm trying wihtout CONFIG_DMA_ENGINE=y. istd1 only uses 40% of one CPU when I rebuild my raid1 array. 1% of this array was now resynchronized without any hang. Root gershwin:[/usr/scripts] cat /proc/mdstat Personalities : [raid1] [raid6] [raid5] [raid4] md7 : active raid1 sdi1[2] md_d0p1[0] 1464725632 blocks [2/1] [U_] [] recovery = 1.0% (15705536/1464725632) finish=1103.9min speed=21875K/sec Same result... connection2:0: iscsi: detected conn error (1011) session2: iscsi: session recovery timed out after 120 secs sd 4:0:0:0: scsi: Device offlined - not ready after error recovery sd 4:0:0:0: scsi: Device offlined - not ready after error recovery sd 4:0:0:0: scsi: Device offlined - not ready after error recovery sd 4:0:0:0: scsi: Device offlined - not ready after error recovery sd 4:0:0:0: scsi: Device offlined - not ready after error recovery sd 4:0:0:0: scsi: Device offlined - not ready after error recovery sd 4:0:0:0: scsi: Device offlined - not ready after error recovery Sorry for this last mail. I have found another mistake, but I don't know if this bug comes from iscsi-target or raid5 itself. iSCSI target is disconnected because istd1 and md_d0_raid5 kernel threads use 100% of CPU each ! Tasks: 235 total, 6 running, 227 sleeping, 0 stopped, 2 zombie Cpu(s): 0.1%us, 12.5%sy, 0.0%ni, 87.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 4139032k total, 218424k used, 3920608k free, 10136k buffers Swap: 7815536k total,0k used, 7815536k free, 64808k cached PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND 5824 root 15 -5 000 R 100 0.0 10:34.25 istd1 5599 root 15 -5 000 R 100 0.0 7:25.43 md_d0_raid5 Regards, JKB If you have 2 iSCSI sessions mirrored then any failure along either path will hose the setup. Plus having iSCSI and MD RAID fight over same resources in kernel is a recipe for a race condition. How about exploring MPIO and DRBD? -Ross __ This e-mail, and any attachments thereto, is intended only for use by the addressee(s) named herein and may contain legally privileged and/or confidential information. If you are not the intended recipient of this e-mail, you are hereby notified that any dissemination, distribution or copying of this e-mail, and any attachments thereto, is strictly prohibited. If you have received this e-mail in error, please immediately notify the sender and permanently delete the original and any copy or printout thereof. - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Time to deprecate old RAID formats?
On Fri, 2007-10-19 at 12:38 -0400, John Stoffel wrote: 1, 1.0, 1.1, 1.2 Use the new version-1 format superblock. This has few restrictions. The different sub-versions store the superblock at different locations on the device, either at the end (for 1.0), at the start (for 1.1) or 4K from the start (for 1.2). It looks to me that the 1.1, combined with the 1.0, should be what we use, with the 1.2 format nuked. Maybe call it 1.3? *grin* You're somewhat misreading the man page. You *can't* combine 1.0 with 1.1. All of the above options: 1, 1.0, 1.1, 1.2; specifically mean to use a version 1 superblock. 1.0 means use a version 1 superblock at the end of the disk. 1.1 means a version 1 superblock at the beginning of the disk. 1.2 means version 1 at a 4K offset from the beginning of the disk. There really is no actual version 1.1 or 1.2; the .0, .1, and .2 part of the version *only* means where to put the version 1 superblock on the disk. If you just say version 1, then it goes to the default location for version 1 superblocks, and last I checked that was the end of the disk (aka 1.0). -- Doug Ledford [EMAIL PROTECTED] GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband
Re: Time to deprecate old RAID formats?
On Fri, 19 Oct 2007, Doug Ledford wrote: On Fri, 2007-10-19 at 12:45 -0400, Justin Piszcz wrote: On Fri, 19 Oct 2007, John Stoffel wrote: Justin == Justin Piszcz [EMAIL PROTECTED] writes: Justin Is a bitmap created by default with 1.x? I remember seeing Justin reports of 15-30% performance degradation using a bitmap on a Justin RAID5 with 1.x. Not according to the mdadm man page. I'd probably give up that performance if it meant that re-syncing an array went much faster after a crash. I certainly use it on my RAID1 setup on my home machine. John The performance AFTER a crash yes, but in general usage I remember seeing someone here doing benchmarks it had a negative affect on performance. I'm sure an internal bitmap would. On RAID1 arrays, reads/writes are never split up by a chunk size for stripes. A 2mb read is a single read, where as on a raid4/5/6 array, a 2mb read will end up hitting a series of stripes across all disks. That means that on raid1 arrays, total disk seeks total reads/writes, where as on a raid4/5/6, total disk seeks is usually total reads/writes. That in turn implies that in a raid1 setup, disk seek time is important to performance, but not necessarily paramount. For raid456, disk seek time is paramount because of how many more seeks that format uses. When you then use an internal bitmap, you are adding writes to every member of the raid456 array, which adds more seeks. The same is true for raid1, but since raid1 doesn't have the same level of dependency on seek rates that raid456 has, it doesn't show the same performance hit that raid456 does. Justin. -- Doug Ledford [EMAIL PROTECTED] GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband Got it, so for RAID1 it would make sense if LILO supported it (the later versions of the md superblock) (for those who use LILO) but for RAID4/5/6, keep the bitmaps away :) Justin. - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Time to deprecate old RAID formats?
Justin == Justin Piszcz [EMAIL PROTECTED] writes: Justin Is a bitmap created by default with 1.x? I remember seeing Justin reports of 15-30% performance degradation using a bitmap on a Justin RAID5 with 1.x. Not according to the mdadm man page. I'd probably give up that performance if it meant that re-syncing an array went much faster after a crash. I certainly use it on my RAID1 setup on my home machine. John
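(For anyone who wants to measure the cost themselves: an internal bitmap can be added to, and removed from, an existing array at run time, so the write-performance hit is easy to benchmark and to reverse. The md device name is a placeholder.)

# Add a write-intent bitmap to a running array
mdadm --grow --bitmap=internal /dev/md0

# Remove it again if the overhead is not worth it
mdadm --grow --bitmap=none /dev/md0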
Re: Time to deprecate old RAID formats?
On Fri, 19 Oct 2007, John Stoffel wrote: Doug == Doug Ledford [EMAIL PROTECTED] writes: Doug On Fri, 2007-10-19 at 11:46 -0400, John Stoffel wrote: Justin == Justin Piszcz [EMAIL PROTECTED] writes: Justin On Fri, 19 Oct 2007, John Stoffel wrote: So, Is it time to start thinking about deprecating the old 0.9, 1.0 and 1.1 formats to just standardize on the 1.2 format? What are the issues surrounding this? Doug 1.0, 1.1, and 1.2 are the same format, just in different positions on Doug the disk. Of the three, the 1.1 format is the safest to use since it Doug won't allow you to accidentally have some sort of metadata between the Doug beginning of the disk and the raid superblock (such as an lvm2 Doug superblock), and hence whenever the raid array isn't up, you won't be Doug able to accidentally mount the lvm2 volumes, filesystem, etc. (In worse Doug case situations, I've seen lvm2 find a superblock on one RAID1 array Doug member when the RAID1 array was down, the system came up, you used the Doug system, the two copies of the raid array were made drastically Doug inconsistent, then at the next reboot, the situation that prevented the Doug RAID1 from starting was resolved, and it never know it failed to start Doug last time, and the two inconsistent members we put back into a clean Doug array). So, deprecating any of these is not really helpful. And you Doug need to keep the old 0.90 format around for back compatibility with Doug thousands of existing raid arrays. This is a great case for making the 1.1 format be the default. So what are the advantages of the 1.0 and 1.2 formats then? Or should be we thinking about making two copies of the data on each RAID member, one at the beginning and one at the end, for resiliency? I just hate seeing this in the mag page: Declare the style of superblock (raid metadata) to be used. The default is 0.90 for --create, and to guess for other operations. The default can be overridden by setting the metadata value for the CREATE keyword in mdadm.conf. Options are: 0, 0.90, default Use the original 0.90 format superblock. This format limits arrays to 28 component devices and limits compo- nent devices of levels 1 and greater to 2 terabytes. 1, 1.0, 1.1, 1.2 Use the new version-1 format superblock. This has few restrictions. The different sub-versions store the superblock at different locations on the device, either at the end (for 1.0), at the start (for 1.1) or 4K from the start (for 1.2). It looks to me that the 1.1, combined with the 1.0 should be what we use, with the 1.2 format nuked. Maybe call it 1.3? *grin* So at this point I'm not arguing to get rid of the 0.9 format, though I think it should NOT be the default any more, we should be using the 1.1 combined with 1.0 format. Is a bitmap created by default with 1.x? I remember seeing reports of 15-30% performance degradation using a bitmap on a RAID5 with 1.x. John - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Time to deprecate old RAID formats?
On Fri, 19 Oct 2007, Doug Ledford wrote: On Fri, 2007-10-19 at 11:46 -0400, John Stoffel wrote: Justin == Justin Piszcz [EMAIL PROTECTED] writes: Justin On Fri, 19 Oct 2007, John Stoffel wrote: So, is it time to start thinking about deprecating the old 0.9, 1.0 and 1.1 formats to just standardize on the 1.2 format? What are the issues surrounding this? 1.0, 1.1, and 1.2 are the same format, just in different positions on the disk. Of the three, the 1.1 format is the safest to use since it won't allow you to accidentally have some sort of metadata between the beginning of the disk and the raid superblock (such as an lvm2 superblock), and hence whenever the raid array isn't up, you won't be able to accidentally mount the lvm2 volumes, filesystem, etc. (In worst case situations, I've seen lvm2 find a superblock on one RAID1 array member when the RAID1 array was down, the system came up, you used the system, the two copies of the raid array were made drastically inconsistent, then at the next reboot, the situation that prevented the RAID1 from starting was resolved, and it never knew it failed to start last time, and the two inconsistent members were put back into a clean array). So, deprecating any of these is not really helpful. And you need to keep the old 0.90 format around for backward compatibility with thousands of existing raid arrays. Agreed. What is the benefit in deprecating them? Is there that much old code? It's certainly easy enough to change mdadm to default to the 1.2 format and to require a --force switch to allow use of the older formats. I keep seeing that we support these old formats, and it's never been clear to me why we have four different ones available. Why can't we start defining the canonical format for Linux RAID metadata? Thanks, John [EMAIL PROTECTED] Justin I hope 00.90.03 is not deprecated, LILO cannot boot off of Justin anything else! Are you sure? I find that GRUB is much easier to use and set up than LILO these days. But hey, just dropping down to support 00.90.03 and 1.2 formats would be fine too. Let's just lessen the confusion if at all possible. John -- Doug Ledford [EMAIL PROTECTED] GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband
Re: Time to deprecate old RAID formats?
On Fri, 19 Oct 2007, John Stoffel wrote: Justin == Justin Piszcz [EMAIL PROTECTED] writes: Justin On Fri, 19 Oct 2007, John Stoffel wrote: So, Is it time to start thinking about deprecating the old 0.9, 1.0 and 1.1 formats to just standardize on the 1.2 format? What are the issues surrounding this? It's certainly easy enough to change mdadm to default to the 1.2 format and to require a --force switch to allow use of the older formats. I keep seeing that we support these old formats, and it's never been clear to me why we have four different ones available? Why can't we start defining the canonical format for Linux RAID metadata? Thanks, John [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Justin I hope 00.90.03 is not deprecated, LILO cannot boot off of Justin anything else! Are you sure? I find that GRUB is much easier to use and setup than LILO these days. But hey, just dropping down to support 00.09.03 and 1.2 formats would be fine too. Let's just lessen the confusion if at all possible. John I am sure, I submitted a bug report to the LILO developer, he acknowledged the bug but I don't know if it was fixed. I have not tried GRUB with a RAID1 setup yet. Justin. - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Time to deprecate old RAID formats?
Justin == Justin Piszcz [EMAIL PROTECTED] writes: Justin On Fri, 19 Oct 2007, John Stoffel wrote: So, Is it time to start thinking about deprecating the old 0.9, 1.0 and 1.1 formats to just standardize on the 1.2 format? What are the issues surrounding this? It's certainly easy enough to change mdadm to default to the 1.2 format and to require a --force switch to allow use of the older formats. I keep seeing that we support these old formats, and it's never been clear to me why we have four different ones available? Why can't we start defining the canonical format for Linux RAID metadata? Thanks, John [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Justin I hope 00.90.03 is not deprecated, LILO cannot boot off of Justin anything else! Are you sure? I find that GRUB is much easier to use and setup than LILO these days. But hey, just dropping down to support 00.09.03 and 1.2 formats would be fine too. Let's just lessen the confusion if at all possible. John - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Time to deprecate old RAID formats?
On Fri, 19 Oct 2007, John Stoffel wrote: So, Is it time to start thinking about deprecating the old 0.9, 1.0 and 1.1 formats to just standardize on the 1.2 format? What are the issues surrounding this? It's certainly easy enough to change mdadm to default to the 1.2 format and to require a --force switch to allow use of the older formats. I keep seeing that we support these old formats, and it's never been clear to me why we have four different ones available? Why can't we start defining the canonical format for Linux RAID metadata? Thanks, John [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html I hope 00.90.03 is not deprecated, LILO cannot boot off of anything else! Justin. - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Software RAID when it works and when it doesn't
On Fri, 19 Oct 2007, Alberto Alonso wrote: On Thu, 2007-10-18 at 17:26 +0200, Goswin von Brederlow wrote: Mike Accetta [EMAIL PROTECTED] writes: What I would like to see is a timeout-driven fallback mechanism. If one mirror does not return the requested data within a certain time (say 1 second) then the request should be duplicated on the other mirror. If the first mirror later unchokes then it remains in the raid; if it fails it gets removed. But (at least reads) should not have to wait for that process. Even better would be if some write delay could also be used. The still-working mirror would get an increase in its serial (so on reboot you know one disk is newer). If the choking mirror unchokes then it can write back all the delayed data and also increase its serial to match. Otherwise it gets really failed. But you might have to use bitmaps for this or the cache size would limit its usefulness. MfG Goswin I think a timeout on both reads and writes is a must. Basically, I believe that all the problems I've encountered using software RAID would have been resolved by a timeout within the md code. This will keep a server from crashing/hanging when the underlying driver doesn't properly handle hard drive problems. MD can be smarter than the dumb drivers. Just my thoughts though, as I've never got an answer as to whether or not md can implement its own timeouts. Alberto I have a question about remapping sectors: can software RAID be as efficient or as good at remapping bad sectors as an external RAID controller, e.g. for RAID 10 or RAID 5? Justin.
Re: [Iscsitarget-devel] Abort Task ?
On Fri, 2007-10-19 at 16:47 +0200, BERTRAND Joël wrote: Ming Zhang wrote: as Ross pointed out, many I/O patterns only have one outstanding I/O at any time, so there is only one work thread actively serving it; it cannot exploit the multiple cores here. Do you see 100% with nullio or fileio? With disk, most of the time should be spent in iowait and CPU utilization should not be high at all. With both nullio and fileio... it is weird. With fileio, run some I/O load, then run vmstat 1 and post the output here. You are supposed to see some iowait rather than high sys CPU usage... -- Ming Zhang @#$%^ purging memory... (*!% http://blackmagic02881.wordpress.com/ http://www.linkedin.com/in/blackmagic02881
Re: [Iscsitarget-devel] Abort Task ?
On Fri, 2007-10-19 at 09:48 +0200, BERTRAND Joël wrote: Ross S. W. Walker wrote: BERTRAND Joël wrote: BERTRAND Joël wrote: I can format serveral times (mkfs.ext3) a 1.5 TB volume over iSCSI without any trouble. I can read and write on this virtual disk without any trouble. Now, I have configured ietd with : Lun 0 Sectors=1464725758,Type=nullio and I run on initiator side : Root gershwin:[/dev] dd if=/dev/zero of=/dev/sdj bs=8192 479482+0 records in 479482+0 records out 3927916544 bytes (3.9 GB) copied, 153.222 seconds, 25.6 MB/s Root gershwin:[/dev] dd if=/dev/zero of=/dev/sdj bs=8192 I'm waitinfor a crash. No one when I write these lines. I suspect an interaction between raid and iscsi. I simultanely run : Root gershwin:[/dev] dd if=/dev/zero of=/dev/sdj bs=8192 8397210+0 records in 8397210+0 records out 68789944320 bytes (69 GB) copied, 2732.55 seconds, 25.2 MB/s and Root gershwin:[~] dd if=/dev/sdj of=/dev/null bs=8192 739200+0 records in 739199+0 records out 6055518208 bytes (6.1 GB) copied, 447.178 seconds, 13.5 MB/s without any trouble. The speed can definitely be improved. Look at your network setup and use ping to try and get the network latency to a minimum. # ping -A -s 8192 172.16.24.140 --- 172.16.24.140 ping statistics --- 14058 packets transmitted, 14057 received, 0% packet loss, time 9988ms rtt min/avg/max/mdev = 0.234/0.268/2.084/0.041 ms, ipg/ewma 0.710/0.260 ms gershwin:[~] ping -A -s 8192 192.168.0.2 PING 192.168.0.2 (192.168.0.2) 8192(8220) bytes of data. 8200 bytes from 192.168.0.2: icmp_seq=1 ttl=64 time=0.693 ms 8200 bytes from 192.168.0.2: icmp_seq=2 ttl=64 time=0.595 ms 8200 bytes from 192.168.0.2: icmp_seq=3 ttl=64 time=0.583 ms 8200 bytes from 192.168.0.2: icmp_seq=4 ttl=64 time=0.589 ms 8200 bytes from 192.168.0.2: icmp_seq=5 ttl=64 time=0.580 ms 8200 bytes from 192.168.0.2: icmp_seq=6 ttl=64 time=0.594 ms 8200 bytes from 192.168.0.2: icmp_seq=7 ttl=64 time=0.580 ms 8200 bytes from 192.168.0.2: icmp_seq=8 ttl=64 time=0.592 ms 8200 bytes from 192.168.0.2: icmp_seq=9 ttl=64 time=0.589 ms 8200 bytes from 192.168.0.2: icmp_seq=10 ttl=64 time=0.571 ms 8200 bytes from 192.168.0.2: icmp_seq=11 ttl=64 time=0.588 ms 8200 bytes from 192.168.0.2: icmp_seq=12 ttl=64 time=0.580 ms 8200 bytes from 192.168.0.2: icmp_seq=13 ttl=64 time=0.587 ms --- 192.168.0.2 ping statistics --- 13 packets transmitted, 13 received, 0% packet loss, time 2400ms rtt min/avg/max/mdev = 0.571/0.593/0.693/0.044 ms, ipg/ewma 200.022/0.607 ms gershwin:[~] Both initiator and target are alone on a gigabit NIC (Tigon3). On target server, istd1 takes 100% of a CPU (and only one CPU, even my T1000 can simultaneous run 32 threads). I think the limitation comes from istd1. usually istdx will not take 100% cpu with 1G network, especially when using disk as back storage, some kind of profiling work might be helpful to tell what happened... forgot to ask, your sparc64 platform cpu spec. You want your avg ping time for 8192 byte payloads to be 300us or less. 1000/.268 = 3731 IOPS @ 8k = 30 MB/s If you use apps that do overlapping asynchronous IO you can see better numbers. Regards, JKB -- Ming Zhang @#$%^ purging memory... (*!% http://blackmagic02881.wordpress.com/ http://www.linkedin.com/in/blackmagic02881 - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
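(Applying Ming's back-of-the-envelope estimate to the numbers earlier in this thread, assuming a single outstanding 8 KB I/O at a time as discussed above:

1000 ms / 0.593 ms average RTT ~ 1686 round trips per second
1686 x 8192 bytes ~ 13.8 MB/s

which is close to the 13.5 MB/s the read dd above actually achieved, so the single-stream read really is bound by network latency rather than by disk or CPU.)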
Re: [BUG] Raid5 trouble
Bill Davidsen wrote: Dan Williams wrote: I found a problem which may lead to the operations count dropping below zero. If ops_complete_biofill() gets preempted in between the following calls: raid5.c:554 clear_bit(STRIPE_OP_BIOFILL, sh-ops.ack); raid5.c:555 clear_bit(STRIPE_OP_BIOFILL, sh-ops.pending); ...then get_stripe_work() can recount/re-acknowledge STRIPE_OP_BIOFILL causing the assertion. In fact, the 'pending' bit should always be cleared first, but the other cases are protected by spin_lock(sh-lock). Patch attached. Once this patch has been vetted, can it be offered to -stable for 2.6.23? Or to be pedantic, it *can*, will you make that happen? I never see any oops with this patch. But I cannot create a RAID1 array with a local RAID5 volume and a foreign RAID5 array exported by iSCSI. iSCSI seems to works fine, but RAID1 creation randomly aborts due to a unknown SCSI task on target side. I have stressed iSCSI target with some simultaneous I/O without any trouble (nullio, fileio and blockio), thus I suspect another bug in raid code (or an arch specific bug). The last two days, I have made some tests to isolate and reproduce this bug: 1/ iSCSI target and initiator seem work when I export with iSCSI a raid5 array; 2/ raid1 and raid5 seem work with local disks; 3/ iSCSI target is disconnected only when I create a raid1 volume over iSCSI (blockio _and_ fileio) with following message: Oct 18 10:43:52 poulenc kernel: iscsi_trgt: cmnd_abort(1156) 29 1 0 42 57344 0 0 Oct 18 10:43:52 poulenc kernel: iscsi_trgt: Abort Task (01) issued on tid:1 lun:0 by sid:630024457682948 (Unknown Task) I run for 12 hours some dd's (read and write in nullio) between initiator and target without any disconnection. Thus iSCSI code seems to be robust. Both initiator and target are alone on a single gigabit ethernet link (without any switch). I'm investigating... Regards, JKB - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
ANNOUNCE: mdadm 2.6.4 - A tool for managing Soft RAID under Linux
I am pleased to announce the availability of mdadm version 2.6.4. It is available at the usual places: http://www.cse.unsw.edu.au/~neilb/source/mdadm/ and (with countrycode=xx) http://www.${countrycode}kernel.org/pub/linux/utils/raid/mdadm/ and via git at git://neil.brown.name/mdadm (http://neil.brown.name/git?p=mdadm). mdadm is a tool for creating, managing and monitoring device arrays using the md driver in Linux, also known as Software RAID arrays. Release 2.6.4 adds a few minor bug fixes to 2.6.3. Changelog Entries: - Make --create --auto=mdp work for non-standard device names. - Fix restarting of a 'reshape' if it was stopped in the middle. - Fix a segfault when using v1 superblock. - Make --write-mostly effective when re-adding a device to an array. - Various minor fixes Development of mdadm is sponsored by SUSE Labs, Novell Inc. NeilBrown 19th October 2007
Re: Time to deprecate old RAID formats?
Doug == Doug Ledford [EMAIL PROTECTED] writes: Doug On Fri, 2007-10-19 at 12:38 -0400, John Stoffel wrote: 1, 1.0, 1.1, 1.2 Use the new version-1 format superblock. This has few restrictions. The different sub-versions store the superblock at different locations on the device, either at the end (for 1.0), at the start (for 1.1) or 4K from the start (for 1.2). It looks to me that the 1.1, combined with the 1.0 should be what we use, with the 1.2 format nuked. Maybe call it 1.3? *grin* Doug You're somewhat misreading the man page. The man page is somewhat misleading then. It's not clear from reading it that the version 1 RAID superblock can be in one of three different positions in the volume. Doug You *can't* combine 1.0 with 1.1. All of the above options: 1, Doug 1.0, 1.1, 1.2; specifically mean to use a version 1 superblock. Doug 1.0 means use a version 1 superblock at the end of the disk. Doug 1.1 means version 1 superblock at beginning of disk. `1.2 means Doug version 1 at 4k offset from beginning of the disk. There really Doug is no actual version 1.1, or 1.2, the .0, .1, and .2 part of the Doug version *only* means where to put the version 1 superblock on Doug the disk. If you just say version 1, then it goes to the Doug default location for version 1 superblocks, and last I checked Doug that was the end of disk (aka, 1.0). So why not get rid of (deprecate) the version 1.0 and version 1.2 blocks, and only support the 1.1 version? Why do we have three different positions for storing the superblock? And if putting the superblock at the end is problematic, why is it the default? Shouldn't version 1.1 be the default? Or, alternatively, update the code so that we support RAID superblocks at BOTH the beginning and end 4k of the disk, for maximum redundancy. I guess I need to go and read the code to figure out the placement of 0.90 and 1.0 blocks to see how they are different. It's just not clear to me why we have such a muddle of 1.x formats to choose from and what the advantages and tradeoffs are between them. John - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
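(Rather than reading the code, mdadm will report where a given member's superblock lives; the field names below are from mdadm --examine output and may vary slightly between versions, and the device name is a placeholder.)

# Version 1.x superblocks report their own location on the member
mdadm --examine /dev/sdb1 | egrep 'Version|Super Offset|Data Offset'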
Re: Time to deprecate old RAID formats?
On Fri, 19 Oct 2007, John Stoffel wrote: Justin == Justin Piszcz [EMAIL PROTECTED] writes: Justin Is a bitmap created by default with 1.x? I remember seeing Justin reports of 15-30% performance degradation using a bitmap on a Justin RAID5 with 1.x. Not according to the mdadm man page. I'd probably give up that performance if it meant that re-syncing an array went much faster after a crash. I certainly use it on my RAID1 setup on my home machine. John The performance AFTER a crash, yes, but in general usage (I remember someone here posting benchmarks) it had a negative effect on performance. Justin. - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Time to deprecate old RAID formats?
Doug == Doug Ledford [EMAIL PROTECTED] writes: Doug On Fri, 2007-10-19 at 11:46 -0400, John Stoffel wrote: Justin == Justin Piszcz [EMAIL PROTECTED] writes: Justin On Fri, 19 Oct 2007, John Stoffel wrote: So, Is it time to start thinking about deprecating the old 0.9, 1.0 and 1.1 formats to just standardize on the 1.2 format? What are the issues surrounding this? Doug 1.0, 1.1, and 1.2 are the same format, just in different positions on Doug the disk. Of the three, the 1.1 format is the safest to use since it Doug won't allow you to accidentally have some sort of metadata between the Doug beginning of the disk and the raid superblock (such as an lvm2 Doug superblock), and hence whenever the raid array isn't up, you won't be Doug able to accidentally mount the lvm2 volumes, filesystem, etc. (In worst Doug case situations, I've seen lvm2 find a superblock on one RAID1 array Doug member when the RAID1 array was down, the system came up, you used the Doug system, the two copies of the raid array were made drastically Doug inconsistent, then at the next reboot, the situation that prevented the Doug RAID1 from starting was resolved, and it never knew it failed to start Doug last time, and the two inconsistent members were put back into a clean Doug array). So, deprecating any of these is not really helpful. And you Doug need to keep the old 0.90 format around for backward compatibility with Doug thousands of existing raid arrays. This is a great case for making the 1.1 format the default. So what are the advantages of the 1.0 and 1.2 formats then? Or should we be thinking about making two copies of the data on each RAID member, one at the beginning and one at the end, for resiliency? I just hate seeing this in the man page: Declare the style of superblock (raid metadata) to be used. The default is 0.90 for --create, and to guess for other operations. The default can be overridden by setting the metadata value for the CREATE keyword in mdadm.conf. Options are: 0, 0.90, default Use the original 0.90 format superblock. This format limits arrays to 28 component devices and limits component devices of levels 1 and greater to 2 terabytes. 1, 1.0, 1.1, 1.2 Use the new version-1 format superblock. This has few restrictions. The different sub-versions store the superblock at different locations on the device, either at the end (for 1.0), at the start (for 1.1) or 4K from the start (for 1.2). It looks to me that the 1.1, combined with the 1.0, should be what we use, with the 1.2 format nuked. Maybe call it 1.3? *grin* So at this point I'm not arguing to get rid of the 0.9 format, though I think it should NOT be the default any more; we should be using the 1.1 combined with 1.0 format. John - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Time to deprecate old RAID formats?
On Fri, 2007-10-19 at 11:46 -0400, John Stoffel wrote: Justin == Justin Piszcz [EMAIL PROTECTED] writes: Justin On Fri, 19 Oct 2007, John Stoffel wrote: So, Is it time to start thinking about deprecating the old 0.9, 1.0 and 1.1 formats to just standardize on the 1.2 format? What are the issues surrounding this? 1.0, 1.1, and 1.2 are the same format, just in different positions on the disk. Of the three, the 1.1 format is the safest to use since it won't allow you to accidentally have some sort of metadata between the beginning of the disk and the raid superblock (such as an lvm2 superblock), and hence whenever the raid array isn't up, you won't be able to accidentally mount the lvm2 volumes, filesystem, etc. (In worst-case situations, I've seen lvm2 find a superblock on one RAID1 array member when the RAID1 array was down, the system came up, you used the system, the two copies of the raid array were made drastically inconsistent, then at the next reboot, the situation that prevented the RAID1 from starting was resolved, and it never knew it failed to start last time, and the two inconsistent members were put back into a clean array). So, deprecating any of these is not really helpful. And you need to keep the old 0.90 format around for backward compatibility with thousands of existing raid arrays. It's certainly easy enough to change mdadm to default to the 1.2 format and to require a --force switch to allow use of the older formats. I keep seeing that we support these old formats, and it's never been clear to me why we have four different ones available? Why can't we start defining the canonical format for Linux RAID metadata? Thanks, John [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Justin I hope 00.90.03 is not deprecated, LILO cannot boot off of Justin anything else! Are you sure? I find that GRUB is much easier to use and set up than LILO these days. But hey, just dropping down to support 00.90.03 and 1.2 formats would be fine too. Let's just lessen the confusion if at all possible. John - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html -- Doug Ledford [EMAIL PROTECTED] GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband
Re: [Iscsitarget-devel] Abort Task ?
On Fri, 2007-10-19 at 16:30 +0200, BERTRAND Joël wrote: Ming Zhang wrote: On Fri, 2007-10-19 at 09:48 +0200, BERTRAND Joël wrote: Ross S. W. Walker wrote: BERTRAND Joël wrote: BERTRAND Joël wrote: I can format serveral times (mkfs.ext3) a 1.5 TB volume over iSCSI without any trouble. I can read and write on this virtual disk without any trouble. Now, I have configured ietd with : Lun 0 Sectors=1464725758,Type=nullio and I run on initiator side : Root gershwin:[/dev] dd if=/dev/zero of=/dev/sdj bs=8192 479482+0 records in 479482+0 records out 3927916544 bytes (3.9 GB) copied, 153.222 seconds, 25.6 MB/s Root gershwin:[/dev] dd if=/dev/zero of=/dev/sdj bs=8192 I'm waitinfor a crash. No one when I write these lines. I suspect an interaction between raid and iscsi. I simultanely run : Root gershwin:[/dev] dd if=/dev/zero of=/dev/sdj bs=8192 8397210+0 records in 8397210+0 records out 68789944320 bytes (69 GB) copied, 2732.55 seconds, 25.2 MB/s and Root gershwin:[~] dd if=/dev/sdj of=/dev/null bs=8192 739200+0 records in 739199+0 records out 6055518208 bytes (6.1 GB) copied, 447.178 seconds, 13.5 MB/s without any trouble. The speed can definitely be improved. Look at your network setup and use ping to try and get the network latency to a minimum. # ping -A -s 8192 172.16.24.140 --- 172.16.24.140 ping statistics --- 14058 packets transmitted, 14057 received, 0% packet loss, time 9988ms rtt min/avg/max/mdev = 0.234/0.268/2.084/0.041 ms, ipg/ewma 0.710/0.260 ms gershwin:[~] ping -A -s 8192 192.168.0.2 PING 192.168.0.2 (192.168.0.2) 8192(8220) bytes of data. 8200 bytes from 192.168.0.2: icmp_seq=1 ttl=64 time=0.693 ms 8200 bytes from 192.168.0.2: icmp_seq=2 ttl=64 time=0.595 ms 8200 bytes from 192.168.0.2: icmp_seq=3 ttl=64 time=0.583 ms 8200 bytes from 192.168.0.2: icmp_seq=4 ttl=64 time=0.589 ms 8200 bytes from 192.168.0.2: icmp_seq=5 ttl=64 time=0.580 ms 8200 bytes from 192.168.0.2: icmp_seq=6 ttl=64 time=0.594 ms 8200 bytes from 192.168.0.2: icmp_seq=7 ttl=64 time=0.580 ms 8200 bytes from 192.168.0.2: icmp_seq=8 ttl=64 time=0.592 ms 8200 bytes from 192.168.0.2: icmp_seq=9 ttl=64 time=0.589 ms 8200 bytes from 192.168.0.2: icmp_seq=10 ttl=64 time=0.571 ms 8200 bytes from 192.168.0.2: icmp_seq=11 ttl=64 time=0.588 ms 8200 bytes from 192.168.0.2: icmp_seq=12 ttl=64 time=0.580 ms 8200 bytes from 192.168.0.2: icmp_seq=13 ttl=64 time=0.587 ms --- 192.168.0.2 ping statistics --- 13 packets transmitted, 13 received, 0% packet loss, time 2400ms rtt min/avg/max/mdev = 0.571/0.593/0.693/0.044 ms, ipg/ewma 200.022/0.607 ms gershwin:[~] Both initiator and target are alone on a gigabit NIC (Tigon3). On target server, istd1 takes 100% of a CPU (and only one CPU, even my T1000 can simultaneous run 32 threads). I think the limitation comes from istd1. usually istdx will not take 100% cpu with 1G network, especially when using disk as back storage, some kind of profiling work might be helpful to tell what happened... forgot to ask, your sparc64 platform cpu spec. Root gershwin:[/mnt/solaris] cat /proc/cpuinfo cpu : UltraSparc T1 (Niagara) fpu : UltraSparc T1 integrated FPU prom: OBP 4.23.4 2006/08/04 20:45 type: sun4v ncpus probed: 24 ncpus active: 24 D$ parity tl1 : 0 I$ parity tl1 : 0 Both servers are built with 1 GHz T1 processors (6 cores, 24 threads). as Ross pointed out, many io pattern only have 1 outstanding io at any time, so there is only one work thread actively to serve it. so it can not exploit the multiple core here. you see 100% at nullio or fileio? 
with disk, most of the time should be spent on iowait and cpu utilization should not be high at all. Regards, JKB -- Ming Zhang @#$%^ purging memory... (*!% http://blackmagic02881.wordpress.com/ http://www.linkedin.com/in/blackmagic02881 - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [Iscsitarget-devel] Abort Task ?
Ming Zhang wrote: On Fri, 2007-10-19 at 16:30 +0200, BERTRAND Joël wrote: Ming Zhang wrote: On Fri, 2007-10-19 at 09:48 +0200, BERTRAND Joël wrote: Ross S. W. Walker wrote: BERTRAND Joël wrote: BERTRAND Joël wrote: I can format serveral times (mkfs.ext3) a 1.5 TB volume over iSCSI without any trouble. I can read and write on this virtual disk without any trouble. Now, I have configured ietd with : Lun 0 Sectors=1464725758,Type=nullio and I run on initiator side : Root gershwin:[/dev] dd if=/dev/zero of=/dev/sdj bs=8192 479482+0 records in 479482+0 records out 3927916544 bytes (3.9 GB) copied, 153.222 seconds, 25.6 MB/s Root gershwin:[/dev] dd if=/dev/zero of=/dev/sdj bs=8192 I'm waitinfor a crash. No one when I write these lines. I suspect an interaction between raid and iscsi. I simultanely run : Root gershwin:[/dev] dd if=/dev/zero of=/dev/sdj bs=8192 8397210+0 records in 8397210+0 records out 68789944320 bytes (69 GB) copied, 2732.55 seconds, 25.2 MB/s and Root gershwin:[~] dd if=/dev/sdj of=/dev/null bs=8192 739200+0 records in 739199+0 records out 6055518208 bytes (6.1 GB) copied, 447.178 seconds, 13.5 MB/s without any trouble. The speed can definitely be improved. Look at your network setup and use ping to try and get the network latency to a minimum. # ping -A -s 8192 172.16.24.140 --- 172.16.24.140 ping statistics --- 14058 packets transmitted, 14057 received, 0% packet loss, time 9988ms rtt min/avg/max/mdev = 0.234/0.268/2.084/0.041 ms, ipg/ewma 0.710/0.260 ms gershwin:[~] ping -A -s 8192 192.168.0.2 PING 192.168.0.2 (192.168.0.2) 8192(8220) bytes of data. 8200 bytes from 192.168.0.2: icmp_seq=1 ttl=64 time=0.693 ms 8200 bytes from 192.168.0.2: icmp_seq=2 ttl=64 time=0.595 ms 8200 bytes from 192.168.0.2: icmp_seq=3 ttl=64 time=0.583 ms 8200 bytes from 192.168.0.2: icmp_seq=4 ttl=64 time=0.589 ms 8200 bytes from 192.168.0.2: icmp_seq=5 ttl=64 time=0.580 ms 8200 bytes from 192.168.0.2: icmp_seq=6 ttl=64 time=0.594 ms 8200 bytes from 192.168.0.2: icmp_seq=7 ttl=64 time=0.580 ms 8200 bytes from 192.168.0.2: icmp_seq=8 ttl=64 time=0.592 ms 8200 bytes from 192.168.0.2: icmp_seq=9 ttl=64 time=0.589 ms 8200 bytes from 192.168.0.2: icmp_seq=10 ttl=64 time=0.571 ms 8200 bytes from 192.168.0.2: icmp_seq=11 ttl=64 time=0.588 ms 8200 bytes from 192.168.0.2: icmp_seq=12 ttl=64 time=0.580 ms 8200 bytes from 192.168.0.2: icmp_seq=13 ttl=64 time=0.587 ms --- 192.168.0.2 ping statistics --- 13 packets transmitted, 13 received, 0% packet loss, time 2400ms rtt min/avg/max/mdev = 0.571/0.593/0.693/0.044 ms, ipg/ewma 200.022/0.607 ms gershwin:[~] Both initiator and target are alone on a gigabit NIC (Tigon3). On target server, istd1 takes 100% of a CPU (and only one CPU, even my T1000 can simultaneous run 32 threads). I think the limitation comes from istd1. usually istdx will not take 100% cpu with 1G network, especially when using disk as back storage, some kind of profiling work might be helpful to tell what happened... forgot to ask, your sparc64 platform cpu spec. Root gershwin:[/mnt/solaris] cat /proc/cpuinfo cpu : UltraSparc T1 (Niagara) fpu : UltraSparc T1 integrated FPU prom: OBP 4.23.4 2006/08/04 20:45 type: sun4v ncpus probed: 24 ncpus active: 24 D$ parity tl1 : 0 I$ parity tl1 : 0 Both servers are built with 1 GHz T1 processors (6 cores, 24 threads). as Ross pointed out, many io pattern only have 1 outstanding io at any time, so there is only one work thread actively to serve it. so it can not exploit the multiple core here. you see 100% at nullio or fileio? 
with disk, most of the time should be spent on iowait and cpu utilization should not be high at all. Maybe it has to do with the endian-ness fix? Look at where the fix was implemented and see if there was a simpler way of implementing it (if that is the cause). The network is still slower than expected. I don't know what chipset the Sparcs use for their interfaces; if it is e1000 then you can set low-latency interrupt throttling with InterruptThrottleRate=1, which works well. You can explore other interface module options around interrupt throttling or coalescence. -Ross
async_tx: get best channel
Hello Dan, I have a suggestion regarding the async_tx_find_channel() procedure. First, a little introduction. Some processors (e.g. ppc440spe) have several DMA engines (say DMA1 and DMA2) which are capable of performing the same type of operation, say XOR. The DMA2 engine may process the XOR operation faster than the DMA1 engine, but DMA2 (which is faster) has some restrictions on the source operand addresses, whereas there are no such restrictions for DMA1 (which is slower). So the question is, how can ASYNC_TX select the DMA engine which will be the most effective for the given tx operation? In the example just described this means: if the faster engine, DMA2, can process the tx operation with the given source operand addresses, then we select DMA2; if the given source operand addresses cannot be processed with DMA2, then we select the slower engine, DMA1. I see the following way of introducing such functionality. We may introduce an additional method in struct dma_device (let's call it device_estimate()) which would take as arguments the list of sources to be processed during the given tx, the type of operation (XOR, COPY, ...), and perhaps something else, and would then estimate the effectiveness of processing this tx on the given channel. The async_tx_find_channel() function should call the device_estimate() method for each registered dma channel and then select the most effective one. The architecture-specific ADMA driver will be responsible for returning the greatest value from the device_estimate() method for the channel which will be the most effective for this given tx. What are your thoughts regarding this? Do you see any other effective ways of enhancing ASYNC_TX with such functionality? Regards, Yuri -- Yuri Tikhonov, Senior Software Engineer Emcraft Systems, www.emcraft.com - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
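A rough sketch of what is being proposed, just to make it concrete (hypothetical: device_estimate() and these exact types and signatures do not exist in the dmaengine API; the real integration would live in struct dma_device and async_tx_find_channel()):

    #include <stddef.h>

    struct chan;  /* stands in for struct dma_chan */

    /* Hypothetical scoring callback: each DMA driver reports how well
     * it could execute a transaction of the given type with the given
     * source addresses.  0 means "cannot handle these addresses";
     * higher is better.  A ppc440spe-style driver would return a high
     * score from its fast engine only when the address restrictions
     * are satisfied, and a lower non-zero score from the slow engine. */
    typedef int (*device_estimate_fn)(struct chan *c, int op_type,
                                      unsigned long long *src_list,
                                      int src_cnt, size_t len);

    /* Selection loop async_tx_find_channel() could run over the
     * registered channels (arrays stand in for however the core
     * actually tracks channels). */
    static struct chan *pick_channel(struct chan **chans,
                                     device_estimate_fn *estimate,
                                     int nr, int op_type,
                                     unsigned long long *src_list,
                                     int src_cnt, size_t len)
    {
        struct chan *best = NULL;
        int best_score = 0;

        for (int i = 0; i < nr; i++) {
            int score = estimate[i](chans[i], op_type, src_list,
                                    src_cnt, len);
            if (score > best_score) {
                best_score = score;
                best = chans[i];
            }
        }
        return best;  /* NULL: no engine can do it, fall back to software */
    }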
Re: [Iscsitarget-devel] Abort Task ?
Ross S. W. Walker wrote: BERTRAND Joël wrote: BERTRAND Joël wrote: I can format serveral times (mkfs.ext3) a 1.5 TB volume over iSCSI without any trouble. I can read and write on this virtual disk without any trouble. Now, I have configured ietd with : Lun 0 Sectors=1464725758,Type=nullio and I run on initiator side : Root gershwin:[/dev] dd if=/dev/zero of=/dev/sdj bs=8192 479482+0 records in 479482+0 records out 3927916544 bytes (3.9 GB) copied, 153.222 seconds, 25.6 MB/s Root gershwin:[/dev] dd if=/dev/zero of=/dev/sdj bs=8192 I'm waitinfor a crash. No one when I write these lines. I suspect an interaction between raid and iscsi. I simultanely run : Root gershwin:[/dev] dd if=/dev/zero of=/dev/sdj bs=8192 8397210+0 records in 8397210+0 records out 68789944320 bytes (69 GB) copied, 2732.55 seconds, 25.2 MB/s and Root gershwin:[~] dd if=/dev/sdj of=/dev/null bs=8192 739200+0 records in 739199+0 records out 6055518208 bytes (6.1 GB) copied, 447.178 seconds, 13.5 MB/s without any trouble. The speed can definitely be improved. Look at your network setup and use ping to try and get the network latency to a minimum. # ping -A -s 8192 172.16.24.140 --- 172.16.24.140 ping statistics --- 14058 packets transmitted, 14057 received, 0% packet loss, time 9988ms rtt min/avg/max/mdev = 0.234/0.268/2.084/0.041 ms, ipg/ewma 0.710/0.260 ms gershwin:[~] ping -A -s 8192 192.168.0.2 PING 192.168.0.2 (192.168.0.2) 8192(8220) bytes of data. 8200 bytes from 192.168.0.2: icmp_seq=1 ttl=64 time=0.693 ms 8200 bytes from 192.168.0.2: icmp_seq=2 ttl=64 time=0.595 ms 8200 bytes from 192.168.0.2: icmp_seq=3 ttl=64 time=0.583 ms 8200 bytes from 192.168.0.2: icmp_seq=4 ttl=64 time=0.589 ms 8200 bytes from 192.168.0.2: icmp_seq=5 ttl=64 time=0.580 ms 8200 bytes from 192.168.0.2: icmp_seq=6 ttl=64 time=0.594 ms 8200 bytes from 192.168.0.2: icmp_seq=7 ttl=64 time=0.580 ms 8200 bytes from 192.168.0.2: icmp_seq=8 ttl=64 time=0.592 ms 8200 bytes from 192.168.0.2: icmp_seq=9 ttl=64 time=0.589 ms 8200 bytes from 192.168.0.2: icmp_seq=10 ttl=64 time=0.571 ms 8200 bytes from 192.168.0.2: icmp_seq=11 ttl=64 time=0.588 ms 8200 bytes from 192.168.0.2: icmp_seq=12 ttl=64 time=0.580 ms 8200 bytes from 192.168.0.2: icmp_seq=13 ttl=64 time=0.587 ms --- 192.168.0.2 ping statistics --- 13 packets transmitted, 13 received, 0% packet loss, time 2400ms rtt min/avg/max/mdev = 0.571/0.593/0.693/0.044 ms, ipg/ewma 200.022/0.607 ms gershwin:[~] Both initiator and target are alone on a gigabit NIC (Tigon3). On target server, istd1 takes 100% of a CPU (and only one CPU, even my T1000 can simultaneous run 32 threads). I think the limitation comes from istd1. You want your avg ping time for 8192 byte payloads to be 300us or less. 1000/.268 = 3731 IOPS @ 8k = 30 MB/s If you use apps that do overlapping asynchronous IO you can see better numbers. Regards, JKB - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
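For reference, the arithmetic behind that estimate as a standalone snippet (numbers taken from the ping output above; the bound assumes a single outstanding 8 KB request per network round trip):

    #include <stdio.h>

    int main(void)
    {
        double rtt_ms   = 0.268;   /* average 8192-byte ping time */
        double block_kb = 8.0;     /* dd block size of 8192 bytes */
        double iops     = 1000.0 / rtt_ms;           /* ~3731 requests/s */
        double mb_s     = iops * block_kb / 1024.0;  /* ~29 MB/s ceiling */

        printf("%.0f IOPS, %.1f MB/s upper bound\n", iops, mb_s);
        return 0;
    }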
Re: Software RAID when it works and when it doesn't
On Thu, 2007-10-18 at 17:26 +0200, Goswin von Brederlow wrote: Mike Accetta [EMAIL PROTECTED] writes: What I would like to see is a timeout-driven fallback mechanism. If one mirror does not return the requested data within a certain time (say 1 second) then the request should be duplicated on the other mirror. If the first mirror later unchokes then it remains in the raid; if it fails it gets removed. But (at least reads) should not have to wait for that process. Even better would be if some write delay could also be used. The still-working mirror would get an increase in its serial (so on reboot you know one disk is newer). If the choking mirror unchokes then it can write back all the delayed data and also increase its serial to match. Otherwise it gets really failed. But you might have to use bitmaps for this or the cache size would limit its usefulness. MfG Goswin I think a timeout on both reads and writes is a must. Basically I believe that all the problems I've encountered using software raid would have been resolved by a timeout within the md code. This will keep a server from crashing/hanging when the underlying driver doesn't properly handle hard drive problems. MD can be smarter than the dumb drivers. Just my thoughts though, as I've never got an answer as to whether or not md can implement its own timeouts. Alberto - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
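A very rough sketch of the read-side idea (purely hypothetical: mirror_submit_read() and other_mirror() are made-up helpers, not md functions; this only illustrates the timeout-then-duplicate logic being asked for):

    #define READ_FALLBACK_MS 1000   /* the "say 1 second" from above */

    struct pending_read {
        int primary;            /* mirror the read was first sent to */
        int duplicated;         /* nonzero once it has been resent */
        unsigned long issued;   /* jiffies when the read was issued */
    };

    /* Called periodically (or from a timer) for reads still in flight:
     * if the first mirror has not answered in time, duplicate the read
     * on the other mirror instead of blocking; the slow mirror stays in
     * the array and is only failed if it errors out rather than merely
     * being slow. */
    static void maybe_duplicate_read(struct pending_read *r)
    {
        if (!r->duplicated &&
            time_after(jiffies,
                       r->issued + msecs_to_jiffies(READ_FALLBACK_MS))) {
            mirror_submit_read(other_mirror(r->primary), r);  /* hypothetical */
            r->duplicated = 1;
        }
    }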
Re: [BUG] Raid5 trouble
On Fri, 2007-10-19 at 01:04 -0700, BERTRAND Joël wrote: I never see any oops with this patch. But I cannot create a RAID1 array with a local RAID5 volume and a foreign RAID5 array exported by iSCSI. iSCSI seems to works fine, but RAID1 creation randomly aborts due to a unknown SCSI task on target side. For now I am going to forward this patch to Neil for inclusion in -stable and 2.6.24-rc. I will add a Tested-by: Joël Bertrand [EMAIL PROTECTED] unless you have an objection. I have stressed iSCSI target with some simultaneous I/O without any trouble (nullio, fileio and blockio), thus I suspect another bug in raid code (or an arch specific bug). The last two days, I have made some tests to isolate and reproduce this bug: 1/ iSCSI target and initiator seem work when I export with iSCSI a raid5 array; 2/ raid1 and raid5 seem work with local disks; 3/ iSCSI target is disconnected only when I create a raid1 volume over iSCSI (blockio _and_ fileio) with following message: Oct 18 10:43:52 poulenc kernel: iscsi_trgt: cmnd_abort(1156) 29 1 0 42 57344 0 0 Oct 18 10:43:52 poulenc kernel: iscsi_trgt: Abort Task (01) issued on tid:1 lun:0 by sid:630024457682948 (Unknown Task) I run for 12 hours some dd's (read and write in nullio) between initiator and target without any disconnection. Thus iSCSI code seems to be robust. Both initiator and target are alone on a single gigabit ethernet link (without any switch). I'm investigating... Can you reproduce on 2.6.22? Also, I do not think this is the cause of your failure, but you have CONFIG_DMA_ENGINE=y in your config. Setting this to 'n' will compile out the unneeded checks for offload engines in async_memcpy and async_xor. Regards, JKB Regards, Dan - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
chunk size (was Re: Time to deprecate old RAID formats?)
Doug Ledford wrote: course, this comes at the expense of peak throughput on the device. Let's say you were building a mondo movie server, where you were streaming out digital movie files. In that case, you very well may care more about throughput than seek performance since I suspect you wouldn't have many small, random reads. Then I would use a small chunk size, sacrifice the seek performance, and get the throughput bonus of parallel reads from the same stripe on multiple disks. On the other hand, if I Out of curiosity though, why wouldn't a large chunk work well here? If you stream video (I assume large files, so a good few MBs at least), the reads are parallel either way. Yes, the amount of data read from each of the disks will be in less perfect proportion than in the small chunk size scenario, but it's pretty negligible. Benchmarks I've seen (like Justin's) seem not to care much about chunk size in sequential read/write scenarios (and often favor larger chunks). Some of my own tests I did a few months ago confirmed that as well. - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
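To put a number on "pretty negligible", a quick back-of-the-envelope (assumed figures, not from the thread): a 4 MiB sequential read over four data disks lands 1 MiB on each disk whether the chunk is 64 KiB or 1 MiB, so streaming throughput barely notices the chunk size:

    #include <stdio.h>

    int main(void)
    {
        long len = 4L * 1024 * 1024;                   /* 4 MiB streaming read */
        long chunk_sizes[] = { 64 * 1024, 1024 * 1024 };
        int ndisks = 4;                                /* data disks in the stripe */

        for (int c = 0; c < 2; c++) {
            long chunk = chunk_sizes[c];
            long per_disk[4] = { 0 };

            /* walk the read chunk by chunk and charge each chunk to the
             * disk that holds it (round-robin layout, parity ignored) */
            for (long off = 0; off < len; off += chunk) {
                long n = len - off < chunk ? len - off : chunk;
                per_disk[(off / chunk) % ndisks] += n;
            }

            printf("chunk %4ld KiB:", chunk / 1024);
            for (int d = 0; d < ndisks; d++)
                printf(" disk%d=%ld KiB", d, per_disk[d] / 1024);
            printf("\n");
        }
        return 0;
    }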