Re: mdadm --stop goes off and never comes back?
On 12/19/07, Jon Nelson <[EMAIL PROTECTED]> wrote:
> On 12/19/07, Neil Brown <[EMAIL PROTECTED]> wrote:
> > On Tuesday December 18, [EMAIL PROTECTED] wrote:
> > > This just happened to me.
> > > Create raid with:
> > >
> > > mdadm --create /dev/md2 --level=raid10 --raid-devices=3 \
> > >   --spare-devices=0 --layout=o2 /dev/sdb3 /dev/sdc3 /dev/sdd3
> > >
> > > [... /proc/mdstat output, kernel log, and mdadm sysrq trace as in the original message below ...]
> > >
> > > I tried to stop the array:
> > >
> > > mdadm --stop /dev/md2
> > >
> > > and mdadm never came back. It's off in the kernel somewhere. :-(
> > >
> > > What happened? Is there any debug info I can provide before I reboot?
> >
> > Don't know. Very odd.
> >
> > The rest of the 'sysrq' output would possibly help.
>
> Does this help? It's the same syscall and args, I think, as above.
>
> Dec 18 15:09:13 turnip kernel: hald  S 0001e52f4793e397  0  3040  1 (NOTLB)
> Dec 18 15:09:13 turnip kernel:  81003aa51e38 0086 80268ee6
> Dec 18 15:09:13 turnip kernel:  81002a97e5c0 81003aa51de8 80617800 80617800
> Dec 18 15:09:13 turnip kernel:  8061d210 80617800 80617800 8100bb48
> Dec 18 15:09:13 turnip kernel: Call Trace:
> Dec 18 15:09:13 turnip kernel:  [] get_page_from_freelist+0x3c4/0x545
> Dec 18 15:09:13 turnip kernel:  [] __mutex_lock_interruptible_slowpath+0x8b/0xca
> Dec 18 15:09:13 turnip kernel:  [] md_attr_show+0x2f/0x64
> Dec 18 15:09:13 turnip kernel:  [] sysfs_read_file+0xb3/0x111
> Dec 18 15:09:13 turnip kernel:  [] vfs_read+0xcb/0x153
> Dec 18 15:09:13 turnip kernel:  [] sys_read+0x45/0x6e
> Dec 18 15:09:13 turnip kernel:  [] system_call+0x7e/0x83

NOTE: kernel is stock openSUSE 10.3 kernel, x86_64, 2.6.22.13-0.3-default.

--
Jon
Re: mdadm --stop goes off and never comes back?
On 12/19/07, Neil Brown <[EMAIL PROTECTED]> wrote:
> On Tuesday December 18, [EMAIL PROTECTED] wrote:
> > This just happened to me.
> > Create raid with:
> >
> > mdadm --create /dev/md2 --level=raid10 --raid-devices=3 \
> >   --spare-devices=0 --layout=o2 /dev/sdb3 /dev/sdc3 /dev/sdd3
> >
> > [... /proc/mdstat output, kernel log, and mdadm sysrq trace as in the original message below ...]
> >
> > I tried to stop the array:
> >
> > mdadm --stop /dev/md2
> >
> > and mdadm never came back. It's off in the kernel somewhere. :-(
> >
> > What happened? Is there any debug info I can provide before I reboot?
>
> Don't know. Very odd.
>
> The rest of the 'sysrq' output would possibly help.

Does this help? It's the same syscall and args, I think, as above.

Dec 18 15:09:13 turnip kernel: hald  S 0001e52f4793e397  0  3040  1 (NOTLB)
Dec 18 15:09:13 turnip kernel:  81003aa51e38 0086 80268ee6
Dec 18 15:09:13 turnip kernel:  81002a97e5c0 81003aa51de8 80617800 80617800
Dec 18 15:09:13 turnip kernel:  8061d210 80617800 80617800 8100bb48
Dec 18 15:09:13 turnip kernel: Call Trace:
Dec 18 15:09:13 turnip kernel:  [] get_page_from_freelist+0x3c4/0x545
Dec 18 15:09:13 turnip kernel:  [] __mutex_lock_interruptible_slowpath+0x8b/0xca
Dec 18 15:09:13 turnip kernel:  [] md_attr_show+0x2f/0x64
Dec 18 15:09:13 turnip kernel:  [] sysfs_read_file+0xb3/0x111
Dec 18 15:09:13 turnip kernel:  [] vfs_read+0xcb/0x153
Dec 18 15:09:13 turnip kernel:  [] sys_read+0x45/0x6e
Dec 18 15:09:13 turnip kernel:  [] system_call+0x7e/0x83

--
Jon
Re: mdadm --stop goes off and never comes back?
On Tuesday December 18, [EMAIL PROTECTED] wrote:
> This just happened to me.
> Create raid with:
>
> mdadm --create /dev/md2 --level=raid10 --raid-devices=3 \
>   --spare-devices=0 --layout=o2 /dev/sdb3 /dev/sdc3 /dev/sdd3
>
> cat /proc/mdstat
>
> md2 : active raid10 sdd3[2] sdc3[1] sdb3[0]
>       5855424 blocks 64K chunks 2 offset-copies [3/3] [UUU]
>       [==>..................]  resync = 14.6% (859968/5855424) finish=1.3min speed=61426K/sec
>
> Some log messages:
>
> Dec 18 15:02:28 turnip kernel: md: md2: raid array is not clean -- starting background reconstruction
> Dec 18 15:02:28 turnip kernel: raid10: raid set md2 active with 3 out of 3 devices
> Dec 18 15:02:28 turnip kernel: md: resync of RAID array md2
> Dec 18 15:02:28 turnip kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
> Dec 18 15:02:28 turnip kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
> Dec 18 15:02:28 turnip kernel: md: using 128k window, over a total of 5855424 blocks.
> Dec 18 15:03:36 turnip kernel: md: md2: resync done.
> Dec 18 15:03:36 turnip kernel: md: checkpointing resync of md2.
>
> I tried to stop the array:
>
> mdadm --stop /dev/md2
>
> and mdadm never came back. It's off in the kernel somewhere. :-(
>
> kill, of course, has no effect.
> The machine still runs fine, and the rest of the raids (md0 and md1) work fine (same disks).
>
> The output (snipped, only mdadm) of 'echo t > /proc/sysrq-trigger':
>
> Dec 18 15:09:13 turnip kernel: mdadm  S 0001e5359fa38fb0  0  3943  1 (NOTLB)
> Dec 18 15:09:13 turnip kernel:  810033e7ddc8 0086 0092
> Dec 18 15:09:13 turnip kernel:  0fc7 810033e7dd78 80617800 80617800
> Dec 18 15:09:13 turnip kernel:  8061d210 80617800 80617800
> Dec 18 15:09:13 turnip kernel: Call Trace:
> Dec 18 15:09:13 turnip kernel:  [] __mutex_lock_interruptible_slowpath+0x8b/0xca
> Dec 18 15:09:13 turnip kernel:  [] do_open+0x222/0x2a5
> Dec 18 15:09:13 turnip kernel:  [] md_seq_show+0x127/0x6c1
> Dec 18 15:09:13 turnip kernel:  [] vma_merge+0x141/0x1ee
> Dec 18 15:09:13 turnip kernel:  [] seq_read+0x1bf/0x28b
> Dec 18 15:09:13 turnip kernel:  [] vfs_read+0xcb/0x153
> Dec 18 15:09:13 turnip kernel:  [] sys_read+0x45/0x6e
> Dec 18 15:09:13 turnip kernel:  [] system_call+0x7e/0x83
>
> What happened? Is there any debug info I can provide before I reboot?

Don't know. Very odd.

The rest of the 'sysrq' output would possibly help.

NeilBrown
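A quick way to see where a process stuck like this is sleeping in the kernel, short of the full sysrq-t dump, is /proc/<pid>/wchan. A minimal sketch, using the mdadm PID 3943 from the trace above:

    # Kernel symbol the task is blocked in (here presumably an md mutex):
    cat /proc/3943/wchan; echo
    # The same information alongside the process state, via ps:
    ps -o pid,stat,wchan:30,cmd -p 3943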
Re: Linux RAID Partition Offset 63 cylinders / 30% performance hit?
On 12/19/07, Michal Soltys <[EMAIL PROTECTED]> wrote:
> Justin Piszcz wrote:
> >
> > Or is there a better way to do this, does parted handle this situation
> > better?
> >
> > What is the best (and correct) way to calculate stripe-alignment on the
> > RAID5 device itself?
> >
> > Does this also apply to Linux/SW RAID5? Or are there any caveats that
> > are not taken into account since it is based in SW vs. HW?
>
> In case of SW or HW raid, when you place a raid-aware filesystem directly
> on it, I don't see any potential problems.
>
> Also, if md's superblock version/placement actually mattered, it'd be
> pretty strange. The space available for actual use - be it partitions or
> a filesystem directly - should always be nicely aligned. I don't know
> that for sure though.
>
> If you use SW partitionable raid, or HW raid with partitions, then you
> would have to align it on a chunk boundary manually. Any self-respecting
> OS shouldn't complain that a partition doesn't start on a cylinder
> boundary these days. LVM can complicate life a bit too - if you want its
> volumes to be chunk-aligned.

That, for me, is the next question - how can one educate LVM about the
underlying block device so that logical volumes carved out of that space
align properly? Many of us have experienced 30% (or so) performance losses
for the convenience of LVM (and mighty convenient it is).

--
Jon
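One workaround that was circulating for this (a sketch, not verified on this setup): LVM puts its on-disk metadata area in front of the first physical extent, so oversizing that area until it rounds up to a stripe multiple realigns the data. The 250k value below is an assumption for a 256 KiB boundary, relying on pvcreate rounding the metadata area up to the next 64 KiB multiple; the /dev/md2 name is illustrative.

    # Make the PV data area start 256 KiB into the device:
    pvcreate --metadatasize 250k /dev/md2
    # Verify where the first physical extent actually starts:
    pvs -o pv_name,pe_start --units k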
Re: Linux RAID Partition Offset 63 cylinders / 30% performance hit?
Justin Piszcz wrote:
>
> Or is there a better way to do this, does parted handle this situation
> better?
>
> What is the best (and correct) way to calculate stripe-alignment on the
> RAID5 device itself?
>
> Does this also apply to Linux/SW RAID5? Or are there any caveats that
> are not taken into account since it is based in SW vs. HW?

In case of SW or HW raid, when you place a raid-aware filesystem directly on
it, I don't see any potential problems.

Also, if md's superblock version/placement actually mattered, it'd be pretty
strange. The space available for actual use - be it partitions or a
filesystem directly - should always be nicely aligned. I don't know that for
sure though.

If you use SW partitionable raid, or HW raid with partitions, then you would
have to align it on a chunk boundary manually. Any self-respecting OS
shouldn't complain that a partition doesn't start on a cylinder boundary
these days. LVM can complicate life a bit too - if you want its volumes to be
chunk-aligned.

With NTFS the problem is that it's not aware of the underlying raid in any
way. It starts with a 16-sector-long boot sector, somewhat compatible with
ancient FAT. My blind guess would be to try to align the very first sector of
$Mft with your chunk. Also, the mentioned boot sector is referenced as $Boot,
so I don't know if a large cluster won't automatically extend it to the full
cluster size. Experiment, YMMV :)
Re: Linux RAID Partition Offset 63 cylinders / 30% performance hit?
On Wed, 19 Dec 2007, Robin Hill wrote:

> On Wed Dec 19, 2007 at 09:50:16AM -0500, Justin Piszcz wrote:
>
> > The (up to) 30% figure is mentioned here:
> > http://insights.oetiker.ch/linux/raidoptimization.html
>
> That looks to be referring to partitioning a RAID device - this'll only
> apply to hardware RAID or partitionable software RAID, not to the normal
> use case. When you're creating an array out of standard partitions then you
> know the array stripe size will align with the disks (there's no way it
> cannot), and you can set the filesystem stripe size to align as well (XFS
> will do this automatically). I've actually done tests on this with hardware
> RAID to try to find the correct partition offset, but wasn't able to see
> any difference (using bonnie++ and moving the partition start by one sector
> at a time).
>
> > # fdisk -l /dev/sdc
> >
> > Disk /dev/sdc: 150.0 GB, 150039945216 bytes
> > 255 heads, 63 sectors/track, 18241 cylinders
> > Units = cylinders of 16065 * 512 = 8225280 bytes
> > Disk identifier: 0x5667c24a
> >
> >    Device Boot      Start         End      Blocks   Id  System
> > /dev/sdc1               1       18241   146520801   fd  Linux raid autodetect
>
> This looks to be a normal disk - the partition offsets shouldn't be
> relevant here (barring any knowledge of the actual physical disk layout
> anyway, and block remapping may well make that rather irrelevant).
>
> That's my take on this one anyway.
>
> Cheers,
>     Robin
> --
>      ___
>     ( ' }     |  Robin Hill        <[EMAIL PROTECTED]>  |
>    / / )      |  Little Jim says ....                   |
>   // !!       |  "He fallen in de water !!"             |

Interesting, yes, I am using XFS as well, thanks for the response.
Re: Linux RAID Partition Offset 63 cylinders / 30% performance hit?
On Wed Dec 19, 2007 at 09:50:16AM -0500, Justin Piszcz wrote:

> The (up to) 30% figure is mentioned here:
> http://insights.oetiker.ch/linux/raidoptimization.html

That looks to be referring to partitioning a RAID device - this'll only apply
to hardware RAID or partitionable software RAID, not to the normal use case.
When you're creating an array out of standard partitions then you know the
array stripe size will align with the disks (there's no way it cannot), and
you can set the filesystem stripe size to align as well (XFS will do this
automatically). I've actually done tests on this with hardware RAID to try to
find the correct partition offset, but wasn't able to see any difference
(using bonnie++ and moving the partition start by one sector at a time).

> # fdisk -l /dev/sdc
>
> Disk /dev/sdc: 150.0 GB, 150039945216 bytes
> 255 heads, 63 sectors/track, 18241 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Disk identifier: 0x5667c24a
>
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sdc1               1       18241   146520801   fd  Linux raid autodetect

This looks to be a normal disk - the partition offsets shouldn't be relevant
here (barring any knowledge of the actual physical disk layout anyway, and
block remapping may well make that rather irrelevant).

That's my take on this one anyway.

Cheers,
    Robin
--
     ___
    ( ' }     |  Robin Hill        <[EMAIL PROTECTED]>  |
   / / )      |  Little Jim says ....                   |
  // !!       |  "He fallen in de water !!"             |
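To spell out the XFS side of that, a minimal sketch, assuming a 6-disk RAID5 with a 256 KiB chunk (so 5 data disks); mkfs.xfs normally detects these values from md on its own, and /dev/md2 is illustrative:

    # su = stripe unit (the md chunk size), sw = stripe width in data disks:
    mkfs.xfs -d su=256k,sw=5 /dev/md2
    # mkfs prints the resulting sunit/swidth; after mounting, running
    # xfs_info on the mount point shows the same values.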
Re: help diagnosing bad disk
On Wed, Dec 19, 2007 at 01:18:21PM -0500, Jon Sabo wrote:
> So I was trying to copy over some Indiana Jones wav files and it
> wasn't going my way. I noticed that my software raid device showed:
>
> /dev/md1 on / type ext3 (rw,errors=remount-ro)
>
> Is this saying that it was remounted, read only because it found a
> problem with the md1 meta device? That's what it looks like it's
> saying but I can still write to /.

FYI, it means that it is currently "rw", and if there are errors, it will
remount the filesystem read-only (as opposed to panicking).

regards,
iustin
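For reference, that behaviour comes from the ext3 'errors' mount option, which can be set per mount or stored as a superblock default; a sketch, with the device name assumed:

    # Per mount, in /etc/fstab:
    /dev/md1  /  ext3  defaults,errors=remount-ro  0  1
    # Or as the superblock default (alternatives: continue, panic):
    tune2fs -e remount-ro /dev/md1
    # Once an error actually triggers the remount, /proc/mounts shows 'ro'.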
Re: Linux RAID Partition Offset 63 cylinders / 30% performance hit?
On Wed, 19 Dec 2007, Bill Davidsen wrote:

> I'm going to try another approach, I'll describe it when I get results (or not).

http://home.comcast.net/~jpiszcz/align_vs_noalign/

Hardly any difference whatsoever; only on the per-char read/write is it any
faster..?

Average of 3 runs taken:

$ cat align/*log|grep ,
p63,8G,57683,94,86479,13,55242,8,63495,98,147647,11,434.8,0,16:10:16/64,1334210,10,330,2,120,1,3978,10,312,2
p63,8G,57973,95,76702,11,50830,7,62291,99,136477,10,388.3,0,16:10:16/64,1252548,6,296,1,115,1,7927,20,373,2
p63,8G,57758,95,80847,12,52144,8,63874,98,144747,11,443.4,0,16:10:16/64,1242445,6,303,1,117,1,6767,17,359,2

$ cat noalign/*log|grep ,
p63,8G,57641,94,85494,12,55669,8,63802,98,146925,11,434.8,0,16:10:16/64,1353180,8,314,1,117,1,8684,22,283,2
p63,8G,57705,94,85929,12,56708,8,63855,99,143437,11,436.2,0,16:10:16/64,12211519,29,297,1,113,1,3218,8,325,2
p63,8G,57783,94,78226,11,48580,7,63487,98,137721,10,438.7,0,16:10:16/64,1243229,8,307,1,120,1,4247,11,313,2
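Those comma-separated rows are bonnie++'s machine-readable output; the bon_csv2html tool that ships with bonnie++ turns them into a readable comparison table. A sketch using the directory layout above:

    grep , align/*log   | bon_csv2html > align.html
    grep , noalign/*log | bon_csv2html > noalign.html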
Re: raid5 reshape/resync - BUGREPORT/PROBLEM
----- Message from [EMAIL PROTECTED] -----

> ----- Message from [EMAIL PROTECTED] -----
> Nagilum said: (by the date of Tue, 18 Dec 2007 11:09:38 +0100)
>
> >> Ok, I've recreated the problem in form of a semiautomatic testcase.
> >> All necessary files (plus the old xfs_repair output) are at:
> >>
> >> http://www.nagilum.de/md/
> >>
> >> After running the test.sh the created xfs filesystem on the raid
> >> device is broken and (at least in my case) cannot be mounted anymore.
>
> I think that you should file a bugreport.
>
> ----- End message from [EMAIL PROTECTED] -----
>
> Where would I file this bug report? I thought this is the place?
> I could also really use a way to fix that corruption. :(

ouch. To be honest I subscribed here just a month ago, so I'm not sure. But I
haven't seen other bugreports here so far. I was expecting that there is some
bugzilla?

----- End message from [EMAIL PROTECTED] -----

Not really, I'm afraid. At least I'm not aware of anything like that for
vanilla.

Anyway, I just verified the bug on 2.6.23.11 and 2.6.24-rc5-git4. I originally
came across the bug on amd64, and I'm now using a PPC750 machine to verify it,
so it's an architecture-independent bug (but that was to be expected).

I also prepared a different version of the testcase: "v2_start.sh" and
"v2_test.sh". This one prints out all the wrong bytes (longs, to be exact)
plus their locations. It shows the data is there, but scattered. :(

Kind regards,
Alex.

#    _  __          _ __            http://www.nagilum.org/   icq://69646724  #
#   / |/ /__ ____ _(_) /_ ____ _    [EMAIL PROTECTED]         +491776461165   #
#  /    / _ `/ _ `/ / / // /  ' \   Amiga (68k/PPC): AOS/NetBSD/Linux         #
# /_/|_/\_,_/\_, /_/_/\_,_/_/_/_/   Mac (PPC): MacOS-X / NetBSD / Linux       #
#           /___/                   x86: FreeBSD/Linux/Solaris/Win2k          #
# cakebox.homeunix.net - all the machine one needs..                          #
Re: Linux RAID Partition Offset 63 cylinders / 30% performance hit?
On Wed, 19 Dec 2007, Bill Davidsen wrote:

> Justin Piszcz wrote:
> > On Wed, 19 Dec 2007, Bill Davidsen wrote:
> > > [...]
> > > As other posts have detailed, putting the partition on a 64k aligned
> > > boundary can address the performance problems. However, a poor choice
> > > of chunk size, cache_buffer size, or just random i/o in small sizes can
> > > eat up a lot of the benefit. I don't think you need to give up your
> > > partitions to get the benefit of alignment.
> >
> > Hrmm.. I am doing a benchmark now with: 6 x 400GB (SATA) / 256 KiB
> > stripe, unaligned vs. aligned raid setup.
> >
> > unaligned: just fdisk /dev/sdc, mkpartition, fd raid.
> > aligned: fdisk, expert mode, start at 512 as the offset.
> >
> > Per a Microsoft KB, example alignment calculations in kilobytes for a
> > 256-KB stripe unit size:
> >
> >  (63 * .5) / 256 = 0.123046875
> >  (64 * .5) / 256 = 0.125
> > (128 * .5) / 256 = 0.25
> > (256 * .5) / 256 = 0.5
> > (512 * .5) / 256 = 1
> >
> > These examples show that the partition is not aligned correctly for a
> > 256-KB stripe unit size until the partition is created by using an
> > offset of 512 sectors (512 bytes per sector). So I should start at 512
> > for a 256k chunk size.
> >
> > I ran bonnie++ three consecutive times and took the average for the
> > unaligned case; I am rebuilding the RAID5 now, then I will re-execute
> > the test 3 additional times and take the average of that.
>
> I'm going to try another approach, I'll describe it when I get results (or not).

Waiting for the raid to rebuild, then I will re-run thereafter.

[=================>...]  recovery = 86.7% (339104640/390708480) finish=30.8min speed=27835K/sec
Re: help diagnosing bad disk
On Wed, 19 Dec 2007, Jon Sabo wrote:

> I found the problem. The power was unplugged from the drive. The sata
> power connectors aren't very good at securing the connector. I
> reattached the power connector to the sata drive and booted up. This
> is what it looks like now:
>
> [EMAIL PROTECTED]:/home/illsci# mdadm --detail /dev/md0
> /dev/md0:
>         Version : 00.90.03
>   Creation Time : Mon Jul 30 21:47:14 2007
>      Raid Level : raid1
>      Array Size : 1951744 (1906.32 MiB 1998.59 MB)
>     Device Size : 1951744 (1906.32 MiB 1998.59 MB)
>    Raid Devices : 2
>   Total Devices : 1
> Preferred Minor : 0
>     Persistence : Superblock is persistent
>
>     Update Time : Wed Dec 19 13:48:12 2007
>           State : clean, degraded
>  Active Devices : 1
> Working Devices : 1
>  Failed Devices : 0
>   Spare Devices : 0
>
>            UUID : 157f716c:0e7aebca:c20741f6:bb6099c9
>          Events : 0.44
>
>     Number   Major   Minor   RaidDevice State
>        0       8        1        0      active sync   /dev/sda1
>        1       0        0        1      removed
>
> [EMAIL PROTECTED]:/home/illsci# mdadm --detail /dev/md1
> /dev/md1:
>         Version : 00.90.03
>   Creation Time : Mon Jul 30 21:47:47 2007
>      Raid Level : raid1
>      Array Size : 974808064 (929.65 GiB 998.20 GB)
>     Device Size : 974808064 (929.65 GiB 998.20 GB)
>    Raid Devices : 2
>   Total Devices : 1
> Preferred Minor : 1
>     Persistence : Superblock is persistent
>
>     Update Time : Wed Dec 19 13:50:02 2007
>           State : clean, degraded
>  Active Devices : 1
> Working Devices : 1
>  Failed Devices : 0
>   Spare Devices : 0
>
>            UUID : 156a030e:9a6f8eb3:9b0c439e:d718e744
>          Events : 0.1498340
>
>     Number   Major   Minor   RaidDevice State
>        0       0        0        0      removed
>        1       8       18       1      active sync   /dev/sdb2
>
> How do I put it back into the correct state? Thanks!

mdadm /dev/md0 -a /dev/sdb1
mdadm /dev/md1 -a /dev/sda2

Weird that they got out of sync on different drives.

Justin.
Re: help diagnosing bad disk
I think I got it now. Thanks for your help!

[EMAIL PROTECTED]:/home/illsci# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Mon Jul 30 21:47:14 2007
     Raid Level : raid1
     Array Size : 1951744 (1906.32 MiB 1998.59 MB)
    Device Size : 1951744 (1906.32 MiB 1998.59 MB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Wed Dec 19 14:15:31 2007
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : 157f716c:0e7aebca:c20741f6:bb6099c9
         Events : 0.48

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       0        0        1      removed

[EMAIL PROTECTED]:/home/illsci# mdadm --detail /dev/md1
/dev/md1:
        Version : 00.90.03
  Creation Time : Mon Jul 30 21:47:47 2007
     Raid Level : raid1
     Array Size : 974808064 (929.65 GiB 998.20 GB)
    Device Size : 974808064 (929.65 GiB 998.20 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Wed Dec 19 14:19:06 2007
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : 156a030e:9a6f8eb3:9b0c439e:d718e744
         Events : 0.1498998

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       18       1      active sync   /dev/sdb2

[EMAIL PROTECTED]:/home/illsci# mdadm /dev/md0 -a /dev/sdb1
mdadm: re-added /dev/sdb1
[EMAIL PROTECTED]:/home/illsci# mdadm /dev/md1 -a /dev/sda2
mdadm: re-added /dev/sda2
[EMAIL PROTECTED]:/home/illsci# cat /proc/mdstat
Personalities : [multipath] [raid1]
md1 : active raid1 sda2[2] sdb2[1]
      974808064 blocks [2/1] [_U]
        resync=DELAYED
md0 : active raid1 sdb1[2] sda1[0]
      1951744 blocks [2/1] [U_]
      [=================>...]  recovery = 86.6% (1693504/1951744) finish=0.0min speed=80643K/sec

unused devices: <none>

[EMAIL PROTECTED]:/home/illsci# cat /proc/mdstat
Personalities : [multipath] [raid1]
md1 : active raid1 sda2[2] sdb2[1]
      974808064 blocks [2/1] [_U]
      [>....................]  recovery =  0.0% (86848/974808064) finish=186.9min speed=86848K/sec
md0 : active raid1 sdb1[1] sda1[0]
      1951744 blocks [2/2] [UU]

unused devices: <none>

On Dec 19, 2007 2:09 PM, Jon Sabo <[EMAIL PROTECTED]> wrote:
> Well, here's the rest of the info I should have sent in the last email:
>
> [EMAIL PROTECTED]:/home/illsci# cat /proc/mdstat
> Personalities : [multipath] [raid1]
> md1 : active raid1 sdb2[1]
>       974808064 blocks [2/1] [_U]
>
> md0 : active raid1 sda1[0]
>       1951744 blocks [2/1] [U_]
>
> unused devices: <none>
>
> [EMAIL PROTECTED]:/home/illsci# dmesg | grep sdb
> sd 1:0:0:0: [sdb] 1953523055 512-byte hardware sectors (1000204 MB)
> sd 1:0:0:0: [sdb] Write Protect is off
> sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
> sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> sd 1:0:0:0: [sdb] 1953523055 512-byte hardware sectors (1000204 MB)
> sd 1:0:0:0: [sdb] Write Protect is off
> sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
> sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> sdb: sdb1 sdb2
> sd 1:0:0:0: [sdb] Attached SCSI disk
> md: bind
> md: kicking non-fresh sdb1 from array!
> md: unbind
> md: export_rdev(sdb1)
> md: bind
>
> [EMAIL PROTECTED]:/home/illsci# dmesg | grep sda
> sd 0:0:0:0: [sda] 1953523055 512-byte hardware sectors (1000204 MB)
> sd 0:0:0:0: [sda] Write Protect is off
> sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> sd 0:0:0:0: [sda] 1953523055 512-byte hardware sectors (1000204 MB)
> sd 0:0:0:0: [sda] Write Protect is off
> sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> sda: sda1 sda2
> sd 0:0:0:0: [sda] Attached SCSI disk
> md: bind
> md: bind
> md: kicking non-fresh sda2 from array!
> md: unbind
> md: export_rdev(sda2)
>
> [EMAIL PROTECTED]:/home/illsci# smartctl -a /dev/sda
> smartctl version 5.36 [x86_64-unknown-linux-gnu] Copyright (C) 2002-6 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> Device: ATA      Hitachi HDS72101  Version: GKAO
> Serial number: GTJ000PAG2HZUC
> Device type: disk
> Local Time is: Wed Dec 19 14:13:47 2007 EST
> Device does not support SMART
>
> Error Counter logging not supported
> [GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
> Device does not support Self Test logging
>
> [EMAIL PROTECTED]:/home/illsci# smartctl -a /dev/sdb
> smartctl version 5.36 [x86_64-unknown-linux-gnu] Copyright (C) 2002-6 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> Device: ATA      Hitachi HDS7
Re: help diagnosing bad disk
Well, here's the rest of the info I should have sent in the last email:

[EMAIL PROTECTED]:/home/illsci# cat /proc/mdstat
Personalities : [multipath] [raid1]
md1 : active raid1 sdb2[1]
      974808064 blocks [2/1] [_U]

md0 : active raid1 sda1[0]
      1951744 blocks [2/1] [U_]

unused devices: <none>

[EMAIL PROTECTED]:/home/illsci# dmesg | grep sdb
sd 1:0:0:0: [sdb] 1953523055 512-byte hardware sectors (1000204 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 1:0:0:0: [sdb] 1953523055 512-byte hardware sectors (1000204 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sdb: sdb1 sdb2
sd 1:0:0:0: [sdb] Attached SCSI disk
md: bind
md: kicking non-fresh sdb1 from array!
md: unbind
md: export_rdev(sdb1)
md: bind

[EMAIL PROTECTED]:/home/illsci# dmesg | grep sda
sd 0:0:0:0: [sda] 1953523055 512-byte hardware sectors (1000204 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] 1953523055 512-byte hardware sectors (1000204 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sda: sda1 sda2
sd 0:0:0:0: [sda] Attached SCSI disk
md: bind
md: bind
md: kicking non-fresh sda2 from array!
md: unbind
md: export_rdev(sda2)

[EMAIL PROTECTED]:/home/illsci# smartctl -a /dev/sda
smartctl version 5.36 [x86_64-unknown-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: ATA      Hitachi HDS72101  Version: GKAO
Serial number: GTJ000PAG2HZUC
Device type: disk
Local Time is: Wed Dec 19 14:13:47 2007 EST
Device does not support SMART

Error Counter logging not supported
[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
Device does not support Self Test logging

[EMAIL PROTECTED]:/home/illsci# smartctl -a /dev/sdb
smartctl version 5.36 [x86_64-unknown-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: ATA      Hitachi HDS72101  Version: GKAO
Serial number: GTJ000PAG2K43C
Device type: disk
Local Time is: Wed Dec 19 14:13:49 2007 EST
Device does not support SMART

Error Counter logging not supported
[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
Device does not support Self Test logging

On Dec 19, 2007 2:16 PM, Bill Davidsen <[EMAIL PROTECTED]> wrote:
> Jon Sabo wrote:
> > So I was trying to copy over some Indiana Jones wav files and it
> > wasn't going my way. I noticed that my software raid device showed:
> >
> > /dev/md1 on / type ext3 (rw,errors=remount-ro)
> >
> > Is this saying that it was remounted, read only because it found a
> > problem with the md1 meta device? That's what it looks like it's
> > saying but I can still write to /.
> >
> > [... mdadm --detail output for /dev/md0 and /dev/md1 as in the original message below ...]
> >
> > I have two 1 terabyte sata drives in this box. From what I was
> > reading wouldn't i
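A side note on the "Device does not support SMART" lines above: that is what smartctl of this era prints when it speaks SCSI to a SATA disk sitting behind libata. Forcing the ATA pass-through usually gets the real attributes; a sketch:

    # Treat the device as ATA instead of the SCSI layer libata presents:
    smartctl -d ata -a /dev/sda
    smartctl -d ata -a /dev/sdb
    # Once the attributes are readable, a short self-test can be started:
    smartctl -d ata -t short /dev/sdb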
Re: Linux RAID Partition Offset 63 cylinders / 30% performance hit?
Justin Piszcz wrote:
> On Wed, 19 Dec 2007, Bill Davidsen wrote:
> > Justin Piszcz wrote:
> > > On Wed, 19 Dec 2007, Mattias Wadenstein wrote:
> > > > [...]
> > > > From that setup it seems simple, scrap the partition table and use
> > > > the disk device for raid. This is what we do for all data storage
> > > > disks (hw raid) and sw raid members.
> > >
> > > Is there any downside to doing that? [...]
> >
> > As other posts have detailed, putting the partition on a 64k aligned
> > boundary can address the performance problems. However, a poor choice of
> > chunk size, cache_buffer size, or just random i/o in small sizes can eat
> > up a lot of the benefit. I don't think you need to give up your
> > partitions to get the benefit of alignment.
>
> Hrmm.. I am doing a benchmark now with: 6 x 400GB (SATA) / 256 KiB stripe,
> unaligned vs. aligned raid setup.
>
> unaligned: just fdisk /dev/sdc, mkpartition, fd raid.
> aligned: fdisk, expert mode, start at 512 as the offset.
>
> Per a Microsoft KB, example alignment calculations in kilobytes for a
> 256-KB stripe unit size:
>
>  (63 * .5) / 256 = 0.123046875
>  (64 * .5) / 256 = 0.125
> (128 * .5) / 256 = 0.25
> (256 * .5) / 256 = 0.5
> (512 * .5) / 256 = 1
>
> These examples show that the partition is not aligned correctly for a
> 256-KB stripe unit size until the partition is created by using an offset
> of 512 sectors (512 bytes per sector). So I should start at 512 for a 256k
> chunk size.
>
> I ran bonnie++ three consecutive times and took the average for the
> unaligned case; I am rebuilding the RAID5 now, then I will re-execute the
> test 3 additional times and take the average of that.

I'm going to try another approach, I'll describe it when I get results (or not).

--
Bill Davidsen <[EMAIL PROTECTED]>
  "Woe unto the statesman who makes war without a reason that will still
  be valid when the war is over..." Otto von Bismark
Re: help diagnosing bad disk
Jon Sabo wrote:
> So I was trying to copy over some Indiana Jones wav files and it
> wasn't going my way. I noticed that my software raid device showed:
>
> /dev/md1 on / type ext3 (rw,errors=remount-ro)
>
> Is this saying that it was remounted, read only because it found a
> problem with the md1 meta device? That's what it looks like it's
> saying but I can still write to /.
>
> [... mdadm --detail output for /dev/md0 and /dev/md1 as in the original message below ...]
>
> I have two 1 terabyte sata drives in this box. From what I was
> reading wouldn't it show an F for the failed drive? I thought I would
> see that /dev/sdb1 and /dev/sdb2 were failed and it would show an F.
> What is this saying and how do you know that it's /dev/sdb and not some
> other drive? It shows removed and that the state is clean, degraded.
> Is that something you can recover from without returning this disk
> and putting in a new one to add to the raid1 array?

You can try adding the partitions back to your array, but I suspect something
bad has happened to your sdb drive, since it's failed out of both arrays. You
can use dmesg to look for any additional information.

Justin gave you the rest of the info you need to investigate, I'll not repeat
it. ;-)

--
Bill Davidsen <[EMAIL PROTECTED]>
  "Woe unto the statesman who makes war without a reason that will still
  be valid when the war is over..." Otto von Bismark
Re: help diagnosing bad disk
I found the problem. The power was unplugged from the drive. The sata
power connectors aren't very good at securing the connector. I
reattached the power connector to the sata drive and booted up. This
is what it looks like now:

[EMAIL PROTECTED]:/home/illsci# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Mon Jul 30 21:47:14 2007
     Raid Level : raid1
     Array Size : 1951744 (1906.32 MiB 1998.59 MB)
    Device Size : 1951744 (1906.32 MiB 1998.59 MB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Wed Dec 19 13:48:12 2007
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : 157f716c:0e7aebca:c20741f6:bb6099c9
         Events : 0.44

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       0        0        1      removed

[EMAIL PROTECTED]:/home/illsci# mdadm --detail /dev/md1
/dev/md1:
        Version : 00.90.03
  Creation Time : Mon Jul 30 21:47:47 2007
     Raid Level : raid1
     Array Size : 974808064 (929.65 GiB 998.20 GB)
    Device Size : 974808064 (929.65 GiB 998.20 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Wed Dec 19 13:50:02 2007
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : 156a030e:9a6f8eb3:9b0c439e:d718e744
         Events : 0.1498340

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       18       1      active sync   /dev/sdb2

How do I put it back into the correct state? Thanks!

Jonathan

On Dec 19, 2007 1:23 PM, Justin Piszcz <[EMAIL PROTECTED]> wrote:
> On Wed, 19 Dec 2007, Jon Sabo wrote:
>
> > So I was trying to copy over some Indiana Jones wav files and it
> > wasn't going my way. I noticed that my software raid device showed:
> >
> > /dev/md1 on / type ext3 (rw,errors=remount-ro)
> >
> > [... mdadm --detail output and the rest of the original message as below ...]
> >
> > Is that something you can recover from without returning this disk
> > and putting in a new one to add to the raid1 array?
>
> mdadm /dev/md1 -a /dev/sdb2 to re-add it back into the array.
>
> What does cat /proc/mdstat show?
>
> I would also like to see: smartctl -a /dev/sdb
>
> Justin.
Re: help diagnosing bad disk
On Wed, 19 Dec 2007, Jon Sabo wrote:

> So I was trying to copy over some Indiana Jones wav files and it
> wasn't going my way. I noticed that my software raid device showed:
>
> /dev/md1 on / type ext3 (rw,errors=remount-ro)
>
> Is this saying that it was remounted, read only because it found a
> problem with the md1 meta device? That's what it looks like it's
> saying but I can still write to /.
>
> [... mdadm --detail output for /dev/md0 and /dev/md1 as in the original message below ...]
>
> I have two 1 terabyte sata drives in this box. From what I was
> reading wouldn't it show an F for the failed drive? I thought I would
> see that /dev/sdb1 and /dev/sdb2 were failed and it would show an F.
> What is this saying and how do you know that it's /dev/sdb and not some
> other drive? It shows removed and that the state is clean, degraded.
> Is that something you can recover from without returning this disk
> and putting in a new one to add to the raid1 array?

mdadm /dev/md1 -a /dev/sdb2 to re-add it back into the array.

What does cat /proc/mdstat show?

I would also like to see: smartctl -a /dev/sdb

Justin.
help diagnosing bad disk
So I was trying to copy over some Indiana Jones wav files and it
wasn't going my way. I noticed that my software raid device showed:

/dev/md1 on / type ext3 (rw,errors=remount-ro)

Is this saying that it was remounted, read only because it found a
problem with the md1 meta device? That's what it looks like it's
saying but I can still write to /.

mdadm --detail showed:

[EMAIL PROTECTED]:/home/illsci# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Mon Jul 30 21:47:14 2007
     Raid Level : raid1
     Array Size : 1951744 (1906.32 MiB 1998.59 MB)
    Device Size : 1951744 (1906.32 MiB 1998.59 MB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Wed Dec 19 12:59:56 2007
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : 157f716c:0e7aebca:c20741f6:bb6099c9
         Events : 0.28

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       0        0        1      removed

[EMAIL PROTECTED]:/home/illsci# mdadm --detail /dev/md1
/dev/md1:
        Version : 00.90.03
  Creation Time : Mon Jul 30 21:47:47 2007
     Raid Level : raid1
     Array Size : 974808064 (929.65 GiB 998.20 GB)
    Device Size : 974808064 (929.65 GiB 998.20 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Wed Dec 19 13:14:53 2007
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : 156a030e:9a6f8eb3:9b0c439e:d718e744
         Events : 0.1990

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       0        0        1      removed

I have two 1 terabyte sata drives in this box. From what I was
reading wouldn't it show an F for the failed drive? I thought I would
see that /dev/sdb1 and /dev/sdb2 were failed and it would show an F.
What is this saying and how do you know that it's /dev/sdb and not some
other drive? It shows removed and that the state is clean, degraded.
Is that something you can recover from without returning this disk
and putting in a new one to add to the raid1 array?

Thanks,
Jonathan
Re: Linux RAID Partition Offset 63 cylinders / 30% performance hit?
On Wed, 19 Dec 2007, Bill Davidsen wrote:

> Justin Piszcz wrote:
> > On Wed, 19 Dec 2007, Mattias Wadenstein wrote:
> > > On Wed, 19 Dec 2007, Justin Piszcz wrote:
> > > > Now to my setup / question:
> > > >
> > > > # fdisk -l /dev/sdc
> > > >
> > > > Disk /dev/sdc: 150.0 GB, 150039945216 bytes
> > > > 255 heads, 63 sectors/track, 18241 cylinders
> > > > Units = cylinders of 16065 * 512 = 8225280 bytes
> > > > Disk identifier: 0x5667c24a
> > > >
> > > >    Device Boot      Start         End      Blocks   Id  System
> > > > /dev/sdc1               1       18241   146520801   fd  Linux raid autodetect
> > > >
> > > > If I use 10-disk RAID5 with 1024 KiB stripe, what would be the
> > > > correct start and end size if I wanted to make sure the RAID5 was
> > > > stripe aligned? Or is there a better way to do this, does parted
> > > > handle this situation better?
> > >
> > > From that setup it seems simple, scrap the partition table and use the
> > > disk device for raid. This is what we do for all data storage disks
> > > (hw raid) and sw raid members.
> > >
> > > /Mattias Wadenstein
> >
> > Is there any downside to doing that? I remember when I had to take my
> > machine apart for a BIOS downgrade; when I plugged in the sata devices
> > again I did not plug them back in the same order. Everything worked of
> > course, but when I ran LILO it said it was not part of the RAID set,
> > because /dev/sda had become /dev/sdg, and it overwrote the MBR on the
> > disk. If I had not used partitions here, I'd have lost one (or more) of
> > the drives due to a bad LILO run?
>
> As other posts have detailed, putting the partition on a 64k aligned
> boundary can address the performance problems. However, a poor choice of
> chunk size, cache_buffer size, or just random i/o in small sizes can eat
> up a lot of the benefit. I don't think you need to give up your partitions
> to get the benefit of alignment.

Hrmm.. I am doing a benchmark now with: 6 x 400GB (SATA) / 256 KiB stripe,
unaligned vs. aligned raid setup.

unaligned: just fdisk /dev/sdc, mkpartition, fd raid.
aligned: fdisk, expert mode, start at 512 as the offset.

Per a Microsoft KB, example alignment calculations in kilobytes for a 256-KB
stripe unit size:

 (63 * .5) / 256 = 0.123046875
 (64 * .5) / 256 = 0.125
(128 * .5) / 256 = 0.25
(256 * .5) / 256 = 0.5
(512 * .5) / 256 = 1

These examples show that the partition is not aligned correctly for a 256-KB
stripe unit size until the partition is created by using an offset of 512
sectors (512 bytes per sector). So I should start at 512 for a 256k chunk
size.

I ran bonnie++ three consecutive times and took the average for the unaligned
case; I am rebuilding the RAID5 now, then I will re-execute the test 3
additional times and take the average of that.

Justin.
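For the record, a sketch of the fdisk session behind "expert mode, start at 512 as the offset" (values assume the 256 KiB chunk discussed above):

    fdisk /dev/sdc
    #   n  -> create the partition as usual
    #   x  -> enter expert mode
    #   b  -> move beginning of data; select partition 1, enter 512
    #   w  -> write the table and exit
    # Confirm in sector units (the start should be a multiple of 512):
    fdisk -lu /dev/sdc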
Re: Linux RAID Partition Offset 63 cylinders / 30% performance hit?
On 12/19/07, Bill Davidsen <[EMAIL PROTECTED]> wrote:

> As other posts have detailed, putting the partition on a 64k aligned
> boundary can address the performance problems. However, a poor choice of
> chunk size, cache_buffer size, or just random i/o in small sizes can eat
> up a lot of the benefit.
>
> I don't think you need to give up your partitions to get the benefit of
> alignment.

How might that benefit be realized? Assume I have 3 disks, /dev/sd{b,c,d},
all partitioned identically with 4 partitions, and I want to use
/dev/sd{b,c,d}3 for a new SW raid. What sequence of steps can I take to
ensure that my raid is aligned on a 64K boundary? What effect do the
different superblock formats have, if any, in this situation?

--
Jon
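One possible sequence (a sketch, assuming a 64 KiB chunk; the numbers are illustrative, not a recommendation): work in sector units and make each partition start on a multiple of 128 sectors, since 128 x 512 bytes = 64 KiB.

    # Show the current layout in sectors instead of cylinders:
    sfdisk -uS -l /dev/sdb
    # Recreate sd{b,c,d}3 so each start sector is a multiple of 128
    # (fdisk -u /dev/sdb works the same way interactively).
    # On superblock formats: v0.90 and v1.0 store the superblock at the end
    # of the member device, so data starts at the very beginning of the
    # partition; v1.1 and v1.2 store it at/near the start, which shifts the
    # data offset slightly.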
Re: Linux RAID Partition Offset 63 cylinders / 30% performance hit?
Justin Piszcz wrote:
> On Wed, 19 Dec 2007, Mattias Wadenstein wrote:
> > On Wed, 19 Dec 2007, Justin Piszcz wrote:
> > > Now to my setup / question:
> > >
> > > # fdisk -l /dev/sdc
> > >
> > > Disk /dev/sdc: 150.0 GB, 150039945216 bytes
> > > 255 heads, 63 sectors/track, 18241 cylinders
> > > Units = cylinders of 16065 * 512 = 8225280 bytes
> > > Disk identifier: 0x5667c24a
> > >
> > >    Device Boot      Start         End      Blocks   Id  System
> > > /dev/sdc1               1       18241   146520801   fd  Linux raid autodetect
> > >
> > > If I use 10-disk RAID5 with 1024 KiB stripe, what would be the correct
> > > start and end size if I wanted to make sure the RAID5 was stripe
> > > aligned? Or is there a better way to do this, does parted handle this
> > > situation better?
> >
> > From that setup it seems simple, scrap the partition table and use the
> > disk device for raid. This is what we do for all data storage disks
> > (hw raid) and sw raid members.
> >
> > /Mattias Wadenstein
>
> Is there any downside to doing that? I remember when I had to take my
> machine apart for a BIOS downgrade; when I plugged in the sata devices
> again I did not plug them back in the same order. Everything worked of
> course, but when I ran LILO it said it was not part of the RAID set,
> because /dev/sda had become /dev/sdg, and it overwrote the MBR on the
> disk. If I had not used partitions here, I'd have lost one (or more) of
> the drives due to a bad LILO run?

As other posts have detailed, putting the partition on a 64k aligned boundary
can address the performance problems. However, a poor choice of chunk size,
cache_buffer size, or just random i/o in small sizes can eat up a lot of the
benefit.

I don't think you need to give up your partitions to get the benefit of
alignment.

--
Bill Davidsen <[EMAIL PROTECTED]>
  "Woe unto the statesman who makes war without a reason that will still
  be valid when the war is over..." Otto von Bismark
Re: Linux RAID Partition Offset 63 cylinders / 30% performance hit?
On Wed, 19 Dec 2007, Jon Nelson wrote:

On 12/19/07, Justin Piszcz <[EMAIL PROTECTED]> wrote:

On Wed, 19 Dec 2007, Mattias Wadenstein wrote:

From that setup it seems simple, scrap the partition table and use the disk device for raid. This is what we do for all data storage disks (hw raid) and sw raid members.

/Mattias Wadenstein

Is there any downside to doing that? I remember when I had to take my

There is one (just pointed out to me yesterday): having the partition and having it labeled as raid makes identification quite a bit easier for humans and software, too.

--
Jon

Some nice graphs found here:
http://sqlblog.com/blogs/linchi_shea/archive/2007/02/01/performance-impact-of-disk-misalignment.aspx
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
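Both identification signals Jon mentions are quick to check; an illustrative aside (device names hypothetical, not from the original exchange):

fdisk -l /dev/sdb          # the 'fd' (Linux raid autodetect) type is visible at a glance
mdadm --examine /dev/sdb3  # the md superblock names the array UUID this member belongs to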
Re: [ERROR] scsi.c: In function 'scsi_get_serial_number_page'
Thierry Iceta wrote:

Hi

I would like to use raidtools-1.00.3 on the Rhel5 distribution but I got this error. Could you tell me if a new version is available or if a patch exists to use raidtools on Rhel5?

raidtools is old and unmaintained. Use mdadm.

--
Bill Davidsen <[EMAIL PROTECTED]>
"Woe unto the statesman who makes war without a reason that will still be valid when the war is over..." Otto von Bismark
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Raid over 48 disks
Mattias Wadenstein wrote:

On Wed, 19 Dec 2007, Neil Brown wrote:

On Tuesday December 18, [EMAIL PROTECTED] wrote:

We're investigating the possibility of running Linux (RHEL) on top of Sun's X4500 Thumper box:

http://www.sun.com/servers/x64/x4500/

Basically, it's a server with 48 SATA hard drives. No hardware RAID. It's designed for Sun's ZFS filesystem. So... we're curious how Linux will handle such a beast. Has anyone run MD software RAID over so many disks? Then piled LVM/ext3 on top of that? Any suggestions?

There are those that have run Linux MD RAID on thumpers before. I vaguely recall some driver issues (unrelated to MD) that made it less suitable than solaris, but that might be fixed in recent kernels.

Alternately, 8 6-drive RAID5s or 6 8-drive RAID6s, and use RAID0 to combine them together. This would give you adequate reliability and performance and still a large amount of storage space.

My personal suggestion would be 5 9-disk raid6s, one raid1 root mirror and one hot spare. Then raid0, lvm, or separate filesystems on those 5 raidsets for data, depending on your needs.

Other than thinking raid-10 better than raid-1 for performance, I like it. You get almost as much data space as with the 6 8-disk raid6s, and have a separate pair of disks for all the small updates (logging, metadata, etc), so this makes a lot of sense if most of the data is bulk file access.

--
Bill Davidsen <[EMAIL PROTECTED]>
"Woe unto the statesman who makes war without a reason that will still be valid when the war is over..." Otto von Bismark
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
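A minimal sketch of the raid-10-instead-of-raid-1 root pair Bill alludes to (device names hypothetical): md's raid10 accepts two devices, and the f2 ("far") layout keeps mirror redundancy while sequential reads approach two-disk-stripe speed.

# Two-device raid10, far layout: a mirror that reads like a stripe.
mdadm --create /dev/md0 --level=10 --layout=f2 --raid-devices=2 /dev/sda1 /dev/sdb1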
Re: Linux RAID Partition Offset 63 cylinders / 30% performance hit?
On 12/19/07, Justin Piszcz <[EMAIL PROTECTED]> wrote: > > > On Wed, 19 Dec 2007, Mattias Wadenstein wrote: > >> From that setup it seems simple, scrap the partition table and use the > > disk device for raid. This is what we do for all data storage disks (hw > > raid) > > and sw raid members. > > > > /Mattias Wadenstein > > > > Is there any downside to doing that? I remember when I had to take my There is one (just pointed out to me yesterday): having the partition and having it labeled as raid makes identification quite a bit easier for humans and software, too. -- Jon - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Linux RAID Partition Offset 63 cylinders / 30% performance hit?
On Wed, 19 Dec 2007, Mattias Wadenstein wrote:

On Wed, 19 Dec 2007, Justin Piszcz wrote:

--
Now to my setup / question:

# fdisk -l /dev/sdc

Disk /dev/sdc: 150.0 GB, 150039945216 bytes
255 heads, 63 sectors/track, 18241 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x5667c24a

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1       18241   146520801   fd  Linux raid autodetect
---

If I use 10-disk RAID5 with 1024 KiB stripe, what would be the correct start and end size if I wanted to make sure the RAID5 was stripe aligned? Or is there a better way to do this, does parted handle this situation better?

From that setup it seems simple, scrap the partition table and use the disk device for raid. This is what we do for all data storage disks (hw raid) and sw raid members.

/Mattias Wadenstein

Is there any downside to doing that? I remember when I had to take my machine apart for a BIOS downgrade, when I plugged in the sata devices again I did not plug them back in the same order. Everything worked, of course, but when I ran LILO it said the disk was not part of the RAID set, because /dev/sda had become /dev/sdg, and it overwrote the MBR on that disk. If I had not used partitions here, I'd have lost one (or more) of the drives due to a bad LILO run?

Justin.
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
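One generic guard against exactly this reordering problem, as an editorial sketch (the config path varies by distribution): let md assemble by UUID rather than by device name. It does not stop a stray LILO run from writing the wrong MBR, but it keeps the arrays themselves immune to /dev/sda becoming /dev/sdg.

mdadm --examine --scan >> /etc/mdadm.conf   # records ARRAY lines keyed by UUID
mdadm --assemble --scan                     # assembly then ignores /dev/sd* ordering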
Re: Raid over 48 disks
Thiemo Nagel wrote:

Performance of the raw device is fair:

# dd if=/dev/md2 of=/dev/zero bs=128k count=64k
8589934592 bytes (8.6 GB) copied, 15.6071 seconds, 550 MB/s

Somewhat less through ext3 (created with -E stride=64):

# dd if=largetestfile of=/dev/zero bs=128k count=64k
8589934592 bytes (8.6 GB) copied, 26.4103 seconds, 325 MB/s

Quite slow? 10 disks (raptors) raid 5 on regular sata controllers:

# dd if=/dev/md3 of=/dev/zero bs=128k count=64k
8589934592 bytes (8.6 GB) copied, 10.718 seconds, 801 MB/s

# dd if=bigfile of=/dev/zero bs=128k count=64k
3640379392 bytes (3.6 GB) copied, 6.58454 seconds, 553 MB/s

Interesting. Any ideas what could be the reason? How much do you get from a single drive?

--

The Samsung HD501LJ that I'm using gives ~84MB/s when reading from the beginning of the disk. With RAID 5 I'm getting slightly better results (though I really wonder why, since naively I would expect identical read performance), but that only accounts for a small part of the difference:

                16k read         64k write
chunk size   RAID 5  RAID 6   RAID 5  RAID 6
128k            492     497      268     270
256k            615     530      288     270
512k            625     607      230     174
1024k           650     620      170      75

What is your stripe cache size?

--
Bill Davidsen <[EMAIL PROTECTED]>
"Woe unto the statesman who makes war without a reason that will still be valid when the war is over..." Otto von Bismark
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
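To make the single-drive question concrete, a hedged example of how one might measure it (device name hypothetical; direct I/O keeps the page cache from flattering the number, assuming your dd supports iflag=direct):

dd if=/dev/sdb of=/dev/null bs=1M count=1024 iflag=direct
hdparm -t /dev/sdb    # rougher, but a convenient cross-check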
Re: Raid over 48 disks
On Wed, 19 Dec 2007, Bill Davidsen wrote:

Thiemo Nagel wrote:

Performance of the raw device is fair:

# dd if=/dev/md2 of=/dev/zero bs=128k count=64k
8589934592 bytes (8.6 GB) copied, 15.6071 seconds, 550 MB/s

Somewhat less through ext3 (created with -E stride=64):

# dd if=largetestfile of=/dev/zero bs=128k count=64k
8589934592 bytes (8.6 GB) copied, 26.4103 seconds, 325 MB/s

Quite slow? 10 disks (raptors) raid 5 on regular sata controllers:

# dd if=/dev/md3 of=/dev/zero bs=128k count=64k
8589934592 bytes (8.6 GB) copied, 10.718 seconds, 801 MB/s

# dd if=bigfile of=/dev/zero bs=128k count=64k
3640379392 bytes (3.6 GB) copied, 6.58454 seconds, 553 MB/s

Interesting. Any ideas what could be the reason? How much do you get from a single drive?

--

The Samsung HD501LJ that I'm using gives ~84MB/s when reading from the beginning of the disk. With RAID 5 I'm getting slightly better results (though I really wonder why, since naively I would expect identical read performance), but that only accounts for a small part of the difference:

                16k read         64k write
chunk size   RAID 5  RAID 6   RAID 5  RAID 6
128k            492     497      268     270
256k            615     530      288     270
512k            625     607      230     174
1024k           650     620      170      75

What is your stripe cache size?

# Set stripe-cache_size for RAID5.
echo "Setting stripe_cache_size to 16 MiB for /dev/md3"
echo 16384 > /sys/block/md3/md/stripe_cache_size

Justin.
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
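An editorial caveat on that setting, hedged since the exact accounting is kernel-version dependent: the raid5/raid6 stripe cache holds one page per member device per entry, so with 4 KiB pages a value of 16384 on a 10-disk array pins on the order of 640 MiB of RAM, considerably more than the 16 MiB the script comment suggests.

echo $(( 16384 * 4 * 10 / 1024 )) MiB   # 16384 entries * 4 KiB/page * 10 disks = 640 MiB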
Re: Linux RAID Partition Offset 63 cylinders / 30% performance hit?
On Wed, 19 Dec 2007, Justin Piszcz wrote:

--
Now to my setup / question:

# fdisk -l /dev/sdc

Disk /dev/sdc: 150.0 GB, 150039945216 bytes
255 heads, 63 sectors/track, 18241 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x5667c24a

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1       18241   146520801   fd  Linux raid autodetect
---

If I use 10-disk RAID5 with 1024 KiB stripe, what would be the correct start and end size if I wanted to make sure the RAID5 was stripe aligned? Or is there a better way to do this, does parted handle this situation better?

From that setup it seems simple, scrap the partition table and use the disk device for raid. This is what we do for all data storage disks (hw raid) and sw raid members.

/Mattias Wadenstein
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Linux RAID Partition Offset 63 cylinders / 30% performance hit?
The (up to) 30% figure is mentioned here:
http://insights.oetiker.ch/linux/raidoptimization.html

On http://forums.storagereview.net/index.php?showtopic=25786 a user writes about the problem:

XP, and virtually every O/S and partitioning software of XP's day, by default places the first partition on a disk at sector 63. Being an odd number, and 31.5KB into the drive, it isn't ever going to align with any stripe size. This is an unfortunate industry standard. Vista, on the other hand, aligns the first partition on sector 2048 by default as a by-product of its revisions to support large-sector sized hard drives. As RAID5 arrays in write mode mimic the performance characteristics of large-sector size hard drives, this comes as a great if not inadvertent benefit. 2048 is evenly divisible by 2 and 4 (allowing for 3 and 5 drive arrays optimally) and virtually every stripe size in common use. If you are however using a 4-drive RAID5, you're SOOL.

Page 9 in this PDF (EMC_BestPractice_R22.pdf) shows the problem graphically:
http://bbs.doit.com.cn/attachment.php?aid=6757

--
Now to my setup / question:

# fdisk -l /dev/sdc

Disk /dev/sdc: 150.0 GB, 150039945216 bytes
255 heads, 63 sectors/track, 18241 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x5667c24a

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1       18241   146520801   fd  Linux raid autodetect
---

If I use 10-disk RAID5 with 1024 KiB stripe, what would be the correct start and end size if I wanted to make sure the RAID5 was stripe aligned? Or is there a better way to do this, does parted handle this situation better? What is the best (and correct) way to calculate stripe-alignment on the RAID5 device itself?
---

The EMC paper recommends:

Disk partition adjustment for Linux systems: In Linux, align the partition table before data is written to the LUN, as the partition map will be rewritten and all data on the LUN destroyed. In the following example, the LUN is mapped to /dev/emcpowerah, and the LUN stripe element size is 128 blocks. Arguments for the fdisk utility are as follows:

fdisk /dev/emcpowerah
x     # expert mode
b     # adjust starting block number
1     # choose partition 1
128   # set it to 128, our stripe element size
w     # write the new partition
---

Does this also apply to Linux/SW RAID5? Or are there caveats not taken into account, since the paper is based on HW RAID rather than SW?
---

What it currently looks like:

Command (m for help): x

Expert command (m for help): p

Disk /dev/sdc: 255 heads, 63 sectors, 18241 cylinders

Nr AF  Hd Sec  Cyl  Hd Sec  Cyl      Start       Size ID
 1 00   1   1    0 254  63 1023         63  293041602 fd
 2 00   0   0    0   0   0    0          0          0 00
 3 00   0   0    0   0   0    0          0          0 00
 4 00   0   0    0   0   0    0          0          0 00

Justin.
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
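A worked sanity check of the numbers above, as an editorial sketch (the awk field position assumes no Boot flag is set on the partition):

# sector 63  * 512 B = 32256 B = 31.5 KiB -> never a multiple of a power-of-two chunk
# sector 128 * 512 B = 65536 B = 64 KiB   -> aligned for a 64 KiB chunk
# for a full 1024 KiB stripe, the start must be divisible by 2048 sectors
start=$(fdisk -lu /dev/sdc | awk '$1 == "/dev/sdc1" {print $2}')
echo $(( start % 128 ))   # 0 means 64 KiB aligned; here it prints 63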
Re: Raid over 48 disks
Guy Watkins wrote:

} -Original Message-
} From: [EMAIL PROTECTED] [mailto:linux-raid- [EMAIL PROTECTED] On Behalf Of Brendan Conoboy
} Sent: Tuesday, December 18, 2007 3:36 PM
} To: Norman Elton
} Cc: linux-raid@vger.kernel.org
} Subject: Re: Raid over 48 disks
}
} Norman Elton wrote:
} > We're investigating the possibility of running Linux (RHEL) on top of
} > Sun's X4500 Thumper box:
} >
} > http://www.sun.com/servers/x64/x4500/
}
} Neat - six 8-port SATA controllers! It'll be worth checking to be sure
} each controller has equal bandwidth. If some controllers are on slower
} buses than others you may want to consider that and balance the md
} device layout.

Assuming the 6 controllers are equal, I would make 3 16-disk RAID6 arrays using 2 disks from each controller. That way any 1 controller can fail and your system will still be running. 6 disks will be used for redundancy.

Or 6 8-disk RAID6 arrays using 1 disk from each controller. That way any 2 controllers can fail and your system will still be running. 12 disks will be used for redundancy. Might be too excessive!

Combine them into a RAID0 array.

Guy

Sounds interesting! Just out of interest, what's stopping you from using Solaris? Though, I'm curious how md will compare to ZFS performance-wise. There is some interesting configuration info / advice for Solaris here, esp for the X4500:

http://www.solarisinternals.com/wiki/index.php/ZFS_Configuration_Guide

Russell
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
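On Brendan's point about balancing across controllers, a quick hedged way to see which controller each disk hangs off (assumes the 2.6-era sysfs layout):

# Map each sd* device to the sysfs path of its host adapter; the PCI
# address embedded in the path identifies the controller.
for d in /sys/block/sd*; do
    printf '%s -> %s\n' "${d##*/}" "$(readlink -f "$d/device")"
done
# Spread the members of each array across different PCI addresses.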
Re: raid5 resizing
On Wed, Dec 19, 2007 at 10:59:41PM +1100, Neil Brown wrote: > On Wednesday December 19, [EMAIL PROTECTED] wrote: > > Hi, > > > > I'm thinking of slowly replacing disks in my raid5 array with bigger > > disks and then resize the array to fill up the new disks. Is this > > possible? Basically I would like to go from: > > > > 3 x 500gig RAID5 to 3 x 1tb RAID5, thereby going from 1tb to 2tb of > > storage. > > > > It seems like it should be, but... :) > > Yes. > > mdadm --grow /dev/mdX --size=max Oh -joy-. I love linux sw raid. :) The only thing it seems to lack is battery backed-up cache. Thank you. -- "To the extent that we overreact, we proffer the terrorists the greatest tribute." - High Court Judge Michael Kirby - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ERROR] scsi.c: In function 'scsi_get_serial_number_page'
Thierry Iceta wrote:
> Hi
>
> I would like to use raidtools-1.00.3 on the Rhel5 distribution
> but I got this error

Use mdadm instead. Raidtools is dangerous/unsafe, and has not been maintained for a long time.

/mjt
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
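For anyone still holding an old /etc/raidtab, the mdadm equivalents are brief; a sketch with purely illustrative devices and level:

# raidtools' mkraid plus a hand-written raidtab becomes a single create:
mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
# and the persistent config is generated rather than hand-written:
mdadm --detail --scan >> /etc/mdadm.conf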
Re: raid5 resizing
On Wednesday December 19, [EMAIL PROTECTED] wrote: > Hi, > > I'm thinking of slowly replacing disks in my raid5 array with bigger > disks and then resize the array to fill up the new disks. Is this > possible? Basically I would like to go from: > > 3 x 500gig RAID5 to 3 x 1tb RAID5, thereby going from 1tb to 2tb of > storage. > > It seems like it should be, but... :) Yes. mdadm --grow /dev/mdX --size=max NeilBrown - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
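Filling in the surrounding steps as a hedged sketch (assumes one partition per disk used as the md member and an ext3 filesystem directly on the array; every device name is illustrative, and each resync must finish before the next swap):

# Repeat for each member in turn:
mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
# ...swap in the 1 TB disk, partition it at least as large as before...
mdadm /dev/md0 --add /dev/sdb1
cat /proc/mdstat                 # wait until the rebuild completes

# Once all three members are the new size:
mdadm --grow /dev/md0 --size=max
resize2fs /dev/md0               # grow the filesystem (online or offline, per kernel support)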
[ERROR] scsi.c: In function 'scsi_get_serial_number_page'
Hi

I would like to use raidtools-1.00.3 on the Rhel5 distribution but I got this error. Could you tell me if a new version is available or if a patch exists to use raidtools on Rhel5?

Thanks for your answer
Thierry

gcc -O2 -Wall -DMD_VERSION=\""raidtools-1.00.3"\" -c -o rrc_common.o rrc_common.c
raid_io.c:96: error: expected declaration specifiers or '...' before '…'
raid_io.c:96: error: expected declaration specifiers or '...' before '…'
raid_io.c:96: error: expected declaration specifiers or '...' before '…'
raid_io.c:97: error: expected declaration specifiers or '...' before '…'
raid_io.c:97: error: expected declaration specifiers or '...' before '…'
raid_io.c:98: error: expected declaration specifiers or '...' before '…'
raid_io.c:101: warning: return type defaults to 'int'
raid_io.c: In function '…':
raid_io.c:102: error: expected '=', ',', ';', 'asm' or '__attribute__' before '…' token
raid_io.c:119: error: expected '=', ',', ';', 'asm' or '__attribute__' before '…' token
raid_io.c:214: error: expected '=', ',', ';', 'asm' or '__attribute__' before '…' token
raid_io.c:267: error: expected '=', ',', ';', 'asm' or '__attribute__' before '…' token
raid_io.c:361: error: expected '=', ',', ';', 'asm' or '__attribute__' before '…' token
raid_io.c:519: error: expected '=', ',', ';', 'asm' or '__attribute__' before '…' token
raid_io.c:96: error: parameter name omitted
raid_io.c:96: error: parameter name omitted
raid_io.c:96: error: parameter name omitted
raid_io.c:97: error: parameter name omitted
raid_io.c:97: error: parameter name omitted
raid_io.c:98: error: parameter name omitted
raid_io.c:539: error: expected '…' at end of input
make: *** [raid_io.o] Error 1
make: *** Waiting for unfinished jobs
scsi.c: In function '…':
scsi.c:434: warning: pointer targets in passing argument 2 of '…' differ in signedness

--
__
Bull, Architect of an Open World TM
Open Software R&D          Email: [EMAIL PROTECTED]
Bull SA                    Bullcom: 229 76 29
1, rue de Provence         Phone: +33 04 76 29 76 29
B.P. 208                   http://www.bull.com
38432 Echirolles-CEDEX     Office: FREC B1-361
__
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
raid5 resizing
Hi, I'm thinking of slowly replacing disks in my raid5 array with bigger disks and then resize the array to fill up the new disks. Is this possible? Basically I would like to go from: 3 x 500gig RAID5 to 3 x 1tb RAID5, thereby going from 1tb to 2tb of storage. It seems like it should be, but... :) -- "To the extent that we overreact, we proffer the terrorists the greatest tribute." - High Court Judge Michael Kirby - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Raid over 48 disks
On Wed, 19 Dec 2007, Neil Brown wrote:

On Tuesday December 18, [EMAIL PROTECTED] wrote:

We're investigating the possibility of running Linux (RHEL) on top of Sun's X4500 Thumper box:

http://www.sun.com/servers/x64/x4500/

Basically, it's a server with 48 SATA hard drives. No hardware RAID. It's designed for Sun's ZFS filesystem. So... we're curious how Linux will handle such a beast. Has anyone run MD software RAID over so many disks? Then piled LVM/ext3 on top of that? Any suggestions?

There are those that have run Linux MD RAID on thumpers before. I vaguely recall some driver issues (unrelated to MD) that made it less suitable than solaris, but that might be fixed in recent kernels.

Alternately, 8 6-drive RAID5s or 6 8-drive RAID6s, and use RAID0 to combine them together. This would give you adequate reliability and performance and still a large amount of storage space.

My personal suggestion would be 5 9-disk raid6s, one raid1 root mirror and one hot spare. Then raid0, lvm, or separate filesystems on those 5 raidsets for data, depending on your needs. You get almost as much data space as with the 6 8-disk raid6s, and have a separate pair of disks for all the small updates (logging, metadata, etc), so this makes a lot of sense if most of the data is bulk file access.

/Mattias Wadenstein
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
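Sketched in mdadm terms, under loud assumptions: all 48 device names (sda through sdav) are hypothetical, and the spare-sharing relies on a spare-group entry in mdadm.conf with mdadm --monitor running.

mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb     # root mirror
mdadm --create /dev/md1 --level=6 --raid-devices=9 /dev/sd[c-k]
mdadm --create /dev/md2 --level=6 --raid-devices=9 /dev/sd[l-t]
mdadm --create /dev/md3 --level=6 --raid-devices=9 /dev/sd[u-z] /dev/sda[a-c]
mdadm --create /dev/md4 --level=6 --raid-devices=9 /dev/sda[d-l]
mdadm --create /dev/md5 --level=6 --raid-devices=9 /dev/sda[m-u]
mdadm --create /dev/md6 --level=0 --raid-devices=5 /dev/md[1-5]          # stripe the five raid6 sets
mdadm /dev/md1 --add /dev/sdav    # 48th disk as a hot spare, shareable via a spare-group

That accounts for all 48 drives: 2 for the mirror, 45 across the five raid6 sets, and one spare.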