Re: How many drives are bad?

2008-02-20 Thread Neil Brown
On Tuesday February 19, [EMAIL PROTECTED] wrote: So I had my first failure today, when I got a report that one drive (/dev/sdam) failed. I've attached the output of mdadm --detail. It appears that two drives are listed as removed, but the array is still functioning. What does this mean? How

Re: suns raid-z / zfs

2008-02-18 Thread Neil Brown
On Monday February 18, [EMAIL PROTECTED] wrote: On Mon, Feb 18, 2008 at 03:07:44PM +1100, Neil Brown wrote: On Sunday February 17, [EMAIL PROTECTED] wrote: Hi It seems like a good way to avoid the performance problems of raid-5 /raid-6 I think there are better ways

Re: RAID5 to RAID6 reshape?

2008-02-17 Thread Neil Brown
On Sunday February 17, [EMAIL PROTECTED] wrote: On Sun, 17 Feb 2008 14:31:22 +0100 Janek Kozicki [EMAIL PROTECTED] wrote: oh, right - Sevrin Robstad has a good idea to solve your problem - create raid6 with one missing member. And add this member, when you have it, next year or such.

Re: RAID5 to RAID6 reshape?

2008-02-17 Thread Neil Brown
On Saturday February 16, [EMAIL PROTECTED] wrote: found was a few months old. Is it likely that RAID5 to RAID6 reshaping will be implemented in the next 12 to 18 months (my rough Certainly possible. I won't say it is likely until it is actually done. And by then it will be definite :-) i.e.

Re: suns raid-z / zfs

2008-02-17 Thread Neil Brown
On Sunday February 17, [EMAIL PROTECTED] wrote: Hi any opinions on suns zfs/raid-z? It's vaguely interesting. I'm not sold on the idea though. It seems like a good way to avoid the performance problems of raid-5 /raid-6 I think there are better ways. But does it stripe? One could

Re: Create Raid6 with 1 missing member fails

2008-02-17 Thread Neil Brown
On Sunday February 17, [EMAIL PROTECTED] wrote: I tried to create a raid6 with one missing member, but it fails. It works fine to create a raid6 with two missing members. Is it supposed to be like that ? No, it isn't supposed to be like that, but currently it is. The easiest approach is to
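
A workaround consistent with the behaviour described above is to create the array with two missing members (which works) and then add one real device, leaving the array singly degraded. A minimal sketch; the 4-disk layout and device names are hypothetical:

  # create a 4-device raid6 with two members missing (this is accepted)
  mdadm --create /dev/md0 --level=6 --raid-devices=4 /dev/sdb1 /dev/sdc1 missing missing
  # then add one real device so only a single member remains missing
  mdadm /dev/md0 --add /dev/sdd1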

Re: raid5: two writing algorithms

2008-02-07 Thread Neil Brown
On Thursday February 7, [EMAIL PROTECTED] wrote: As I understand it, there are 2 valid algorithms for writing in raid5. 1. calculate the parity data by XOR'ing all data of the relevant data chunks. 2. calculate the parity data by kind of XOR-subtracting the old data to be changed, and then
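
In informal notation, the two approaches described above amount to (D_i are the data chunks of a stripe, P the parity chunk):

  reconstruct-write:  P = D_1 XOR D_2 XOR ... XOR D_n
  read-modify-write:  P_new = P_old XOR D_old XOR D_new

The second form only needs the old data and old parity of the chunks being changed, which is why it is preferred for small writes.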

Re: when is a disk non-fresh?

2008-02-07 Thread Neil Brown
On Thursday February 7, [EMAIL PROTECTED] wrote: On Tuesday 05 February 2008 03:02:00 Neil Brown wrote: On Monday February 4, [EMAIL PROTECTED] wrote: Seems the other topic wasn't quite clear... not necessarily. sometimes it helps to repeat your question. there is a lot of noise

Re: raid5: two writing algorithms

2008-02-07 Thread Neil Brown
On Friday February 8, [EMAIL PROTECTED] wrote: On Fri, Feb 08, 2008 at 07:25:31AM +1100, Neil Brown wrote: On Thursday February 7, [EMAIL PROTECTED] wrote: So I hereby give the idea for inspiration to kernel hackers. and I hereby invite you to read the code ;-) I did some reading

Re: Deleting mdadm RAID arrays

2008-02-06 Thread Neil Brown
On Wednesday February 6, [EMAIL PROTECTED] wrote: Maybe the kernel has been told to forget about the partitions of /dev/sdb. But fdisk/cfdisk has no problem whatsoever finding the partitions . It is looking at the partition table on disk. Not at the kernel's idea of partitions, which

Re: raid10 on three discs - few questions.

2008-02-06 Thread Neil Brown
On Wednesday February 6, [EMAIL PROTECTED] wrote: 4. Would it be possible to later '--grow' the array to use 4 discs in raid10 ? Even with far=2 ? No. Well if by later you mean in five years, then maybe. But the code doesn't currently exist. That's a

Re: Deleting mdadm RAID arrays

2008-02-06 Thread Neil Brown
On Wednesday February 6, [EMAIL PROTECTED] wrote: % cat /proc/partitions major minor #blocks name 8 0 390711384 sda 8 1 390708801 sda1 8 16 390711384 sdb 8 17 390708801 sdb1 8 32 390711384 sdc 8 33 390708801 sdc1 8 48 390710327 sdd

Re: recommendations for stripe/chunk size

2008-02-06 Thread Neil Brown
On Wednesday February 6, [EMAIL PROTECTED] wrote: We implemented the option to select kernel page sizes of 4, 16, 64 and 256 kB for some PowerPC systems (440SPe, to be precise). A nice graphics of the effect can be found here:

Re: recommendations for stripe/chunk size

2008-02-06 Thread Neil Brown
On Wednesday February 6, [EMAIL PROTECTED] wrote: Keld Jørn Simonsen wrote: Hi I am looking at revising our howto. I see a number of places where a chunk size of 32 kiB is recommended, and even recommendations on maybe using sizes of 4 kiB. Depending on the raid level, a write

Re: recommendations for stripe/chunk size

2008-02-06 Thread Neil Brown
On Thursday February 7, [EMAIL PROTECTED] wrote: Anyway, why does a SATA-II drive not deliver something like 300 MB/s? Are you serious? A high end 15000RPM enterprise grade drive such as the Seagate Cheetah® 15K.6 only delivers 164MB/sec. The SATA Bus might be able to deliver

Re: Re[2]: mdadm 2.6.4 : How i can check out current status of reshaping ?

2008-02-05 Thread Neil Brown
On Tuesday February 5, [EMAIL PROTECTED] wrote: Feb 5 11:56:12 raid01 kernel: BUG: unable to handle kernel paging request at virtual address 001cd901 This looks like some sort of memory corruption. Feb 5 11:56:12 raid01 kernel: EIP is at md_do_sync+0x629/0xa32 This tells us what code is

Re: Deleting mdadm RAID arrays

2008-02-05 Thread Neil Brown
On Tuesday February 5, [EMAIL PROTECTED] wrote: % mdadm --zero-superblock /dev/sdb1 mdadm: Couldn't open /dev/sdb1 for write - not zeroing That's weird. Why can't it open it? Maybe you aren't running as root (The '%' prompt is suspicious). Maybe the kernel has been told to forget about the
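
A minimal sketch of the two checks suggested above, using the same /dev/sdb1 from the report:

  # as root this time (the '%' prompt suggests an unprivileged shell):
  mdadm --zero-superblock /dev/sdb1
  # check whether the kernel still knows about the partition at all
  grep sdb /proc/partitions
  # if sdb1 is missing from the kernel's view, ask it to re-read the partition table
  blockdev --rereadpt /dev/sdb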

Re: mdadm 2.6.4 : How i can check out current status of reshaping ?

2008-02-04 Thread Neil Brown
On Monday February 4, [EMAIL PROTECTED] wrote: [EMAIL PROTECTED]:/# cat /proc/mdstat Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty] md1 : active raid5 sdc[0] sdb[5](S) sdf[3] sde[2] sdd[1] 1465159488 blocks super 0.91 level 5, 64k

Re: when is a disk non-fresh?

2008-02-04 Thread Neil Brown
On Monday February 4, [EMAIL PROTECTED] wrote: Seems the other topic wasn't quite clear... not necessarily. sometimes it helps to repeat your question. there is a lot of noise on the internet and sometimes important things get missed... :-) Occasionally a disk is kicked for being non-fresh

Re: raid10 on three discs - few questions.

2008-02-03 Thread Neil Brown
On Sunday February 3, [EMAIL PROTECTED] wrote: Hi, Maybe I'll buy three HDDs to put a raid10 on them. And get the total capacity of 1.5 of a disc. 'man 4 md' indicates that this is possible and should work. I'm wondering - how a single disc failure is handled in such configuration? 1.
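
For reference, a three-disc raid10 keeping two copies of every block (giving the 1.5-disc capacity mentioned above) can be created directly; the device names are hypothetical:

  # far=2 layout over three discs; usable capacity is 3 discs / 2 copies = 1.5 discs
  mdadm --create /dev/md0 --level=10 --layout=f2 --raid-devices=3 /dev/sda1 /dev/sdb1 /dev/sdc1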

Re: problem with spare, active device, clean degraded, reshape RAID5, anybody can help ?

2008-02-03 Thread Neil Brown
On Thursday January 31, [EMAIL PROTECTED] wrote: Hello linux-raid. i have DEBIAN. raid01:/# mdadm -V mdadm - v2.6.4 - 19th October 2007 raid01:/# mdadm -D /dev/md1 /dev/md1: Version : 00.91.03 Creation Time : Tue Nov 13 18:42:36 2007 Raid Level : raid5 Delta

Re: /dev/sdb has different metadata to chosen array /dev/md1 0.91 0.90.

2008-02-03 Thread Neil Brown
On Saturday February 2, [EMAIL PROTECTED] wrote: Hello, linux-raid. Please help, how can I fight THIS: [EMAIL PROTECTED]:~# mdadm -I /dev/sdb mdadm: /dev/sdb has different metadata to chosen array /dev/md1 0.91 0.90. Apparently mdadm -I doesn't work with arrays that are in

Re: Re[2]: problem with spare, active device, clean degraded, reshape RAID5, anybody can help ?

2008-02-03 Thread Neil Brown
On Monday February 4, [EMAIL PROTECTED] wrote: raid01:/etc# cat /proc/mdstat Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty] md1 : active(auto-read-only) raid5 sdc[0] sdb[5](S) sdf[3] sde[2] sdd[1] ^^^

Re: Linux md and iscsi problems

2008-02-02 Thread Neil Brown
On Friday February 1, [EMAIL PROTECTED] wrote: Summarizing, I have two questions about the behavior of Linux md with slow devices: 1. Is it possible to modify some kind of time-out parameter on the mdadm tool so the slow device wouldn't be marked as faulty because of its slow

Re: raid problem: after every reboot /dev/sdb1 is removed?

2008-02-01 Thread Neil Brown
On Friday February 1, [EMAIL PROTECTED] wrote: Hi! I have the following problem with my softraid (raid 1). I'm running Ubuntu 7.10 64bit with kernel 2.6.22-14-generic. After every reboot my first boot partition in md0 is not in sync. One of the disks (sdb1) is removed. After a

Re: BUG: possible array corruption when adding a component to a degraded raid5 (possibly other levels too)

2008-01-28 Thread Neil Brown
On Monday January 28, [EMAIL PROTECTED] wrote: Hello, It seems that mdadm/md do not perform proper sanity checks before adding a component to a degraded array. If the size of the new component is just right, the superblock information will overlap with the data area. This will happen

Re: BUG: possible array corruption when adding a component to a degraded raid5 (possibly other levels too)

2008-01-28 Thread Neil Brown
On Monday January 28, [EMAIL PROTECTED] wrote: Hello, It seems that mdadm/md do not perform proper sanity checks before adding a component to a degraded array. If the size of the new component is just right, the superblock information will overlap with the data area. This will happen

Re: In this partition scheme, grub does not find md information?

2008-01-28 Thread Neil Brown
On Monday January 28, [EMAIL PROTECTED] wrote: Perhaps I'm mistaken but I thought it was possible to boot from /dev/md/all1. It is my understanding that grub cannot boot from RAID. You can boot from raid1 by the expedient of booting from one of the halves. A common approach is to make a
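
A sketch of the common approach hinted at above: keep /boot on a raid1 whose halves look like plain filesystems, and install grub on both discs so either half can boot. The device names and the 0.90 metadata choice (superblock at the end of the partition) are assumptions:

  mdadm --create /dev/md0 --level=1 --raid-devices=2 --metadata=0.90 /dev/sda1 /dev/sdb1
  mkfs.ext3 /dev/md0              # mounted as /boot
  grub-install /dev/sda           # grub reads one half as an ordinary filesystem
  grub-install /dev/sdb           # repeat so the second disc is bootable too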

Re: write-intent bitmaps

2008-01-27 Thread Neil Brown
On Sunday January 27, [EMAIL PROTECTED] wrote: http://lists.debian.org/debian-devel/2008/01/msg00921.html Are they regarded as a stable feature? If so I'd like to see distributions supporting them by default. I've started a discussion in Debian on this topic, see the above URL for

Re: striping of a 4 drive raid10

2008-01-27 Thread Neil Brown
On Sunday January 27, [EMAIL PROTECTED] wrote: Hi I have tried to make a striping raid out of my new 4 x 1 TB SATA-2 disks. I tried raid10,f2 in several ways: 1: md0 = raid10,f2 of sda1+sdb1, md1= raid10,f2 of sdc1+sdd1, md2 = raid0 of md0+md1 2: md0 = raid0 of sda1+sdb1, md1= raid0 of
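
An alternative to the nested setups listed above is a single raid10,f2 across all four discs, which both stripes and mirrors in one array; a sketch with hypothetical device names:

  mdadm --create /dev/md0 --level=10 --layout=f2 --raid-devices=4 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1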

Re: striping of a 4 drive raid10

2008-01-27 Thread Neil Brown
On Sunday January 27, [EMAIL PROTECTED] wrote: On Mon, Jan 28, 2008 at 07:13:30AM +1100, Neil Brown wrote: On Sunday January 27, [EMAIL PROTECTED] wrote: Hi I have tried to make a striping raid out of my new 4 x 1 TB SATA-2 disks. I tried raid10,f2 in several ways: 1: md0

Re: Fwd: Error on /dev/sda, but takes down RAID-1

2008-01-23 Thread Neil Brown
On Wednesday January 23, [EMAIL PROTECTED] wrote: Hi, I'm not sure this is completely linux-raid related, but I can't figure out where to start: A few days ago, my server died. I was able to log in and salvage this content of dmesg: http://pastebin.com/m4af616df At line 194:

Re: [BUG] The kernel thread for md RAID1 could cause a md RAID1 array deadlock

2008-01-23 Thread Neil Brown
a new function 'flush_pending_writes' to give that attention, and call it in freeze_array to be sure that we aren't waiting on raid1d. Thanks to K.Tanaka [EMAIL PROTECTED] for finding and reporting this problem. Cc: K.Tanaka [EMAIL PROTECTED] Signed-off-by: Neil Brown [EMAIL PROTECTED] ### Diffstat

Re: idle array consuming cpu ??!!

2008-01-23 Thread Neil Brown
On Tuesday January 22, [EMAIL PROTECTED] wrote: Neil Brown ([EMAIL PROTECTED]) wrote on 21 January 2008 12:15: On Sunday January 20, [EMAIL PROTECTED] wrote: A raid6 array with a spare and bitmap is idle: not mounted and with no IO to it or any of its disks (obviously), as shown by iostat

Re: array doesn't run even with --force

2008-01-20 Thread Neil Brown
On Sunday January 20, [EMAIL PROTECTED] wrote: I've got a raid5 array with 5 disks where 2 failed. The failures are occasional and only on a few sectors so I tried to assemble it with 4 disks anyway: # mdadm -A -f -R /dev/mdnumber /dev/disk1 /dev/disk2 /dev/disk3 /dev/disk4 However mdadm

Re: idle array consuming cpu ??!!

2008-01-20 Thread Neil Brown
On Sunday January 20, [EMAIL PROTECTED] wrote: A raid6 array with a spare and bitmap is idle: not mounted and with no IO to it or any of its disks (obviously), as shown by iostat. However it's consuming cpu: since reboot it used about 11min in 24h, which is quite a lot even for a busy array

Re: array doesn't run even with --force

2008-01-20 Thread Neil Brown
On Monday January 21, [EMAIL PROTECTED] wrote: The command is mdadm -A --verbose -f -R /dev/md3 /dev/sda4 /dev/sdc4 /dev/sde4 /dev/sdd4 The failed areas are sdb4 (which I didn't include above) and sdd4. I did a dd if=/dev/sdb4 of=/dev/hda4 bs=512 conv=noerror and it complained about
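
One caveat with the dd invocation quoted above: conv=noerror on its own drops the unreadable blocks, so everything after the first read error lands at the wrong offset. A hedged sketch of two safer copies (GNU ddrescue is an assumption, not something suggested in the thread):

  # pad failed reads with zeros so offsets stay aligned
  dd if=/dev/sdb4 of=/dev/hda4 bs=512 conv=noerror,sync
  # or use ddrescue, which retries bad sectors and keeps a log of them
  ddrescue /dev/sdb4 /dev/hda4 sdb4-rescue.log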

Re: do_md_run returned -22 [Was: 2.6.24-rc8-mm1]

2008-01-17 Thread Neil Brown
On Thursday January 17, [EMAIL PROTECTED] wrote: On Thu, 17 Jan 2008 16:23:30 +0100 Jiri Slaby [EMAIL PROTECTED] wrote: On 01/17/2008 11:35 AM, Andrew Morton wrote: ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc8/2.6.24-rc8-mm1/ still the same md issue

Re: [PATCH 001 of 6] md: Fix an occasional deadlock in raid5

2008-01-16 Thread Neil Brown
On Tuesday January 15, [EMAIL PROTECTED] wrote: On Wed, 16 Jan 2008 00:09:31 -0700 Dan Williams [EMAIL PROTECTED] wrote: heheh. it's really easy to reproduce the hang without the patch -- i could hang the box in under 20 min on 2.6.22+ w/XFS and raid5 on 7x750GB. i'll try with

Re: How do I get rid of old device?

2008-01-16 Thread Neil Brown
On Wednesday January 16, [EMAIL PROTECTED] wrote: p34:~# mdadm /dev/md3 --zero-superblock p34:~# mdadm --examine --scan ARRAY /dev/md0 level=raid1 num-devices=2 UUID=f463057c:9a696419:3bcb794a:7aaa12b2 ARRAY /dev/md1 level=raid1 num-devices=2 UUID=98e4948c:c6685f82:e082fd95:e7f45529 ARRAY

Re: [PATCH 002 of 6] md: Fix use-after-free bug when dropping an rdev from an md array.

2008-01-13 Thread Neil Brown
On Monday January 14, [EMAIL PROTECTED] wrote: On Mon, Jan 14, 2008 at 12:45:31PM +1100, NeilBrown wrote: Due to possible deadlock issues we need to use a schedule work to kobject_del an 'rdev' object from a different thread. A recent change means that kobject_add no longer gets a

Re: [PATCH 002 of 6] md: Fix use-after-free bug when dropping an rdev from an md array.

2008-01-13 Thread Neil Brown
On Monday January 14, [EMAIL PROTECTED] wrote: On Mon, Jan 14, 2008 at 02:21:45PM +1100, Neil Brown wrote: Maybe it isn't there any more. Once upon a time, when I echo remove > /sys/block/mdX/md/dev-YYY/state Egads. And just what will protect you from parallel callers

Re: [PATCH 002 of 6] md: Fix use-after-free bug when dropping an rdev from an md array.

2008-01-13 Thread Neil Brown
before the drop, and we are safe... Comments? NeilBrown Signed-off-by: Neil Brown [EMAIL PROTECTED] ### Diffstat output ./drivers/md/md.c | 35 ++- 1 file changed, 26 insertions(+), 9 deletions(-) diff .prev/drivers/md/md.c ./drivers/md/md.c --- .prev/drivers/md

Re: raid5 stuck in degraded, inactive and dirty mode

2008-01-10 Thread Neil Brown
On Thursday January 10, [EMAIL PROTECTED] wrote: On Wed, Jan 09, 2008 at 07:16:34PM +1100, CaT wrote: But I suspect that --assemble --force would do the right thing. Without more details, it is hard to say for sure. I suspect so as well but throwing caution to the wind irks me wrt

Re: md rotates RAID5 spare at boot

2008-01-10 Thread Neil Brown
On Thursday January 10, [EMAIL PROTECTED] wrote: It looks to me like md inspects and attempts to assemble after each drive controller is scanned (from dmesg, there appears to be a failed bind on the first three devices after they are scanned, and then again when the second controller is

Re: md rotates RAID5 spare at boot

2008-01-10 Thread Neil Brown
On Thursday January 10, [EMAIL PROTECTED] wrote: distro: Ubuntu 7.10 Two files show up... 85-mdadm.rules: # This file causes block devices with Linux RAID (mdadm) signatures to # automatically cause mdadm to be run. # See udev(8) for syntax SUBSYSTEM=="block", ACTION=="add|change",

Re: md rotates RAID5 spare at boot

2008-01-10 Thread Neil Brown
as it detects the devices. So is it possible that the spare is not the last drive to be detected and mdadm assembles too soon? Neil Brown wrote: On Thursday January 10, [EMAIL PROTECTED] wrote: It looks to me like md inspects and attempts to assemble after each drive controller

Re: md rotates RAID5 spare at boot

2008-01-10 Thread Neil Brown
On Thursday January 10, [EMAIL PROTECTED] wrote: (Sorry- yes it looks like I posted an incorrect dmesg extract) This still doesn't seem to match your description. I see: [ 41.247389] md: bind<sdf1> [ 41.247584] md: bind<sdb1> [ 41.247787] md: bind<sda1> [ 41.247971] md: bind<sdc1> [

Re: The effects of multiple layers of block drivers

2008-01-10 Thread Neil Brown
On Thursday January 10, [EMAIL PROTECTED] wrote: Hello, I am starting to dig into the Block subsystem to try and uncover the reason for some data I lost recently. My situation is that I have multiple block drivers on top of each other and am wondering how the effects of a raid 5 rebuild

Re: 2.6.24-rc6 reproducible raid5 hang

2008-01-10 Thread Neil Brown
On Thursday January 10, [EMAIL PROTECTED] wrote: On Jan 10, 2008 12:13 AM, dean gaudet [EMAIL PROTECTED] wrote: w.r.t. dan's cfq comments -- i really don't know the details, but does this mean cfq will misattribute the IO to the wrong user/process? or is it just a concern that CPU time

Re: 2.6.24-rc6 reproducible raid5 hang

2008-01-09 Thread Neil Brown
of make_request, like this. Can you test it please? Does it seem reasonable? Thanks, NeilBrown Signed-off-by: Neil Brown [EMAIL PROTECTED] ### Diffstat output ./drivers/md/md.c|2 +- ./drivers/md/raid5.c |4 +++- 2 files changed, 4 insertions(+), 2 deletions(-) diff .prev

Re: 2.6.24-rc6 reproducible raid5 hang

2008-01-09 Thread Neil Brown
On Wednesday January 9, [EMAIL PROTECTED] wrote: On Jan 9, 2008 5:09 PM, Neil Brown [EMAIL PROTECTED] wrote: On Wednesday January 9, [EMAIL PROTECTED] wrote: Can you test it please? This passes my failure case. Thanks! Does it seem reasonable? What do you think about limiting

Re: raid5 stuck in degraded, inactive and dirty mode

2008-01-08 Thread Neil Brown
On Wednesday January 9, [EMAIL PROTECTED] wrote: I'd provide data dumps of --examine and friends but I'm in a situation where transferring the data would be a right pain. I'll do it if need be, though. So, what can I do? Well, providing the output of --examine would help a lot. But I

Re: Raid 1, can't get the second disk added back in.

2008-01-07 Thread Neil Brown
On Monday January 7, [EMAIL PROTECTED] wrote: Problem is not raid, or at least not obviously raid related. The problem is that the whole disk, /dev/hdb is unavailable. Maybe check /sys/block/hdb/holders ? lsof /dev/hdb ? good luck :-) NeilBrown - To unsubscribe from this list: send the

Re: Why mdadm --monitor --program sometimes only gives 2 command-line arguments to the program?

2008-01-07 Thread Neil Brown
On Saturday January 5, [EMAIL PROTECTED] wrote: Hi all, I need to monitor my RAID and if it fails, I'd like to call my-script to deal with the failure. I did: mdadm --monitor --program my-script --delay 60 /dev/md1 And then, I simulate a failure with mdadm --manage --set-faulty
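
mdadm passes the monitored program two or three arguments: the event name, the md device, and for some events a third argument naming the component device, which is why the script sometimes only sees two. A minimal handler sketch (the logger call is just an illustration):

  #!/bin/sh
  EVENT=$1        # e.g. Fail, DegradedArray, RebuildFinished
  ARRAY=$2        # e.g. /dev/md1
  COMPONENT=$3    # may be empty, e.g. for DegradedArray
  logger -t md-monitor "$EVENT on $ARRAY ${COMPONENT:+component $COMPONENT}"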

Re: Raid 1, new disk can't be added after replacing faulty disk

2008-01-07 Thread Neil Brown
On Monday January 7, [EMAIL PROTECTED] wrote: On Jan 7, 2008 6:44 AM, Radu Rendec [EMAIL PROTECTED] wrote: I'm experiencing trouble when trying to add a new disk to a raid 1 array after having replaced a faulty disk. [..] # mdadm --version mdadm - v2.6.2 - 21st May 2007 [..]

Re: Raid 1, can't get the second disk added back in.

2008-01-06 Thread Neil Brown
On Saturday January 5, [EMAIL PROTECTED] wrote: Since /dev/hdb5 has been part of this array before you should use --re-add instead of --add. Kind regards, Alex. That is not correct. --re-add is only needed for arrays without metadata, for which you use --build to start them. NeilBrown -

Re: Raid 1, can't get the second disk added back in.

2008-01-06 Thread Neil Brown
On Saturday January 5, [EMAIL PROTECTED] wrote: [EMAIL PROTECTED]:~# mdadm /dev/md0 --add /dev/hdb5 mdadm: Cannot open /dev/hdb5: Device or resource busy All the solutions I've been able to google fail with the busy. There is nothing that I can find that might be using /dev/hdb5 except
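
A few checks that can reveal what is holding the partition busy; a sketch only, and the dmsetup check is an assumption (device-mapper/LVM can claim partitions too):

  cat /proc/mdstat                      # already part of a running md array?
  ls /sys/block/hdb/hdb5/holders        # kernel's record of who holds the partition
  lsof /dev/hdb5                        # any process with it open?
  dmsetup ls                            # device-mapper mappings built on top of it?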

Re: stopped array, but /sys/block/mdN still exists.

2008-01-03 Thread Neil Brown
On Thursday January 3, [EMAIL PROTECTED] wrote: So what happens if I try to _use_ that /sys entry? For instance run a script which reads data, or sets the stripe_cache_size higher, or whatever? Do I get back status, ignored, or system issues? Try it:-) The stripe_cache_size attributes
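
For reference, on a running raid5/raid6 array the attribute mentioned above is read and written through sysfs (md2 as in the thread; the value 8192 is arbitrary):

  cat /sys/block/md2/md/stripe_cache_size
  echo 8192 > /sys/block/md2/md/stripe_cache_size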

Re: PROBLEM: RAID5 reshape data corruption

2008-01-03 Thread Neil Brown
: Signed-off-by: Neil Brown [EMAIL PROTECTED] ### Diffstat output ./drivers/md/raid5.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c --- .prev/drivers/md/raid5.c2008-01-04 09:20:54.0 +1100 +++ ./drivers/md/raid5.c2008

Re: [PATCH] md: Fix data corruption when a degraded raid5 array is reshaped.

2008-01-03 Thread Neil Brown
a linked chain of asynchronous operations. --- From: Neil Brown [EMAIL PROTECTED] Technically that should probably be From: Dan Williams [EMAIL PROTECTED] now, and then I add Acked-by: NeilBrown [EMAIL PROTECTED] because I completely agree with your improvement. We should keep an eye out

Re: stopped array, but /sys/block/mdN still exists.

2008-01-02 Thread Neil Brown
On Wednesday January 2, [EMAIL PROTECTED] wrote: This isn't a high priority issue or anything, but I'm curious: I --stop(ped) an array but /sys/block/md2 remained largely populated. Is that intentional? It is expected. Because of the way that md devices are created (just open the

Re: Last ditch plea on remote double raid5 disk failure

2007-12-31 Thread Neil Brown
On Monday December 31, [EMAIL PROTECTED] wrote: I'm hoping that if I can get raid5 to continue despite the errors, I can bring back up enough of the server to continue, a bit like the remount-ro option in ext2/ext3. If not, oh well... Sorry, but it is oh well. I could probably make it

Re: mdadm --stop goes off and never comes back?

2007-12-22 Thread Neil Brown
On Wednesday December 19, [EMAIL PROTECTED] wrote: On 12/19/07, Jon Nelson [EMAIL PROTECTED] wrote: On 12/19/07, Neil Brown [EMAIL PROTECTED] wrote: On Tuesday December 18, [EMAIL PROTECTED] wrote: I tried to stop the array: mdadm --stop /dev/md2 and mdadm never came

Re: raid5 resizing

2007-12-19 Thread Neil Brown
On Wednesday December 19, [EMAIL PROTECTED] wrote: Hi, I'm thinking of slowly replacing disks in my raid5 array with bigger disks and then resize the array to fill up the new disks. Is this possible? Basically I would like to go from: 3 x 500gig RAID5 to 3 x 1tb RAID5, thereby going from
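
A sketch of the replace-then-grow sequence for that 3 x 500gig array; the device names are hypothetical and each replacement must finish resyncing before the next one starts:

  # repeat for each disk in turn
  mdadm /dev/md0 --fail /dev/sda1 --remove /dev/sda1
  # physically swap in the 1tb disk and partition it, then:
  mdadm /dev/md0 --add /dev/sda1
  # once all three members are the larger disks and resynced:
  mdadm --grow /dev/md0 --size=max
  resize2fs /dev/md0              # or the grow tool for whatever filesystem is on it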

Re: mdadm --stop goes off and never comes back?

2007-12-19 Thread Neil Brown
On Tuesday December 18, [EMAIL PROTECTED] wrote: This just happened to me. Create raid with: mdadm --create /dev/md2 --level=raid10 --raid-devices=3 --spare-devices=0 --layout=o2 /dev/sdb3 /dev/sdc3 /dev/sdd3 cat /proc/mdstat md2 : active raid10 sdd3[2] sdc3[1] sdb3[0] 5855424

Re: Raid over 48 disks

2007-12-18 Thread Neil Brown
On Tuesday December 18, [EMAIL PROTECTED] wrote: We're investigating the possibility of running Linux (RHEL) on top of Sun's X4500 Thumper box: http://www.sun.com/servers/x64/x4500/ Basically, it's a server with 48 SATA hard drives. No hardware RAID. It's designed for Sun's ZFS

Re: Cannot re-assemble Degraded RAID6 after crash

2007-12-17 Thread Neil Brown
On Monday December 17, [EMAIL PROTECTED] wrote: My system has crashed a couple of times, each time the two drives have dropped off of the RAID. Previously I simply did the following, which would take all night: mdadm -a --re-add /dev/md2 /dev/sde3 mdadm -a --re-add /dev/md2 /dev/sdf3

Re: Please Help!!! Raid 5 reshape failed!

2007-12-16 Thread Neil Brown
On Friday December 14, [EMAIL PROTECTED] wrote: gentoofs ~# mdadm --assemble /dev/md1 /dev/sdc /dev/sdd /dev/sdf mdadm: /dev/md1 assembled from 2 drives - not enough to start the array Try adding --run, or maybe --force. NeilBrown - To unsubscribe from this list: send the line unsubscribe
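
A sketch of the suggested retry, with the same devices as in the report:

  mdadm --assemble --run /dev/md1 /dev/sdc /dev/sdd /dev/sdf
  # or, if the event counts disagree and you accept the risk:
  mdadm --assemble --force --run /dev/md1 /dev/sdc /dev/sdd /dev/sdf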

Re: [PATCH 007 of 7] md: Get name for block device in sysfs

2007-12-16 Thread Neil Brown
On Saturday December 15, [EMAIL PROTECTED] wrote: On Dec 14, 2007 7:26 AM, NeilBrown [EMAIL PROTECTED] wrote: Given an fd on a block device, returns a string like /block/sda/sda1 which can be used to find related information in /sys. As pointed out to when you came up

Re: mdadm break / restore soft mirror

2007-12-14 Thread Neil Brown
On Thursday December 13, [EMAIL PROTECTED] wrote: What you could do is set the number of devices in the array to 3 so that it always appears to be degraded, then rotate your backup drives through the array. The number of dirty bits in the bitmap will steadily grow and so resyncs will take
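
A sketch of the rotate-the-backup-drive idea described above; the device names are hypothetical:

  # tell the mirror to expect three devices, so it always looks degraded
  mdadm --grow /dev/md0 --raid-devices=3
  # rotate a backup drive in...
  mdadm /dev/md0 --add /dev/sdc1
  # ...and out again before attaching the next one; the bitmap limits the resync
  mdadm /dev/md0 --fail /dev/sdc1 --remove /dev/sdc1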

Re: Auto assembly errors with mdadm and 64K aligned partitions.

2007-12-13 Thread Neil Brown
On Thursday December 13, [EMAIL PROTECTED] wrote: Good morning to Neil and everyone on the list, hope your respective days are going well. Quick overview. We've isolated what appears to be a failure mode with mdadm assembling RAID1 (and presumably other level) volumes which kernel based

RE: mdadm break / restore soft mirror

2007-12-13 Thread Neil Brown
On Thursday December 13, [EMAIL PROTECTED] wrote: How do I create the internal bitmap? man mdadm didn't shed any light and my brief excursion into google wasn't much more helpful. mdadm --grow --bitmap=internal /dev/mdX The version I have installed is mdadm-1.12.0-5.i386 from RedHat

Re: mdadm break / restore soft mirror

2007-12-12 Thread Neil Brown
On Wednesday December 12, [EMAIL PROTECTED] wrote: Hi, Question for you guys. A brief history: RHEL 4 AS I have a partition with way to many small files on (Usually around a couple of million) that needs to be backed up, standard methods mean that a restore is

Re: [PATCH] (2nd try) force parallel resync

2007-12-06 Thread Neil Brown
On Thursday December 6, [EMAIL PROTECTED] wrote: Hello, here is the second version of the patch. With this version also on setting /sys/block/*/md/sync_force_parallel the sync_thread is woken up. Though, I still don't understand why md_wakeup_thread() is not working. Could give a little

Re: RAID mapper device size wrong after replacing drives

2007-12-06 Thread Neil Brown
I think you would have more luck posting this to [EMAIL PROTECTED] - I think that is where support for device mapper happens. NeilBrown On Thursday December 6, [EMAIL PROTECTED] wrote: Hi, I have a problem with my RAID array under Linux after upgrading to larger drives. I have a machine

Re: Spontaneous rebuild

2007-12-02 Thread Neil Brown
On Sunday December 2, [EMAIL PROTECTED] wrote: Anyway, the problems are back: To test my theory that everything is alright with the CPU running within its specs, I removed one of the drives while copying some large files yesterday. Initially, everything seemed to work out nicely, and by the

Re: Reading takes 100% precedence over writes for mdadm+raid5?

2007-12-02 Thread Neil Brown
On Sunday December 2, [EMAIL PROTECTED] wrote: Was curious if when running 10 DD's (which are writing to the RAID 5) fine, no issues, suddenly all go into D-state and let the read/give it 100% priority? So are you saying that the writes completely stalled while the read was progressing?

Re: assemble vs create an array.......

2007-11-29 Thread Neil Brown
On Thursday November 29, [EMAIL PROTECTED] wrote: Hello, I had created a raid 5 array on 3 232GB SATA drives. I had created one partition (for /home) formatted with either xfs or reiserfs (I do not recall). Last week I reinstalled my box from scratch with Ubuntu 7.10, with mdadm v.

Re: raid5 reshape/resync

2007-11-28 Thread Neil Brown
On Sunday November 25, [EMAIL PROTECTED] wrote: - Message from [EMAIL PROTECTED] - Date: Sat, 24 Nov 2007 12:02:09 +0100 From: Nagilum [EMAIL PROTECTED] Reply-To: Nagilum [EMAIL PROTECTED] Subject: raid5 reshape/resync To: linux-raid@vger.kernel.org Hi,

Re: raid6 check/repair

2007-11-28 Thread Neil Brown
On Thursday November 22, [EMAIL PROTECTED] wrote: Dear Neil, thank you very much for your detailed answer. Neil Brown wrote: While it is possible to use the RAID6 P+Q information to deduce which data block is wrong if it is known that either 0 or 1 datablocks is wrong

Re: raid6 check/repair

2007-11-28 Thread Neil Brown
On Tuesday November 27, [EMAIL PROTECTED] wrote: Thiemo Nagel wrote: Dear Neil, thank you very much for your detailed answer. Neil Brown wrote: While it is possible to use the RAID6 P+Q information to deduce which data block is wrong if it is known that either 0 or 1 datablocks

Re: [PATCH] Skip bio copy in full-stripe write ops

2007-11-23 Thread Neil Brown
On Friday November 23, [EMAIL PROTECTED] wrote: Hello all, Here is a patch which allows skipping the intermediate data copy between the bio requested to write and the disk cache in sh when a full-stripe write operation is on the way. This improves the performance of write

Re: md RAID 10 on Linux 2.6.20?

2007-11-22 Thread Neil Brown
On Thursday November 22, [EMAIL PROTECTED] wrote: Hi all, I am running a home-grown Linux 2.6.20.11 SMP 64-bit build, and I am wondering if there is indeed a RAID 10 personality defined in md that can be implemented using mdadm. If so, is it available in 2.6.20.11, or is it in a later

Re: raid6 check/repair

2007-11-21 Thread Neil Brown
On Wednesday November 21, [EMAIL PROTECTED] wrote: Dear Neal, I have been looking a bit at the check/repair functionality in the raid6 personality. It seems that if an inconsistent stripe is found during repair, md does not try to determine which block is corrupt (using e.g. the

Re: BUG: soft lockup detected on CPU#1! (was Re: raid6 resync blocks the entire system)

2007-11-21 Thread Neil Brown
On Tuesday November 20, [EMAIL PROTECTED] wrote: My personal (wild) guess for this problem is that there is a global lock somewhere, preventing all other CPUs from doing anything. At 100% (at 80 MB/s) there's probably no time frame left to wake up the other CPUs, or it's sufficiently

Re: raid6 check/repair

2007-11-15 Thread Neil Brown
On Thursday November 15, [EMAIL PROTECTED] wrote: Hi, I have been looking a bit at the check/repair functionality in the raid6 personality. It seems that if an inconsistent stripe is found during repair, md does not try to determine which block is corrupt (using e.g. the method in
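
For reference, check and repair are driven through sysfs; md0 is a placeholder here:

  echo check > /sys/block/md0/md/sync_action
  cat /sys/block/md0/md/mismatch_cnt     # sectors found inconsistent by the last check
  echo repair > /sys/block/md0/md/sync_action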

Re: Changing partition types of RAID array members

2007-11-15 Thread Neil Brown
On Thursday November 15, [EMAIL PROTECTED] wrote: Hi. I have two RAID5 arrays on an opensuse 10.3 system. They are used together in a large LVM volume that contains a lot of data I'd rather not have to try and backup/recreate. md1 comes up fine and is detected by the OS on boot and

Re: [stable] [PATCH 000 of 2] md: Fixes for md in 2.6.23

2007-11-14 Thread Neil Brown
On Tuesday November 13, [EMAIL PROTECTED] wrote: raid5-fix-unending-write-sequence.patch is in -mm and I believe is waiting on an Acked-by from Neil? It seems to have just been sent on to Linus, so it probably will go in without: Acked-By: NeilBrown [EMAIL PROTECTED] I'm beginning to

Re: Proposal: non-striping RAID4

2007-11-14 Thread Neil Brown
On Thursday November 15, [EMAIL PROTECTED] wrote: Neil: any comments on whether this would be desirable / useful / feasible? 1/ Having a raid4 variant which arranges the data like 'linear' is something I am planning to do eventually. If your filesystem knows about the geometry of the

Re: Building a new raid6 with bitmap does not clear bits during resync

2007-11-12 Thread Neil Brown
On Monday November 12, [EMAIL PROTECTED] wrote: Neil Brown wrote: However there is value in regularly updating the bitmap, so add code to periodically pause while all pending sync requests complete, then update the bitmap. Doing this only every few seconds (the same as the bitmap

Re: Building a new raid6 with bitmap does not clear bits during resync

2007-11-11 Thread Neil Brown
-by: Neil Brown [EMAIL PROTECTED] ### Diffstat output ./drivers/md/bitmap.c | 34 +- ./drivers/md/raid1.c |1 + ./drivers/md/raid10.c |2 ++ ./drivers/md/raid5.c |3 +++ ./include/linux/raid/bitmap.h |3 +++ 5 files

Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-04 Thread Neil Brown
correctly, presumably due to substantial similarities between handle_stripe5 and handle_stripe6. This patch (with lots of context) moves the chunk of new code from handle_stripe6 (where it isn't needed (yet)) to handle_stripe5. Signed-off-by: Neil Brown [EMAIL PROTECTED] ### Diffstat output ./drivers

Re: Very small internal bitmap after recreate

2007-11-02 Thread Neil Brown
On Friday November 2, [EMAIL PROTECTED] wrote: On 02.11.2007 at 10:22, Neil Brown wrote: On Friday November 2, [EMAIL PROTECTED] wrote: I have a 5 disk version 1.0 superblock RAID5 which had an internal bitmap that has been reported to have a size of 299 pages in /proc/mdstat

Re: stride / stripe alignment on LVM ?

2007-11-01 Thread Neil Brown
On Thursday November 1, [EMAIL PROTECTED] wrote: Hello, I have raid5 /dev/md1, --chunk=128 --metadata=1.1. On it I have created LVM volume called 'raid5', and finally a logical volume 'backup'. Then I formatted it with command: mkfs.ext3 -b 4096 -E stride=32 -E resize=550292480
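
The stride value above follows from the chunk size: a 128 KiB chunk divided by 4 KiB blocks gives 32 blocks per chunk. A sketch that also sets the stripe width, assuming an e2fsprogs new enough to accept a stripe-width extended option and (purely as an example) a raid5 with four data-bearing discs; the LV path matches the names in the report:

  # stride = chunk / block = 128k / 4k = 32
  # stripe-width = stride * data discs, e.g. 32 * 4 = 128
  mkfs.ext3 -b 4096 -E stride=32,stripe-width=128 /dev/raid5/backup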

Re: Superblocks

2007-11-01 Thread Neil Brown
On Tuesday October 30, [EMAIL PROTECTED] wrote: Which is the default type of superblock? 0.90 or 1.0? The default default is 0.90. However a local default can be set in mdadm.conf with e.g. CREATE metadata=1.0 NeilBrown - To unsubscribe from this list: send the line unsubscribe linux-raid in

Re: Bad drive discovered during raid5 reshape

2007-10-30 Thread Neil Brown
On Tuesday October 30, [EMAIL PROTECTED] wrote: Neil Brown wrote: On Monday October 29, [EMAIL PROTECTED] wrote: Hi, I bought two new hard drives to expand my raid array today and unfortunately one of them appears to be bad. The problem didn't arise Looks like you are in real trouble

Re: Time to deprecate old RAID formats?

2007-10-29 Thread Neil Brown
On Friday October 26, [EMAIL PROTECTED] wrote: Perhaps you could have called them 1.start, 1.end, and 1.4k in the beginning? Isn't hindsight wonderful? Those names seem good to me. I wonder if it is safe to generate them in -Eb output Maybe the key confusion here is between version

Re: Superblocks

2007-10-29 Thread Neil Brown
On Friday October 26, [EMAIL PROTECTED] wrote: Can someone help me understand superblocks and MD a little bit? I've got a raid5 array with 3 disks - sdb1, sdc1, sdd1. --examine on these 3 drives shows correct information. However, if I also examine the raw disk devices, sdb and sdd,
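
A way to see what those whole-disk superblocks claim before deciding whether they are stale leftovers; clearing them is shown only as a sketch and should be done only once it is certain the array members are the partitions, not the whole discs:

  mdadm --examine /dev/sdb /dev/sdb1    # compare the two superblocks
  mdadm --examine /dev/sdd /dev/sdd1
  # if the whole-disk superblocks are stale, they can be cleared:
  mdadm --zero-superblock /dev/sdb
  mdadm --zero-superblock /dev/sdd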

Re: Implementing low level timeouts within MD

2007-10-29 Thread Neil Brown
On Friday October 26, [EMAIL PROTECTED] wrote: I've been asking on my other posts but haven't seen a direct reply to this question: Can MD implement timeouts so that it detects problems when drivers don't come back? No. However it is possible that we will start sending the BIO_RW_FAILFAST
