Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.6.19.2
Justin Piszcz wrote:
> On Mon, 22 Jan 2007, Andrew Morton wrote:
> > On Sun, 21 Jan 2007 14:27:34 -0500 (EST) Justin Piszcz [EMAIL PROTECTED] wrote:
> > > Why does copying an 18GB file on a 74GB raptor raid1 cause the kernel
> > > to invoke the OOM killer and kill all of my processes?
>
> Running with PREEMPT OFF lets me copy the file!! The machine LAGS
> occasionally, every 5-30-60 seconds or so, VERY BADLY (talking 5-10
> seconds of lag), but hey, it does not crash!! I will boot the older
> kernel with preempt on and see if I can get you the information you
> requested.

Justin,

According to your kernel_ring_buffer.txt (attached to another email), you
are using anticipatory as your io scheduler:

  289 Jan 24 18:35:25 p34 kernel: [0.142130] io scheduler noop registered
  290 Jan 24 18:35:25 p34 kernel: [0.142194] io scheduler anticipatory registered (default)

I had a problem with this scheduler where my system would occasionally
lock up during heavy I/O. Sometimes it would fix itself, sometimes I had
to reboot. I changed to the CFQ io scheduler and my system has worked
fine since then. CFQ has to be built into the kernel (under Block Layer
-> IO Schedulers). It can be selected as the default, or you can set it
at runtime:

  echo cfq > /sys/block/<disk>/queue/scheduler

Hope this helps,
Bill
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
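For reference, the scheduler switch Bill describes is done per block device through sysfs. A minimal sketch, assuming the disk is /dev/sda (substitute your own device name):

```shell
# List the schedulers compiled into this kernel; the active one
# is shown in square brackets.
cat /sys/block/sda/queue/scheduler

# Switch this device to CFQ at runtime, no reboot needed (requires root).
echo cfq > /sys/block/sda/queue/scheduler
```

Note that the change applies to one device and does not persist across reboots; to make CFQ the default, build it in as the default scheduler or boot with elevator=cfq on the kernel command line.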
raid5 reshape bug with XFS
Hi,

I'm setting up a raid 5 system and I ran across a bug when reshaping an
array with a mounted XFS filesystem on it. This is under linux 2.6.18.2
and mdadm 2.5.5.

I have a test array with 3 10 GB disks and a fourth 10 GB spare disk, and
a mounted xfs filesystem on it:

[EMAIL PROTECTED] $ mdadm --detail /dev/md4
/dev/md4:
        Version : 00.90.03
  Creation Time : Sat Nov  4 18:58:59 2006
     Raid Level : raid5
     Array Size : 20964480 (19.99 GiB 21.47 GB)
    Device Size : 10482240 (10.00 GiB 10.73 GB)
   Raid Devices : 3
  Total Devices : 4
Preferred Minor : 4
    Persistence : Superblock is persistent
[snip]

I grow it:

[EMAIL PROTECTED] $ mdadm -G /dev/md4 -n4
mdadm: Need to backup 384K of critical section..
mdadm: ... critical section passed.

[EMAIL PROTECTED] $ mdadm --detail /dev/md4
/dev/md4:
        Version : 00.91.03
  Creation Time : Sat Nov  4 18:58:59 2006
     Raid Level : raid5
     Array Size : 20964480 (19.99 GiB 21.47 GB)
    Device Size : 10482240 (10.00 GiB 10.73 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 4
    Persistence : Superblock is persistent

It goes along and reshapes fine (from /proc/mdstat):

md4 : active raid5 dm-67[3] dm-66[2] dm-65[1] dm-64[0]
      20964480 blocks super 0.91 level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
      [====>................]  reshape = 22.0% (2314624/10482240) finish=16.7min speed=8128K/sec

When the reshape completes, the full array size gets corrupted.
/proc/mdstat:

md4 : active raid5 dm-67[3] dm-66[2] dm-65[1] dm-64[0]
      31446720 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

- looks good, but -

[EMAIL PROTECTED] $ mdadm --detail /dev/md4
/dev/md4:
        Version : 00.90.03
  Creation Time : Sat Nov  4 18:58:59 2006
     Raid Level : raid5
     Array Size : 2086592 (2038.03 MiB 2136.67 MB)
    Device Size : 10482240 (10.00 GiB 10.73 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 4
    Persistence : Superblock is persistent

(2086592 != 31446720 -- Bad, much too small)

At this point xfs_growfs /dev/md4 barfs horribly - something about
reading past the end of the device.
If I unmount the XFS filesystem, things work ok:

[EMAIL PROTECTED] $ umount /dev/md4
[EMAIL PROTECTED] $ mdadm --detail /dev/md4
/dev/md4:
        Version : 00.90.03
  Creation Time : Sat Nov  4 18:58:59 2006
     Raid Level : raid5
     Array Size : 31446720 (29.99 GiB 32.20 GB)
    Device Size : 10482240 (10.00 GiB 10.73 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 4
    Persistence : Superblock is persistent

(31446720 == 31446720 -- Good)

If I remount the fs, I can use xfs_growfs with no ill effects. It's a
pretty easy work-around to not have the fs mounted during the resize, but
it doesn't seem right for the array size to get borked like this.

If there's anything I can provide to debug this, let me know.

Thanks,
Bill
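The work-around above can be sketched as a short sequence; /mnt/md4 is a hypothetical mount point, and the reshape-polling loop is one simple way to wait for completion:

```shell
# Unmount first so the reshape finishes with a correct array size.
umount /dev/md4

# Grow the array from 3 to 4 active devices.
mdadm --grow /dev/md4 -n 4

# Wait until the reshape no longer appears in /proc/mdstat.
while grep -q reshape /proc/mdstat; do
    sleep 60
done

# Remount, then grow the filesystem into the new space.
# xfs_growfs operates on the mount point, not the device.
mount /dev/md4 /mnt/md4
xfs_growfs /mnt/md4
```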
Re: IBM xSeries stop responding during RAID1 reconstruction
Niccolo Rigacci wrote:
> Hi to all,
>
> I have a new IBM xSeries 206m with two SATA drives. I installed a
> Debian Testing (Etch) and configured software RAID as shown:
>
> Personalities : [raid1]
> md1 : active raid1 sdb5[1] sda5[0]
>       1951744 blocks [2/2] [UU]
> md2 : active raid1 sdb6[1] sda6[0]
>       2931712 blocks [2/2] [UU]
> md3 : active raid1 sdb7[1] sda7[0]
>       39061952 blocks [2/2] [UU]
> md0 : active raid1 sdb1[1] sda1[0]
>       582 blocks [2/2] [UU]
>
> I experience this problem: whenever a volume is reconstructing
> (syncing), the system stops responding. The machine is alive, because
> it responds to ping and the console is responsive, but I cannot get
> past the login prompt. It seems that every disk activity is delayed
> and blocking. When the sync is complete, the machine starts to respond
> perfectly again. Any hints on how to start debugging?

I ran into a similar problem using kernel 2.6.16.14 on an ASUS
motherboard: when I mirrored two SATA drives, it seemed to block all
other disk I/O until the sync was complete. My symptoms were the same:
all consoles were non-responsive, and when I tried to log in it just sat
there until the sync was complete.

I was able to work around this by lowering
/proc/sys/dev/raid/speed_limit_max to a value below my disk throughput
(~50 MB/s), as follows:

$ echo 45000 > /proc/sys/dev/raid/speed_limit_max

That kept my system usable but didn't address the underlying problem of
the raid resync not being appropriately throttled. I ended up configuring
my system differently, so this became a moot point for me.

Hope this helps,
Bill
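The throttle Bill used is part of the md driver's /proc interface; the limits are in KB/s per device, and 45000 is just the figure from his message (pick a value below your own disks' measured throughput):

```shell
# Show the current resync speed bounds (KB/s per device).
cat /proc/sys/dev/raid/speed_limit_min
cat /proc/sys/dev/raid/speed_limit_max

# Cap the resync below the disks' sequential throughput so that
# normal I/O still gets serviced during reconstruction (requires root).
echo 45000 > /proc/sys/dev/raid/speed_limit_max
```

The setting takes effect immediately on a running resync and does not persist across reboots.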
RAID1 Array corruption when adding an extra device with mdadm
I've got a system running 2.6.14.6 with a raid1 array of 2 disks. The
size of the array is as follows (from mdadm --detail):

     Raid Level : raid1
     Array Size : 28314496 (27.00 GiB 28.99 GB)
    Device Size : 28314496 (27.00 GiB 28.99 GB)
   Raid Devices : 2
  Total Devices : 2

I'm trying to add an extra disk to make a three-way mirror using mdadm:

  mdadm --grow /dev/md0 -n 3

When I do this, the disk gets added (so there are 3 raid devices)
--BUT-- the Array Size also changes to 3.0 GB. If I immediately reboot,
things end up ok, but if I let it run it destroys the array contents.
This happened under mdadm v2.1 and v2.2.

I hacked mdadm to print out what it's doing, and things look ok in
Manage_resize() until the mdu_array_info_t structure is updated via the
SET_ARRAY_INFO ioctl; then the above-mentioned size change happens.

Does anyone know what's up with this?

Thanks,
-Bill
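For completeness, the sequence that triggers the problem can be sketched as below; /dev/sdc1 is a hypothetical third partition, which must be added as a spare before the grow can use it:

```shell
# Add the third disk to the array as a spare.
mdadm /dev/md0 --add /dev/sdc1

# Grow the mirror from 2 to 3 active devices.
mdadm --grow /dev/md0 -n 3

# Check the reported sizes immediately afterwards; for raid1,
# Array Size should still equal Device Size.
mdadm --detail /dev/md0 | grep -E 'Array Size|Device Size|Raid Devices'
```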