Re: PROBLEM: raid5 hangs

2007-11-14 Thread Justin Piszcz



On Wed, 14 Nov 2007, Peter Magnusson wrote:


On Wed, 14 Nov 2007, Justin Piszcz wrote:

This is a known bug in 2.6.23 and should be fixed in 2.6.23.2 if the RAID5 
bio* patches are applied.


Ok, good to know.
Do you know when it first appeared? It existed in linux-2.6.22.3 as
well...




I am not sure; I and others mainly started noticing it in 2.6.23. Again,
not certain, I'll let others answer this one.


Justin.
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: PROBLEM: raid5 hangs

2007-11-14 Thread Bill Davidsen

Justin Piszcz wrote:
This is a known bug in 2.6.23 and should be fixed in 2.6.23.2 if the 
RAID5 bio* patches are applied.


Note below that he's running 2.6.22.3, which doesn't have the bug unless 
-stable added it, so it really shouldn't be in 2.6.22.anything. I assume 
you're talking about the endless-write or bio issue?


Justin.

On Wed, 14 Nov 2007, Peter Magnusson wrote:


Hey.

[1.] One line summary of the problem:

raid5 hangs and uses 100% CPU

[2.] Full description of the problem/report:

I had used 2.6.18 for 284 days or so until my power supply died; no 
problems whatsoever during that time. After that forced reboot I made 
these changes: put in 2 GB more memory so I have 3 GB instead of 1 GB; 
two disks in the raid5 got bad blocks so I didn't trust them anymore 
and bought new disks (I managed to save the raid5). I have 6x300 GB in 
a raid5. Two of them are now 320 GB, so I created a small raid1 as 
well. The raid5 is encrypted with aes-cbc-plain. The raid1 is encrypted 
with aes-cbc-essiv:sha256.

I compiled linux-2.6.22.3 and started to use that. I used the same 
.config as in default FC5; I think I just selected the P4 CPU and 
preemptive kernel type.

After 11 or 12 days the computer froze. I wasn't home when it happened 
and couldn't get to it for about 3 days. The only option was to reboot 
it, as it wasn't possible to log in remotely or on the console. It did 
respond to ping, however.

After reboot it rebuilt the raid5.

Then it happened again after approximately the same time, 11 or 12 
days. I noticed that the process md1_raid5 used 100% CPU all the time. 
After reboot it rebuilt the raid5.

I compiled linux-2.6.23.

And then... it happened again, after about the same time as before. 
md1_raid5 used 100% CPU. I also noticed that I wasn't able to save 
anything in my home directory; it froze during the save. I could read 
from it, however. My home directory isn't on the raid5, but it is 
encrypted; it's not on any disk that has to do with raid. This problem 
didn't happen when I used 2.6.18. Currently I use 2.6.18, as I need 
the computer stable.

After reboot it rebuilt the raid5.


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: PROBLEM: raid5 hangs

2007-11-14 Thread Justin Piszcz



On Wed, 14 Nov 2007, Bill Davidsen wrote:


Justin Piszcz wrote:
This is a known bug in 2.6.23 and should be fixed in 2.6.23.2 if the RAID5 
bio* patches are applied.


Note below that he's running 2.6.22.3, which doesn't have the bug unless 
-stable added it, so it really shouldn't be in 2.6.22.anything. I assume 
you're talking about the endless-write or bio issue?

The bio issue is the root cause of the bug, yes?
--

I am uncertain, but I remember this happening in the past and thought it 
was something I was doing (possibly < 2.6.23), so it may have been 
happening earlier than that, but I am not positive.




Justin.

On Wed, 14 Nov 2007, Peter Magnusson wrote:

[Peter's original bug report quoted in full; snipped here -- see the 
original post at the end of this thread.]


Re: PROBLEM: raid5 hangs

2007-11-14 Thread Dan Williams
On Nov 14, 2007 5:05 PM, Justin Piszcz [EMAIL PROTECTED] wrote:
> On Wed, 14 Nov 2007, Bill Davidsen wrote:
>> Justin Piszcz wrote:
>>> This is a known bug in 2.6.23 and should be fixed in 2.6.23.2 if the RAID5
>>> bio* patches are applied.
>>
>> Note below he's running 2.6.22.3 which doesn't have the bug unless -STABLE
>> added it. So should not really be in 2.6.22.anything. I assume you're
>> talking the endless write or bio issue?
> The bio issue is the root cause of the bug yes?

Not if this is a 2.6.22 issue.  Neither of the bugs fixed by "raid5:
fix clearing of biofill operations" or "raid5: fix unending write
sequence" existed prior to 2.6.23.
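Dan's observation can be scripted; the sketch below only encodes what this
thread states (2.6.23 and 2.6.23.1 lack the two raid5 fixes, 2.6.23.2 has
them), not an authoritative changelog check:

```shell
# Sketch: flag kernels that, per this thread, lack the raid5 biofill /
# unending-write fixes. The affected list is taken from the discussion
# above, not from kernel.org metadata.
needs_raid5_fixes() {
  case "$1" in
    2.6.23|2.6.23.1) return 0 ;;  # affected: needs the raid5 bio* patches
    *)               return 1 ;;  # not affected by these two bugs
  esac
}

for v in 2.6.22.3 2.6.23 2.6.23.1 2.6.23.2; do
  if needs_raid5_fixes "$v"; then
    echo "$v: needs the raid5 bio* patches"
  else
    echo "$v: not affected by these two bugs"
  fi
done
```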


PROBLEM: raid5 hangs

2007-11-13 Thread Peter Magnusson

Hey.

[1.] One line summary of the problem:

raid5 hangs and uses 100% CPU

[2.] Full description of the problem/report:

I had used 2.6.18 for 284 days or so until my power supply died; no 
problems whatsoever during that time. After that forced reboot I made 
these changes: put in 2 GB more memory so I have 3 GB instead of 1 GB; 
two disks in the raid5 got bad blocks so I didn't trust them anymore 
and bought new disks (I managed to save the raid5). I have 6x300 GB in 
a raid5. Two of them are now 320 GB, so I created a small raid1 as 
well. The raid5 is encrypted with aes-cbc-plain. The raid1 is encrypted 
with aes-cbc-essiv:sha256.

I compiled linux-2.6.22.3 and started to use that. I used the same 
.config as in default FC5; I think I just selected the P4 CPU and 
preemptive kernel type.

After 11 or 12 days the computer froze. I wasn't home when it happened 
and couldn't get to it for about 3 days. The only option was to reboot 
it, as it wasn't possible to log in remotely or on the console. It did 
respond to ping, however.

After reboot it rebuilt the raid5.

Then it happened again after approximately the same time, 11 or 12 
days. I noticed that the process md1_raid5 used 100% CPU all the time. 
After reboot it rebuilt the raid5.

I compiled linux-2.6.23.

And then... it happened again, after about the same time as before. 
md1_raid5 used 100% CPU. I also noticed that I wasn't able to save 
anything in my home directory; it froze during the save. I could read 
from it, however. My home directory isn't on the raid5, but it is 
encrypted; it's not on any disk that has to do with raid. This problem 
didn't happen when I used 2.6.18. Currently I use 2.6.18, as I need 
the computer stable.

After reboot it rebuilt the raid5.
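Each rebuild shows up as a recovery line in /proc/mdstat. As a hedged
illustration, here is one way to script-watch the progress figure; the
sample line below is invented in the usual md recovery-bar shape, not
copied from this system:

```shell
# Sketch: extract the recovery percentage from a saved /proc/mdstat line.
# In practice one would read /proc/mdstat itself; the sample is hypothetical.
mdstat_line='      [===>.................]  recovery = 17.3% (52428800/293057536) finish=123.4min speed=32768K/sec'

progress() { grep -o '[0-9.]*%' ; }

echo "$mdstat_line" | progress   # prints 17.3%
```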

top looked like this:

top - 02:37:32 up 11 days,  2:00, 29 users,  load average: 21.06, 17.45, 9.38
Tasks: 284 total,   2 running, 282 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.1%us, 51.2%sy,  0.0%ni,  0.0%id, 46.6%wa,  0.0%hi,  0.0%si,  0.0%st

Mem:   3114928k total,  2981720k used,   133208k free,     8244k buffers
Swap:  2096472k total,      252k used,  2096220k free,  1690196k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2147 root      15  -5     0    0    0 R  100  0.0  80:25.80 md1_raid5
11328 iocc      20   0  536m 374m  28m S    3 12.3 249:32.38 firefox-bin

After some time, just before I rebooted I had this load:

 02:48:36 up 11 days,  2:11, 29 users,  load average: 86.10, 70.80, 40.07
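A snapshot like the one above can also be checked from a script rather than
by eye. This is only a sketch: it parses a saved `top -b -n1` line (column
layout as in the output above) instead of the live system:

```shell
# Sketch: pull the %CPU field for a named command out of a saved
# `top -b -n1` snapshot line ($9 is %CPU, $NF is COMMAND in this layout).
cpu_of() {
  awk -v cmd="$1" '$NF == cmd { print $9 }'
}

snapshot=' 2147 root  15  -5    0    0    0 R  100  0.0  80:25.80 md1_raid5'
echo "$snapshot" | cpu_of md1_raid5   # prints 100
```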

[3.] Keywords (i.e., modules, networking, kernel):

raid5, possibly dm_mod

[4.] Kernel version (from /proc/version):

Not using 2.6.23 now but anyway...
Linux version 2.6.18 ([EMAIL PROTECTED]) (gcc version 4.1.1 
20060525 (Red Hat 4.1.1-1)) #1 SMP Sun Sep 24 12:58:16 CEST 2006


[5.] Output of Oops.. message (if applicable) with symbolic information
 resolved (see Documentation/oops-tracing.txt)

No oopses; it doesn't log anything.

[6.] A small shell script or example program which triggers the
 problem (if possible)

-

[7.] Environment

Hmm..

Filesystem        Size  Used Avail Use% Mounted on
/dev/sda1         7.8G  7.0G  761M  91% /        - unencrypted fs
tmpfs             1.5G     0  1.5G   0% /dev/shm
/dev/mapper/home   24G   23G  1.6G  94% /home    - encrypted fs
/dev/mapper/temp  1.4T  822G  555G  60% /temp    - encrypted fs, raid5
/dev/mapper/jb     18G   17G  1.2G  94% /mnt/jb  - encrypted fs, raid1

[EMAIL PROTECTED] linux-2.6.23]# cryptsetup status home
/dev/mapper/home is active:
  cipher:  aes-cbc-plain
  keysize: 256 bits
  device:  /dev/sda3
  offset:  0 sectors
  size:50861790 sectors
  mode:read/write
[EMAIL PROTECTED] linux-2.6.23]# cryptsetup status temp
/dev/mapper/temp is active:
  cipher:  aes-cbc-plain
  keysize: 256 bits
  device:  /dev/md1
  offset:  0 sectors
  size:2930496000 sectors
  mode:read/write
[EMAIL PROTECTED] linux-2.6.23]# cryptsetup status jb
/dev/mapper/jb is active:
  cipher:  aes-cbc-essiv:sha256
  keysize: 256 bits
  device:  /dev/md0
  offset:  0 sectors
  size:37238528 sectors
  mode:read/write

[7.1.] Software (add the output of the ver_linux script here)

If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.

Linux flashdance.cx 2.6.18 #1 SMP Sun Sep 24 12:58:16 CEST 2006 i686 i686 
i386 GNU/Linux


Gnu C  4.1.1
Gnu make   3.80
binutils   2.16.91.0.6
util-linux 2.13-pre7
mount  2.13-pre7
module-init-tools  3.2.2
e2fsprogs  1.38
reiserfsprogs  3.6.19
quota-tools        3.13
PPP2.4.3
Linux C Library2.4
Dynamic linker (ldd)   2.4
Procps 3.2.7
Net-tools  1.60
Kbd1.12
oprofile   0.9.1
Sh-utils   5.97
udev   084
wireless-tools 28
Modules Loaded vfat fat usb_storage cdc_ether usbnet cdc_acm nfs 
sha256 aes