beginner error detection

2007-02-23 Thread Tomka Gergely
Hi! I have a simple software RAID 1 over two SATA disks. One of the disks has started to complain (S.M.A.R.T. errors), so I think I will witness a disk failure in the near future. But I don't know how this plays out with RAID 1, so I have some questions. If these questions are answered somewhere (FAQ,
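
For anyone in the same spot: the usual drill for swapping out a failing RAID 1 member is a few mdadm commands. A sketch only; the array and device names below (/dev/md0, /dev/sdb1) are examples:

    cat /proc/mdstat                      # confirm current array state
    mdadm /dev/md0 --fail /dev/sdb1       # mark the complaining disk as failed
    mdadm /dev/md0 --remove /dev/sdb1     # remove it from the array
    # ...swap the hardware, partition the new disk to match...
    mdadm /dev/md0 --add /dev/sdb1        # add it back; the resync starts automatically
    watch cat /proc/mdstat                # follow the rebuild progress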

Re: Reshaping raid0/10

2007-02-23 Thread Jan Engelhardt
On Feb 22 2007 06:59, Neil Brown wrote: On Wednesday February 21, [EMAIL PROTECTED] wrote: are there any plans to support reshaping on raid0 and raid10? No concrete plans. It largely depends on time and motivation. I expect that the various flavours of raid5/raid6 reshape will come first.
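
For contrast, the raid5 reshape Neil mentions amounts to growing the device count. A sketch with example device names; nothing comparable exists for raid0/raid10 at this point:

    mdadm /dev/md0 --add /dev/sde1            # new disk joins as a spare
    mdadm --grow /dev/md0 --raid-devices=5    # reshape the raid5 onto five devices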

2.6.20: stripe_cache_size goes boom with 32mb

2007-02-23 Thread Justin Piszcz
Each of these is averaged over three runs with 6 SATA disks in a SW RAID 5 configuration: (dd if=/dev/zero of=file_1 bs=1M count=2000) 128k_stripe: 69.2MB/s 256k_stripe: 105.3MB/s 512k_stripe: 142.0MB/s 1024k_stripe: 144.6MB/s 2048k_stripe: 208.3MB/s 4096k_stripe: 223.6MB/s 8192k_stripe:
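
The knob being tuned here is per-array sysfs; a minimal reproduction of one run, assuming the raid5 array is md0 and file_1 lives on it:

    echo 8192 > /sys/block/md0/md/stripe_cache_size   # stripe cache entries, in pages
    dd if=/dev/zero of=file_1 bs=1M count=2000        # the write test from the post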

Re: [PATCH 006 of 6] md: Add support for reshape of a raid6

2007-02-23 Thread Helge Hafting
Andrew Morton wrote: On Thu, 22 Feb 2007 13:39:56 +1100 Neil Brown [EMAIL PROTECTED] wrote: I must right code that Andrew can read. That's write. But more importantly, things that people can immediately see and understand help reduce the possibility of mistakes. Now and in the

Re: 2.6.20: stripe_cache_size goes boom with 32mb

2007-02-23 Thread Justin Piszcz
I have 2GB of RAM on this machine. For me, 8192 seems to be the sweet spot, so I will probably keep it at 8MB. On Fri, 23 Feb 2007, Jason Rainforest wrote: Hi Justin, I'm not a RAID or kernel developer, but... do you have enough RAM to support a 32MB stripe_cache_size?! Here on my 7*250GB SW RAID5

Re: 2.6.20: stripe_cache_size goes boom with 32mb

2007-02-23 Thread Jason Rainforest
Hi Justin, I'm not a RAID or kernel developer, but... do you have enough RAM to support a 32MB stripe_cache_size?! Here on my 7*250GB SW RAID5 array, decreasing the stripe_cache_size from 8192 to 4096 frees up no less than 120MB of RAM. Using that as a calculation tool, a 32MB stripe_cache_size would

Re: 2.6.20: stripe_cache_size goes boom with 32mb

2007-02-23 Thread Jan Engelhardt
On Feb 23 2007 06:41, Justin Piszcz wrote: I was able to Alt-SysRQ+b but I could not access the console/X/etc, it appeared to be frozen. No sysrq+t? (Ah, unblanking might hang.) Well, netconsole/serial to the rescue, then ;-) Jan
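
For anyone wanting to catch such a hang over the network: a netconsole sketch with made-up addresses (see Documentation/networking/netconsole.txt for the full syntax):

    # Boot parameter: src-port@src-ip/dev,dst-port@dst-ip/dst-mac (all hypothetical)
    # netconsole=6665@192.168.0.10/eth0,6666@192.168.0.2/00:11:22:33:44:55
    # On the receiving machine, just listen on UDP:
    netcat -u -l -p 6666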

Re: PATA/SATA Disk Reliability paper

2007-02-23 Thread Al Boldi
Stephen C Woods wrote: So drives do need to be ventilated, not so much worry about exploding, but rather subtle distortion of the case as the atmospheric pressure changes. I have a '94 Caviar without any apparent holes; and as a bonus, the drive still works. In contrast, ever since these

Linux Software RAID a bit of a weakness?

2007-02-23 Thread Colin Simpson
Hi, We had a small server here that was configured with a RAID 1 mirror, using two IDE disks. Last week one of the drives failed in this. So we replaced the drive and set the array to rebuild. The good disk then found a bad block and the mirror failed. Now I presume that the good disk must

Re: Linux Software RAID a bit of a weakness?

2007-02-23 Thread Steve Cousins
Colin Simpson wrote: Hi, We had a small server here that was configured with a RAID 1 mirror, using two IDE disks. Last week one of the drives failed in this. So we replaced the drive and set the array to rebuild. The good disk then found a bad block and the mirror failed. Now I presume

Re: Reshaping raid0/10

2007-02-23 Thread Neil Brown
On Friday February 23, [EMAIL PROTECTED] wrote: On Feb 22 2007 06:59, Neil Brown wrote: On Wednesday February 21, [EMAIL PROTECTED] wrote: are there any plans to support reshaping on raid0 and raid10? No concrete plans. It largely depends on time and motivation. I expect that

Re: Linux Software RAID a bit of a weakness?

2007-02-23 Thread Neil Brown
On Friday February 23, [EMAIL PROTECTED] wrote: Hi, We had a small server here that was configured with a RAID 1 mirror, using two IDE disks. Last week one of the drives failed in this. So we replaced the drive and set the array to rebuild. The good disk then found a bad block and the

Re: 2.6.20: stripe_cache_size goes boom with 32mb

2007-02-23 Thread Dan Williams
On 2/23/07, Justin Piszcz [EMAIL PROTECTED] wrote: I have 2GB of RAM on this machine. For me, 8192 seems to be the sweet spot, so I will probably keep it at 8MB. Just a note: stripe_cache_size = 8192 = 192MB with six disks. The calculation is: stripe_cache_size * num_disks * PAGE_SIZE =
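
Spelled out in shell, assuming 4 KB pages (and guessing that the "32mb" in the subject means stripe_cache_size=32768):

    echo $(( 8192 * 6 * 4096 / 1024 / 1024 ))    # = 192 (MB) for the 8192 setting
    echo $(( 32768 * 6 * 4096 / 1024 / 1024 ))   # = 768 (MB), painful on a 2GB box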

nonzero mismatch_cnt with no earlier error

2007-02-23 Thread Eyal Lebedinsky
I run a 'check' weekly, and yesterday it came up with a non-zero mismatch count (184). There were no earlier RAID errors logged, and the count was zero after the run a week ago. Now, the interesting part is that there was one I/O error logged during the check *last week*; however, the RAID did not
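
A weekly check of this kind is typically just a cron job poking sysfs. A sketch; the array name and schedule are examples:

    # /etc/cron.d/raid-check (hypothetical): every Sunday at 04:00
    0 4 * * 0  root  echo check > /sys/block/md0/md/sync_action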

Re: Linux Software RAID a bit of a weakness?

2007-02-23 Thread Richard Scobie
Neil Brown wrote: The 'check' process reads all copies and compares them with one another. If there is a difference, it is reported. If you use 'repair' instead of 'check', the difference is arbitrarily corrected. If a read error is detected during the 'check', md/raid1 will attempt to write
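
The interface Neil is describing, for reference (assuming the array is md0):

    echo check  > /sys/block/md0/md/sync_action   # read all copies, count differences
    cat /sys/block/md0/md/mismatch_cnt            # nonzero if any were found
    echo repair > /sys/block/md0/md/sync_action   # same pass, but fix what it finds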

end to end error recovery musings

2007-02-23 Thread Ric Wheeler
In the IO/FS workshop, one idea we kicked around is the need to provide better and more specific error messages between the IO stack and the file system layer. My group has been working to stabilize a relatively up-to-date libata + MD based box, so I can try to lay out at least one appliance

Re: end to end error recovery musings

2007-02-23 Thread H. Peter Anvin
Ric Wheeler wrote: We still have the following challenges: (1) read-ahead often means that we will retry every bad sector at least twice from the file system level. The first time, the fs read ahead request triggers a speculative read that includes the bad sector (triggering the error

Re: end to end error recovery musings

2007-02-23 Thread Andreas Dilger
On Feb 23, 2007 16:03 -0800, H. Peter Anvin wrote: Ric Wheeler wrote: (1) read-ahead often means that we will retry every bad sector at least twice from the file system level. The first time, the fs read ahead request triggers a speculative read that includes the bad sector
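
One way to probe a suspect sector exactly once, without readahead dragging it in again: an O_DIRECT read. A sketch; the device and sector number are placeholders:

    dd if=/dev/sdb of=/dev/null bs=512 skip=123456 count=1 iflag=direct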

Re: end to end error recovery musings

2007-02-23 Thread H. Peter Anvin
Andreas Dilger wrote: And clearing this list when the sector is overwritten, as it will almost certainly be relocated at the disk level. Certainly if the overwrite is successful. -hpa
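
Forcing that overwrite/relocation on a known-bad sector can be done with a recent hdparm. A sketch; destructive, and the sector number is a placeholder:

    # DESTROYS the sector's contents; the drive should remap it if it was pending
    hdparm --write-sector 123456 --yes-i-know-what-i-am-doing /dev/sdb
    smartctl -A /dev/sdb | grep -i reallocated    # check Reallocated_Sector_Ct afterwards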

Re: nonzero mismatch_cnt with no earlier error

2007-02-23 Thread Eyal Lebedinsky
I have since done a resync, which ended up with the same mismatch_cnt of 184. I noticed that the count *was* reset to zero when the resync started, but it ended up at 184 (the same as after the check). I thought that a resync just calculates fresh parity and does not bother checking whether it is different. So