[PATCH md 001 of 5] drivers/md/raid1.c: make a function static

2005-09-02 Thread NeilBrown

This patch makes a needlessly global function static.

Signed-off-by: Adrian Bunk [EMAIL PROTECTED]
Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/raid1.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff ./drivers/md/raid1.c~current~ ./drivers/md/raid1.c
--- ./drivers/md/raid1.c~current~   2005-09-02 15:59:28.0 +1000
+++ ./drivers/md/raid1.c2005-09-02 15:59:34.0 +1000
@@ -1703,7 +1703,7 @@ static int raid1_reshape(mddev_t *mddev,
return 0;
 }
 
-void raid1_quiesce(mddev_t *mddev, int state)
+static void raid1_quiesce(mddev_t *mddev, int state)
 {
conf_t *conf = mddev_to_conf(mddev);
 


[PATCH md 005 of 5] Report spare drives in /proc/mdstat

2005-09-02 Thread NeilBrown

Just like failed drives have (F), so spare drives now have (S).


Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/md.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~  2005-09-02 15:59:50.0 +1000
+++ ./drivers/md/md.c   2005-09-02 16:00:04.0 +1000
@@ -3334,7 +3334,8 @@ static int md_seq_show(struct seq_file *
 	if (rdev->faulty) {
 		seq_printf(seq, "(F)");
 		continue;
-	}
+	} else if (rdev->raid_disk < 0)
+		seq_printf(seq, "(S)"); /* spare */
 	size += rdev->size;
 }
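
For illustration only (device names and sizes here are invented, not taken from the patch), a /proc/mdstat entry with one failed and one spare member would then look something like:

    md0 : active raid1 sdc1[2](S) sdb1[1](F) sda1[0]
          10485696 blocks [2/1] [U_]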
 


[PATCH md 004 of 5] Add information about superblock version to /proc/mdstat

2005-09-02 Thread NeilBrown

Leave it unchanged if the original (0.90) format is used, in case
changing it might cause a compatibility problem.

Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/md.c |9 +
 1 file changed, 9 insertions(+)

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~  2005-09-02 15:59:46.0 +1000
+++ ./drivers/md/md.c   2005-09-02 15:59:50.0 +1000
@@ -3346,6 +3346,15 @@ static int md_seq_show(struct seq_file *
 		seq_printf(seq, "\n      %llu blocks",
 			(unsigned long long)size);
 	}
+	if (mddev->persistent) {
+		if (mddev->major_version != 0 ||
+		    mddev->minor_version != 90) {
+			seq_printf(seq, " super %d.%d",
+				   mddev->major_version,
+				   mddev->minor_version);
+		}
+	} else
+		seq_printf(seq, " super non-persistent");
 
 	if (mddev->pers) {
 		mddev->pers->status (seq, mddev);
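
For illustration (hypothetical array, invented numbers), the status line for a version-1 superblock array would then read something like:

    md0 : active raid5 sdc1[2] sdb1[1] sda1[0]
          390716672 blocks super 1.2 level 5, 128k chunk, algorithm 2 [3/3] [UUU]

while an array using the original 0.90 superblock keeps the old, unchanged output.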


[PATCH md 002 of 5] Choose better default offset for bitmap.

2005-09-02 Thread NeilBrown

On reflection, a better default location for hot-adding
bitmaps with version-1 superblocks is immediately after
the superblock.  There might not be much room there, but
there is usually at least 3k, and that is a good start.

Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/md.c |3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~  2005-09-02 15:59:28.0 +1000
+++ ./drivers/md/md.c   2005-09-02 15:59:39.0 +1000
@@ -957,8 +957,7 @@ static int super_1_validate(mddev_t *mdd
 	mddev->events = le64_to_cpu(sb->events);
 	mddev->bitmap_offset = 0;
 	mddev->default_bitmap_offset = 0;
-	if (mddev->minor_version == 0)
-		mddev->default_bitmap_offset = -(64*1024)/512;
+	mddev->default_bitmap_offset = 1024;
 
 	mddev->recovery_cp = le64_to_cpu(sb->resync_offset);
 	memcpy(mddev->uuid, sb->set_uuid, 16);
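
For reference, the removed default placed the bitmap 64 KiB *before* the superblock; default_bitmap_offset is counted in 512-byte sectors, which is where the removed expression comes from:

    -(64*1024)/512 = -128 sectors   (i.e. 64 KiB below the superblock)

The new default instead points just past the superblock, where (as noted above) there is usually at least 3k of free space.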


[PATCH md 003 of 5] Use queue_hardsect_size instead of block_size for md superblock size calc.

2005-09-02 Thread NeilBrown

Doh.  I want the physical hard-sector-size, not the current block size...

Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/md.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~  2005-09-02 15:59:39.0 +1000
+++ ./drivers/md/md.c   2005-09-02 15:59:46.0 +1000
@@ -898,7 +898,7 @@ static int super_1_load(mdk_rdev_t *rdev
 	rdev->data_offset = le64_to_cpu(sb->data_offset);
 
 	rdev->sb_size = le32_to_cpu(sb->max_dev) * 2 + 256;
-	bmask = block_size(rdev->bdev)-1;
+	bmask = queue_hardsect_size(rdev->bdev->bd_disk->queue)-1;
 	if (rdev->sb_size & bmask)
 		rdev->sb_size = (rdev->sb_size | bmask)+1;
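
A worked example of the rounding (the numbers are illustrative, not from the patch): with max_dev = 384 the superblock needs 384*2 + 256 = 1024 bytes; on a device with a 4096-byte hard sector size, bmask = 4095, 1024 & 4095 is non-zero, and

    (1024 | 4095) + 1 = 4096

so sb_size is rounded up to a whole hardware sector -- which is what the switch from block_size() to queue_hardsect_size() is about.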
 


Re: question regarding multipath Linux 2.6

2005-09-02 Thread Luca Berra

On Thu, Sep 01, 2005 at 02:51:44PM -0400, Jim Faulkner wrote:


> Hello,
> 
> Recently my department had a SAN installed, and I am in the process of 
> setting up one of the first Linux machines connected to it.  The machine 
> is running Red Hat Enterprise AS4 (x86_64), which uses Linux kernel 
> version 2.6.9-11.ELsmp.

giving more info about the infamous SAN would help :)


> The SAN shows up twice in the kernel, as /dev/sdb and /dev/sdc.  /dev/sdb 
> is inaccessible (I get a bunch of "Buffer I/O error on device sdb" kernel 
> errors), but /dev/sdc works fine.  According to the administrator of the 

it probably is a cheapo storage box with an Active/Passive storage
controller; you cannot use md to handle those.

> He told me to use PowerPath, but I'd rather not have to reinstall or 

it has been a long time since i last saw powerpath on linux, but i am in
favour of ditching proprietary multipath solutions in favour of free ones.



what you want is multipath-tools http://christophe.varoqui.free.fr/
RH4 should already include a multipath-tools rpm.
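
A minimal sketch of the setup steps (module, command and package names here are from memory and may differ on RHEL4; treat them as assumptions to verify against the multipath-tools docs):

    modprobe dm-multipath      # device-mapper multipath target
    multipath -v2              # scan the paths and create the multipathed maps
    multipath -l               # list the maps and which sdb/sdc paths they group

The multipathed device then appears under /dev/mapper/ and is used in place of the raw /dev/sdb or /dev/sdc.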

Regards,
Luca

--
Luca Berra -- [EMAIL PROTECTED]
   Communication Media & Services S.r.l.
 /"\
 \ /    ASCII RIBBON CAMPAIGN
  X     AGAINST HTML MAIL
 / \


Re: MD or MDADM bug?

2005-09-02 Thread Claas Hilbrecht
--On Thursday, 1 September 2005 17:26 -0400, David M. Strang 
[EMAIL PROTECTED] wrote:



> The problem is; my array is now 26 of 28 disks -- /dev/sdm *IS* bad; it

[...]

> What can I do? I don't believe this is working as intended.


I think the posts:

08.08.2005: How to recover a multiple raid5 disk failure with mdadm?
30.08.2005: 2 partition kicked from 6 raid5

describe the same problem, and it seems that no one was able to help. I 
hope you can rebuild your drive, but I think you should use the backup if 
you need a quick solution. And indeed I think a howto on resolving a multiple 
raid5 disk failure would be a good thing. Sometimes the problem is a 
faulty bus/cable and one knows that most (or even all) of the data is good.


--
Claas Hilbrecht
http://www.jucs-kramkiste.de




Re: MD or MDADM bug?

2005-09-02 Thread Neil Brown
On Thursday September 1, [EMAIL PROTECTED] wrote:
 This is somewhat of a crosspost from my thread yesterday; but I think it 
 deserves its own thread atm. Some time ago, I had a device fail -- with the 
 help of Neil, Tyler & others on the mailing list; a few patches to mdadm --  
 I was able to recover. Using mdadm --remove  mdadm --add, I was able to 
 rebuild the bad disc in my array. Everything seemed fine; however -- when I 
 rebooted and re-assembled the raid; it wouldn't take the disk that was 
 re-added. I had to add it again; and let it rebuild. About 3 weeks ago, I 
 lost power -- the outage lasted longer than the UPS, and my system shutdown. 
 Upon startup, once again -- I had to re-add 'the disk' back to the array. 
 For some reason, if I remove a device and add it back -- when I stop and 
 re-assemble the array - it won't 'start' that disk.

 
 I'm using mdadm 2.0-devel-3 on a Linux 2.6.11.12 kernel, with version-1 
 superblocks.

mdadm 2.0 had a fix for assembling version-1 arrays that would
particularly affect raid5.  Try using that instead of -devel-3.

NeilBrown


Re: MD or MDADM bug?

2005-09-02 Thread Neil Brown
On Friday September 2, [EMAIL PROTECTED] wrote:
 
  mdadm 2.0 had a fix for assembling version-1 arrays that would
  particularly affect raid5.  Try using that instead of -devel-3.
 
 No luck -- 
 
 -([EMAIL PROTECTED])-(~)- # mdadm -A /dev/md0 /dev/sda /dev/sdb /dev/sdc 
 /dev/sdd 
 /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl 
 /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt 
 /dev/sdu /dev/sdv /dev/sdw /dev/sdx /dev/sdy /dev/sdz /dev/sdaa /dev/sdab -f
 mdadm: /dev/md0 assembled from 26 drives - not enough to start the array.
 
 Any other suggestions?
 

Can you run that with '-v' for me?

NeilBrown


Re: MD or MDADM bug?

2005-09-02 Thread Neil Brown
On Friday September 2, [EMAIL PROTECTED] wrote:
 
  Can you run that with '-v' for me?
 
 mdadm: looking for devices for /dev/md0
 mdadm: /dev/sda is identified as a member of /dev/md0, slot 0.
 mdadm: /dev/sdb is identified as a member of /dev/md0, slot 1.
 mdadm: /dev/sdc is identified as a member of /dev/md0, slot 2.
...

and just for completeness:

 mdadm -E /dev/sdm /dev/sdaa 

Thanks.

NeilBrown


Re: MD or MDADM bug?

2005-09-02 Thread Neil Brown
On Friday September 2, [EMAIL PROTECTED] wrote:
 Neil Brown wrote:
  On Friday September 2, [EMAIL PROTECTED] wrote:
   
Can you run that with '-v' for me?
   
   mdadm: looking for devices for /dev/md0
   mdadm: /dev/sda is identified as a member of /dev/md0, slot 0.
   mdadm: /dev/sdb is identified as a member of /dev/md0, slot 1.
   mdadm: /dev/sdc is identified as a member of /dev/md0, slot 2.
  ...
  
  and just for completeness:
  
   mdadm -E /dev/sdm /dev/sdaa 
  
 
 -([EMAIL PROTECTED])-(~)- # mdadm -E /dev/sdm /dev/sdaa

Thanks.

Looks like --assemble is not going to work any more for you.
However you should be able to recreate the array:

 mdadm -C /dev/md0 -l5 -n28 -c 128 --name=md/md0 -p la  /dev/sd[a-l] missing 
/dev/sd[n-z] /dev/sda[ab]

should get it right. (You did deliberately choose left-asymmetric I assume).
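
As a sanity check before recreating, the layout and chunk size can be read back from a surviving member, e.g. with something like (device name illustrative only):

    mdadm -E /dev/sda | grep -iE 'layout|chunk'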

I've put a note on my todo list to test out these failure modes
and make sure it does the right thing next time.

NeilBrown


Re: MD or MDADM bug?

2005-09-02 Thread David M. Strang

Neil Brown wrote:

On Friday September 2, [EMAIL PROTECTED] wrote:
 Neil Brown wrote:
  On Friday September 2, [EMAIL PROTECTED] wrote:
   
Can you run that with '-v' for me?
   
mdadm: looking for devices for /dev/md0
   mdadm: /dev/sda is identified as a member of /dev/md0, slot 0.
   mdadm: /dev/sdb is identified as a member of /dev/md0, slot 1.
   mdadm: /dev/sdc is identified as a member of /dev/md0, slot 2.
  ...
 
  and just for completeness:
 
   mdadm -E /dev/sdm /dev/sdaa
 

 -([EMAIL PROTECTED])-(~)- # mdadm -E /dev/sdm /dev/sdaa

Thanks.

Looks like --assemble is not going to work any more for you.
However you should be able to recreate the array:

 mdadm -C /dev/md0 -l5 -n28 -c 128 --name=md/md0 -p la  /dev/sd[a-l] 
missing /dev/sd[n-z] /dev/sda[ab]


should get it right. (You did deliberately choose left-asymmetric I 
assume).


I've put a note on my todo list to test out these failure modes
and make sure it does the right thing next time.


Does this mean I'm going to lose all my data?

-- David 




Re: MD or MDADM bug?

2005-09-02 Thread Neil Brown
On Friday September 2, [EMAIL PROTECTED] wrote:
 
 Does this mean I'm going to lose all my data?
 

No.
At least, you shouldn't, and doing the --create won't make anything
worse.

So do the --create with the 'missing', and don't add any spares.
Do a 'fsck' or whatever to check that everything is OK.

If it isn't, we'll have to look again at exactly what happened and
figure out which disks we should have created into the array.

NeilBrown


[EMAIL PROTECTED]: [suse-beta-e] RFC: time_adj for HZ==250 in kernel/timer.c ?]

2005-09-02 Thread Harald Koenig
Hi Ingo & linux-kernel,

as I'm not subscribed to the linux-kernel list, please send answers with a CC
to me personally.  Thanks!


playing with the openSUSE 10.0 beta-test I stumbled over problems with procinfo
etc., and noticed that at least suse/novell has again changed the scheduler
frequency HZ from 1000 to 250.  This led me to the following question, for which
I got no comments from SUSE so far -- and it's not a suse-only issue anyway...
 

the linux kernel for suse 10.0 will run with HZ==250 by default (9.x used 
HZ==1000).

for HZ==100 and HZ==1000 there is special code in kernel/timer.c to
adjust the value of time_adj, because HZ isn't a power of two and the
clock code uses binary shift operations instead of divides
(and 100 != 128, 1000 != 1024).

what about having a similar special case for the new HZ==250 too?
using 250 instead of 256 (SHIFT_HZ==8) is an error of 2.4%,
identical to 1000 vs. 1024.

any feelings or comments about the following patch ?

---
--- kernel/timer.c~ 2005-08-22 22:50:27.0 +0200
+++ kernel/timer.c  2005-08-31 15:12:57.801644750 +0200
@@ -752,7 +752,7 @@
 else
 		time_adj += (time_adj >> 2) + (time_adj >> 5);
 #endif
-#if HZ == 1000
+#if HZ == 1000 || HZ == 250
     /* Compensate for (HZ==1000) != (1 << SHIFT_HZ).
      * Add 1.5625% and 0.78125% to get 1023.4375; => only 0.05% error (p. 14)
  */
---
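
A quick check of the numbers, using the two correction terms from the HZ==1000 branch (1/64 = 1.5625% and 1/128 = 0.78125%):

    1000 * (1 + 1/64 + 1/128) = 1023.4375    (0.055% short of 1024)
     250 * (1 + 1/64 + 1/128) =  255.859375  (0.055% short of  256)

so reusing the HZ==1000 compensation for HZ==250 gives the same relative error.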


or maybe (just in case...:) even

#if HZ == 1000 || HZ == 500 || HZ == 250



comments ?

Harald
-- 
I hope to die  ___   _
before I *have* to use Microsoft Word.,   0--,|/OOO\
Donald E. Knuth, 02-Oct-2001 in Tuebingen._/  /  /OOO\
\  \/OOO\
  \ O|//
Harald Koenig, \/\/\/\/\/\/\/\/\/
Inst.f.Theoret.Astrophysik  //  / \\  \
[EMAIL PROTECTED] ^   ^


- End forwarded message -




[PATCH md ] Make sure the new 'sb_size' is set properly for a device added without a pre-existing superblock.

2005-09-02 Thread NeilBrown
Looks like I should run my test suite with both mdadm-1.12 and mdadm-2.0,
as this slipped through my testing.  (The bug is in code that
didn't reach 2.6.13.  Only -mm is affected).

Thanks,
NeilBrown


### Comments for Changeset

There are two ways to add devices to an md/raid array.

  It can have a superblock written to it, and then be given to the md
  driver, which will read the superblock (the new way)

or

  md can be told (through SET_ARRAY_INFO) the shape of the array, and
  then be told about the individual drives, and md will create the
  required superblock (the old way).

The newly introduced sb_size was only set for drives being added the
new way, not the old way.  Oops :-(

Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/md.c |2 ++
 1 file changed, 2 insertions(+)

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~  2005-09-02 16:00:04.0 +1000
+++ ./drivers/md/md.c   2005-09-02 19:29:31.0 +1000
@@ -2303,6 +2303,8 @@ static int add_new_disk(mddev_t * mddev,
 	else
 		rdev->in_sync = 0;
 
+	rdev->sb_size = MD_SB_BYTES;
+
 	if (info->state & (1<<MD_DISK_WRITEMOSTLY))
 		set_bit(WriteMostly, &rdev->flags);
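
For context (the define below is quoted from include/linux/raid/md_p.h of that era; the surrounding explanation is an editorial sketch, not part of the patch): the old-style 0.90 superblock has a fixed on-disk size, so the ioctl path can simply use the constant

	#define MD_SB_BYTES 4096

whereas devices added the new way get rdev->sb_size from the version-specific load routine (super_90_load()/super_1_load()) instead.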
 


RE: question regarding multipath Linux 2.6

2005-09-02 Thread Jim Faulkner

Yes, a copy of the whitepaper would be most useful.  If you could e-mail 
it to me or make it available on a website for download, that would be 
great.

thanks,
Jim Faulkner

On Fri, 2 Sep 2005, Callahan, Tom wrote:

 You're running into the problem of the active/passive link as you stated. If
 you are using Qlogic FibreChannel cards, qlogic offers drivers on their
 website for this.
 
 I have a whitepaper I've written on this topic, please contact me if you
 would like a copy. We currently have many 2.6 Kernel Linux machines running
 exclusively off the SAN, with true failover support.
 
 Thanks,
 Tom Callahan 
 
 -Original Message-
 From: [EMAIL PROTECTED]
 To: linux-raid@vger.kernel.org
 Sent: 9/2/2005 2:38 AM
 Subject: Re: question regarding multipath  Linux 2.6
 
 On Thu, Sep 01, 2005 at 02:51:44PM -0400, Jim Faulkner wrote:
 
 Hello,
 
 Recently my department had a SAN installed, and I am in the process of 
 setting up one of the first Linux machines connected to it.  The
 machine 
 is running Red Hat Enterprise AS4 (x86_64), which uses Linux kernel 
 version 2.6.9-11.ELsmp.
 giving more info about the infamous SAN would help :)
 
 The SAN shows up twice in the kernel, as /dev/sdb and /dev/sdc.
 /dev/sdb 
 is inaccessible (I get a bunch of Buffer I/O error on device sdb
 kernel 
 errors), but /dev/sdc works fine.  According to the administrator of
 the 
 it probably is a cheapo storage with an Active/Passive storage
 controller, you cannot use md to handle those.
 
 He told me to use PowerPath, but I'd rather not have to reinstall or 
 it is a long time i don't see powerpath on linux, but i am in favour of
 ditching proprietary multipath solutions in favour of free ones.
 
 what you want is multipath-tools http://christophe.varoqui.free.fr/
 RH4 should already include a multipath-tools rpm.
 
 Regards,
 Luca
 
 



Re: Where is the performance bottleneck?

2005-09-02 Thread Al Boldi
Holger Kiehl wrote:
 top - 08:39:11 up  2:03,  2 users,  load average: 23.01, 21.48, 15.64
 Tasks: 102 total,   2 running, 100 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0% us, 17.7% sy,  0.0% ni,  0.0% id, 78.9% wa,  0.2% hi,  3.1% si
Mem:   8124184k total,  8093068k used,    31116k free,  7831348k buffers
Swap: 15631160k total,    13352k used, 15617808k free,     5524k cached

PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM  TIME+  COMMAND
   3423 root  18   0 55204  460  392 R 12.0  0.0   1:15.55 dd
   3421 root  18   0 55204  464  392 D 11.3  0.0   1:17.36 dd
   3418 root  18   0 55204  464  392 D 10.3  0.0   1:10.92 dd
   3416 root  18   0 55200  464  392 D 10.0  0.0   1:09.20 dd
   3420 root  18   0 55204  464  392 D 10.0  0.0   1:10.49 dd
   3422 root  18   0 55200  460  392 D  9.3  0.0   1:13.58 dd
   3417 root  18   0 55204  460  392 D  7.6  0.0   1:13.11 dd
158 root  15   0 0     0    0   D  1.3  0.0   1:12.61 kswapd3
159 root  15   0 0     0    0   D  1.3  0.0   1:08.75 kswapd2
160 root  15   0 0     0    0   D  1.0  0.0   1:07.11 kswapd1
   3419 root  18   0 51096  552  476 D  1.0  0.0   1:17.15 dd
161 root  15   0 0     0    0   D  0.7  0.0   0:54.46 kswapd0

 A load average of 23 for 8 dd's seems a bit high. Also, why is kswapd
 working so hard? Is that correct?

Actually, kswapd is another problem (see the "Kswapd Flaw" thread),
which has little impact on your problem.  Basically kswapd tries very hard, 
maybe even too hard, to fulfil a request for memory, so when the buffer/cache 
pages are full kswapd tries to find some more unused memory. When it finds 
none it starts recycling the buffer/cache pages.  Which is OK, but it only 
does this after searching for swappable memory, which wastes CPU cycles.

This can be tuned a little but not much by adjusting /sys(proc)/.../vm/...
Or renicing kswapd to the lowest priority, which may cause other problems.
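
For what it's worth, the usual knobs on a 2.6 kernel live under /proc/sys/vm; a sketch with purely illustrative values (examples only, not recommendations from this thread):

    # /etc/sysctl.conf fragment -- example values only
    vm.swappiness = 10           # prefer dropping cache over swapping out processes
    vm.overcommit_memory = 2     # strict commit accounting instead of the default heuristic (0)
    vm.overcommit_ratio = 80     # with mode 2: commit limit = swap + 80% of RAM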

Things get really bad when procs start asking for more memory than is 
available, causing kswapd to take the liberty of paging out running procs in 
the hope that these procs won't come back later.  So when they do come back 
something like a wild goose chase begins.  This is also known as OverCommit. 

This is closely related to the dreaded OOM-killer, which occurs when the 
system cannot satisfy a memory request for a returning proc, causing the VM 
to start killing in an unpredictable manner.

Turning OverCommit off should solve this problem but it doesn't.

This is why it is recommended to run the system always with swap enabled even 
if you have tons of memory, which really only pushes the problem out of the 
way until you hit the dead end and the wild goose chase begins again.

Sadly 2.6.13 did not fix this either.

Although this description only vaguely defines the problem from an end-user 
pov, the actual semantics may be quite different.

--
Al



Re: 3ware RAID (was Re: RAID resync stalled at 99.7% ?)

2005-09-02 Thread Christopher Smith

Daniel Pittman wrote:

Christopher Smith [EMAIL PROTECTED] writes:

[...]



The components are 12x400GB drives attached to a 3ware 9500s-12
controller.  They are configured as single disks on the controller,
ie: no hardware RAID is involved.



A quick question for you, because I have a client looking at 3ware RAID
hardware at the moment:

Why are you running this as software RAID, rather than using the
hardware on the 3ware card?


Because after doing some preliminary benchmarks, I've found Linux's 
software RAID to be significantly faster than 3ware's hardware RAID (at 
the sacrifice of higher CPU usage, but since the machine has a fairly 
fast CPU and doesn't do anything else, that's a sacrifice I'm happy to 
make).


I have some iozone and bonnie++ results, but they're at work and I'm at 
home - I'll post them tomorrow.


CS


Re: 3ware RAID (was Re: RAID resync stalled at 99.7% ?)

2005-09-02 Thread Brad Dameron
On Fri, 2005-09-02 at 20:38 +1000, Daniel Pittman wrote:
 Christopher Smith [EMAIL PROTECTED] writes:
 
 [...]
 
  The components are 12x400GB drives attached to a 3ware 9500s-12
  controller.  They are configured as single disks on the controller,
  ie: no hardware RAID is involved.
 
 A quick question for you, because I have a client looking at 3ware RAID
 hardware at the moment:

I would also look at the Areca line of cards. They are much faster than
the 3ware cards and support up to 24 ports if needed. They also do RAID6 now.

http://www.areca.com.tw/index/html/index.htm


Brad Dameron
SeaTab Software
www.seatab.com



Re: 3ware RAID (was Re: RAID resync stalled at 99.7% ?)

2005-09-02 Thread berk walker
Brad Dameron wrote:
On Fri, 2005-09-02 at 20:38 +1000, Daniel Pittman wrote:
  
Christopher Smith [EMAIL PROTECTED] writes:

[...]


The components are 12x400GB drives attached to a 3ware 9500s-12
controller.  They are configured as single disks on the controller,
ie: no hardware RAID is involved.
  
A quick question for you, because I have a client looking at 3ware RAID
hardware at the moment:


I would also look at the Areca line of cards. They are much faster than
the 3ware and support up to 24 port if needed. Also do RAID6 now.

http://www.areca.com.tw/index/html/index.htm


Brad Dameron
SeaTab Software
www.seatab.com


  
I guess if we were all wholesalers with a nice long lead time, that
would be great, Brad.  But where, and for how much might one purchase these?

b-



Re: 3ware RAID (was Re: RAID resync stalled at 99.7% ?)

2005-09-02 Thread Brad Dameron
On Thu, 2005-09-01 at 13:50 -0400, berk walker wrote:
 I guess if we were all wholesalers with a nice long lead time, that
 would be great, Brad.  But where, and for how much might one purchase these?
 
 b-
 

http://www.topmicrousa.com/controllers--tekram.html
http://www.rackmountpro.com/productsearch.php?catid=199
http://www.pc-pitstop.com/sata_raid_controllers/

Just for starters -- Google found those, by the way. And yes, they are a
little more expensive than 3ware. But I can say they do twice the performance of
the 3ware cards, mainly due to their 800MHz processor and faster memory.
Their driver support is also more up to date.

Brad Dameron
SeaTab Software
www.seatab.com




Re: 3ware RAID (was Re: RAID resync stalled at 99.7% ?)

2005-09-02 Thread Ming Zhang
On Fri, 2005-09-02 at 11:09 -0700, Brad Dameron wrote:
 On Thu, 2005-09-01 at 13:50 -0400, berk walker wrote:
  I guess if we were all wholesalers with a nice long lead time, that
  would be great, Brad.  But where, and for how much might one purchase these?
  
  b-
  
 
 http://www.topmicrousa.com/controllers--tekram.html
 http://www.rackmountpro.com/productsearch.php?catid=199
 http://www.pc-pitstop.com/sata_raid_controllers/
 
 Just for starters. Google found those by the way. And yes they are a
 little more than 3ware. But I can say they do twice the performance of
 the 3ware cards. Mainly due to their 800Mhz processor and faster memory.
 Their driver support is also more upt to date.
 

when you talk about 2x the performance, do you have any performance data to
back your claim?



 Brad Dameron
 SeaTab Software
 www.seatab.com
 
 


Re: 3ware RAID (was Re: RAID resync stalled at 99.7% ?)

2005-09-02 Thread Joshua Baker-LePain
On Fri, 2 Sep 2005 at 11:09am, Brad Dameron wrote

 On Thu, 2005-09-01 at 13:50 -0400, berk walker wrote:
  I guess if we were all wholesalers with a nice long lead time, that
  would be great, Brad.  But where, and for how much might one purchase these?
  
  b-
  
 
 http://www.topmicrousa.com/controllers--tekram.html
 http://www.rackmountpro.com/productsearch.php?catid=199
 http://www.pc-pitstop.com/sata_raid_controllers/
 
 Just for starters. Google found those by the way. And yes they are a
 little more than 3ware. But I can say they do twice the performance of
 the 3ware cards. Mainly due to their 800Mhz processor and faster memory.
 Their driver support is also more upt to date.

But has it made it into the mainline kernel yet?  3w-xxxx has been in 
mainline for a *long* time.  3w-9xxx is still settling a bit, that's true, 
but it's doing so in mainline.  Personally, I wouldn't put important data 
on an Areca controller until the drivers have been through wider testing 
than the -mm series (where I believe they're currently percolating).

-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University


Re: MD or MDADM bug?

2005-09-02 Thread Neil Brown
On Friday September 2, [EMAIL PROTECTED] wrote:
 Neil Brown wrote:
  On Friday September 2, [EMAIL PROTECTED] wrote:
  
   Does this mean I'm going to lose all my data?
  
 
  No.
  At least, you shouldn't, and doing the --create won't make anything
  worse.
 
  So do the --create with the 'missing', and don't add any spares.
  Do a 'fsck' or whatever to check that everything is OK.
 
  If it isn't, we'll have to look again at exactly what happened and
  figure out which disks we should have created into the array.
 
 -([EMAIL PROTECTED])-(~)- # mdadm -C /dev/md0 -l5 -n28 -c 128 --name=md/md0 
 -p la 
 /dev/sd[a-l] missing /dev/sd[n-z] /dev/sda[ab]
 mdadm: invalid number of raid devices: 28

Sorry.  Add
   -e 1

 
 I'm using mdadm 2.0 now --- should I try 2.0-devel-3 ?

No.  -devel-3 should not be used anymore.

NeilBrown


Re: MD or MDADM bug?

2005-09-02 Thread David M. Strang

Neil Brown wrote:

On Friday September 2, [EMAIL PROTECTED] wrote:
 Neil Brown wrote:
  On Friday September 2, [EMAIL PROTECTED] wrote:
  
   Does this mean I'm going to lose all my data?
  
 
  No.
  At least, you shouldn't, and doing the --create won't make anything
  worse.
 
  So do the --create with the 'missing', and don't add any spares.
  Do a 'fsck' or whatever to check that everything is OK.
 
  If it isn't, we'll have to look again at exactly what happened and
  figure out which disks we should have created into the array.

 -([EMAIL PROTECTED])-(~)- # mdadm -C /dev/md0 -l5 -n28 -c 128 --name=md/md0 -p 
 la

 /dev/sd[a-l] missing /dev/sd[n-z] /dev/sda[ab]
 mdadm: invalid number of raid devices: 28

Sorry.  Add
   -e 1


Well, I'm quite happy to report --- that worked!

###
reiserfsck --check started at Fri Sep  2 12:29:18 2005
###
Replaying journal..
Trans replayed: mountid 23, transid 172404, desc 3638, len 14, commit 3653, 
next trans offset 3636
Trans replayed: mountid 23, transid 172405, desc 3654, len 1, commit 3656, 
next trans offset 3639
Trans replayed: mountid 23, transid 172406, desc 3657, len 1, commit 3659, 
next trans offset 3642
Trans replayed: mountid 23, transid 172407, desc 3660, len 1, commit 3662, 
next trans offset 3645
Trans replayed: mountid 23, transid 172408, desc 3663, len 1, commit 3665, 
next trans offset 3648
Trans replayed: mountid 23, transid 172409, desc 3666, len 1, commit 3668, 
next trans offset 3651
Trans replayed: mountid 23, transid 172410, desc 3669, len 1, commit 3671, 
next trans offset 3654

Reiserfs journal '/dev/md0' in blocks [18..8211]: 7 transactions replayed
Checking internal tree..finished
Comparing bitmaps..finished
Checking Semantic tree:
finished
No corruptions found
There are on the filesystem:
   Leaves 279225
   Internal nodes 1696
   Directories 1962
   Other files 15922
   Data block pointers 280976644 (0 of them are zero)
   Safe links 0
###
reiserfsck finished at Fri Sep  2 13:13:53 2005
###

And, no data corruption!

So, once I get the bad drive replaced; and the array re-synced -- will I 
want to stop the array, and execute:


mdadm -C /dev/md0 -e1 -l5 -n28 -c 128 --name=md/md0 -p la /dev/sd[a-z] 
/dev/sda[ab]


Just so I don't have a problem with a disk 'not really' being part of the 
array? IE; mdadm: /dev/sdm is identified as a member of /dev/md0, slot -1.


Also, most of the drives -- have no partitions on them (ie; cfdisk 
/dev/sda) -- Can I add them and set the type to FD so it will autodetect the 
raid? Or must I do that prior to raid creation?


Thanks again for all the help Neil, once again -- I'm able to recover what 
seemed hopeless with zero data loss.


-- David





Re: MD or MDADM bug?

2005-09-02 Thread Neil Brown
On Friday September 2, [EMAIL PROTECTED] wrote:
 
  Sorry.  Add
 -e 1
 
 Well, I'm quite happy to report --- that worked!

Excellent!

 
 So, once I get the bad drive replaced; and the array re-synced -- will I 
 want to stop the array, and execute:
 
 mdadm -C /dev/md0 -e1 -l5 -n28 -c 128 --name=md/md0 -p la /dev/sd[a-z] 
 /dev/sda[ab]
 
 Just so I don't have a problem with a disk 'not really' being part of the 
 array? IE; mdadm: /dev/sdm is identified as a member of /dev/md0,
 slot -1.

That shouldn't be necessary.  Providing you are using mdadm-2.0, you
should just be able to --add the drive and everything should work
fine.


 
 Also, most of the drives -- have no partitions on them (ie; cfdisk 
 /dev/sda) -- Can I add them and set the type to FD so it will autodetect the 
 raid? Or must I do that prior to raid creation?

Type FD doesn't work with version-1 superblocks.  The kernel will not
auto-assemble them at all.  Just use mdadm to assemble them (in an
rc.d script).

NeilBrown
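
A minimal sketch of that boot-time assembly (array identifiers and paths are illustrative; check them against your own mdadm --examine/--detail output):

    # /etc/mdadm.conf
    DEVICE /dev/sd*
    ARRAY /dev/md0 name=md/md0

    # rc.d fragment, run before anything mounts the filesystem on /dev/md0
    mdadm --assemble --scan --config /etc/mdadm.conf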