Problem in creating RAID5 MD array with kernel 2.6.15

2006-04-11 Thread Yogesh Pahilwan
Hi Folks,

I am trying to create a RAID5 array using mdadm on kernel 2.6.15 as follows:

# mdadm -Cv /dev/md0 --assume-clean --force --bitmap=/tmp/bitmap.txt -l5 -n4
/dev/sd{a,b,c,d}

But when I execute this command I get the following error:

mdadm: RUN_ARRAY failed: Cannot allocate memory

# dmesg shows

Apr 11 00:39:40 localhost kernel: md: bind<sda>
Apr 11 00:39:40 localhost kernel: md: bind<sdb>
Apr 11 00:39:40 localhost kernel: md: bind<sdc>
Apr 11 00:39:40 localhost kernel: raid5: automatically using best
checksumming function: generic_sse
Apr 11 00:39:40 localhost kernel:   generic_sse:  4111.000 MB/sec
Apr 11 00:39:40 localhost kernel: raid5: using function: generic_sse
(4111.000 MB/sec)
Apr 11 00:39:40 localhost kernel: md: raid5 personality registered as nr 4
Apr 11 00:39:41 localhost kernel: md0: bitmap file is out of date (0 < 1) --
forcing full recovery
Apr 11 00:39:41 localhost kernel: md0: failed to create bitmap (-12)
Apr 11 00:39:41 localhost kernel: md: pers->run() failed ...
Apr 11 00:39:41 localhost kernel: md: md0 stopped.
Apr 11 00:39:41 localhost kernel: md: unbind<sdc>
Apr 11 00:39:41 localhost kernel: md: export_rdev(sdc)
Apr 11 00:39:41 localhost kernel: md: unbind<sdb>
Apr 11 00:39:41 localhost kernel: md: export_rdev(sdb)
Apr 11 00:39:41 localhost kernel: md: unbind<sda>
Apr 11 00:39:41 localhost kernel: md: export_rdev(sda)

I am not able to figure out why mdadm is failing.

Please suggest some pointers so that I can solve this problem.

Thanks,
Yogesh Pahilwan



Re: Problem in creating RAID5 MD array with kernel 2.6.15

2006-04-11 Thread Neil Brown
On Tuesday April 11, [EMAIL PROTECTED] wrote:
 Hi Folks,
 
 I am trying to create RAID5 array using mdadm on kernel 2.6.15 as
 
 # mdadm -Cv /dev/md0 --assume-clean --force --bitmap=/tmp/bitmap.txt -l5 -n4
 /dev/sd{a,b,c,d}
 
 But when I execute this command getting the following error:
 
 mdadm: RUN_ARRAY failed: Cannot allocate memory

How big are your devices?

Try setting a larger bitmap chunk size
--bitmap-chunk=1024
maybe.

NeilBrown


RE: Problem in creating RAID5 MD array with kernel 2.6.15

2006-04-11 Thread Yogesh Pahilwan
Hi Neil,

I have set --bitmap-chunk=1024 and RAID5 gets created successfully.

But why do I have to set --bitmap-chunk for big devices, such as the 500GB
ones in my case?

What is the default value of --bitmap-chunk?

Thanks,
Yogesh

-Original Message-
From: Neil Brown [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, April 11, 2006 12:37 PM
To: Yogesh Pahilwan
Cc: linux-raid@vger.kernel.org
Subject: Re: Problem in creating RAID5 MD array with kernel 2.6.15

On Tuesday April 11, [EMAIL PROTECTED] wrote:
 Hi Folks,
 
 I am trying to create RAID5 array using mdadm on kernel 2.6.15 as
 
 # mdadm -Cv /dev/md0 --assume-clean --force --bitmap=/tmp/bitmap.txt -l5
-n4
 /dev/sd{a,b,c,d}
 
 But when I execute this command getting the following error:
 
 mdadm: RUN_ARRAY failed: Cannot allocate memory

How big are your devices?

Try setting a larger bitmap chunk size
--bitmap-chunk=1024
maybe.

NeilBrown



RE: Problem in creating RAID5 MD array with kernel 2.6.15

2006-04-11 Thread Neil Brown
On Tuesday April 11, [EMAIL PROTECTED] wrote:
 Hi Neil,
 
 I have set --bitmap-chunk=1024 and RAID5 gets created successfully.

Good.

 
 But why I will have to set --bitmap-chunk for big size devices such as 500GB
 each in my case?
 
 What is the default value of --bitmap-chunk?

4, which is probably too low.

For every 2048 chunks, md potentially needs to allocate one page.
md also needs to allocate a table to hold all these pages.

At a chunk size of 4K, your 500GB would use 125 million chunks.
That's 64000 pages - but these are only allocated on demand, and we can
survive failure.
However the table would need 4 bytes per page, or 250K.
Allocating a 250K table is unlikely to succeed due to memory
fragmentation.

With 1024K chunks you only need 1K, which is easy.

You could safely go down to 256K chunks but I'm not sure it would gain
much.

I have put a note on my mdadm todo list to choose a more sensible
default chunk size which limits the number of chunks to 2 million.

NeilBrown
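
[Editor's sketch, not from the original mail: a quick back-of-the-envelope check
of the numbers above, done in the shell. It assumes 4 KB pages, the
2048-chunks-per-page figure quoted above, and a 500 GiB member device.]

# Sketch only: reproduce the bitmap memory estimate for a ~500 GB member.
DEV_KB=$((500 * 1024 * 1024))            # member size in KiB (assumed)
for CHUNK_KB in 4 256 1024; do
    CHUNKS=$(( DEV_KB / CHUNK_KB ))      # number of bitmap chunks
    PAGES=$(( (CHUNKS + 2047) / 2048 ))  # one counter page per 2048 chunks
    TABLE=$(( PAGES * 4 ))               # page-pointer table, 4 bytes per page
    echo "chunk=${CHUNK_KB}K chunks=$CHUNKS pages=$PAGES table=${TABLE}B"
done

With 4K chunks this prints roughly 64000 pages and a 256000-byte (~250K) table;
with 1024K chunks the table drops to about 1K, matching the figures above.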


RE: Problem in creating RAID5 MD array with kernel 2.6.15

2006-04-11 Thread Yogesh Pahilwan
Hi Neil,

Actually I want to measure the performance of a RAID5 MD array while it is
rebuilding.

To do this I perform the following steps:

# mdadm -f /dev/md0 /dev/sda
mdadm: set /dev/sda faulty in /dev/md0

# mdadm -r /dev/md0 /dev/sda
mdadm: hot remove failed for /dev/sda: Device or resource busy

# tail -f /var/log/messages shows 

Apr 11 01:48:11 localhost kernel: <1>raid5: Disk failure on sda, disabling
device. Operation continuing on 3 devices
Apr 11 01:48:24 localhost ntpd[3540]: synchronized to LOCAL(0), stratum 10
Apr 11 01:48:24 localhost ntpd[3540]: kernel time sync disabled 0041
Apr 11 01:48:26 localhost kernel: md: cannot remove active disk sda from md0
...
Apr 11 01:49:26 localhost ntpd[3540]: synchronized to 10.8.0.8, stratum 3
Apr 11 01:50:51 localhost kernel: md: cannot remove active disk sda from md0
...
Apr 11 01:51:58 localhost kernel: md: cannot remove active disk sda from md0
...
Apr 11 01:54:16 localhost kernel: md: cannot remove active disk sda from md0
...
Apr 11 01:57:11 localhost kernel: md: cannot remove active disk sda from md0
.

I do not understand why I am not able to hot-remove /dev/sda from /dev/md0.


# mdadm -D /dev/md0
/dev/md0:
Version : 00.90.03
  Creation Time : Tue Apr 11 01:47:20 2006
 Raid Level : raid5
 Array Size : 1465159488 (1397.29 GiB 1500.32 GB)
Device Size : 488386496 (465.76 GiB 500.11 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
Persistence : Superblock is persistent

  Intent Bitmap : /tmp/bitmap.txt

Update Time : Tue Apr 11 01:47:20 2006
  State : clean, degraded
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0

 Layout : left-symmetric
 Chunk Size : 64K

   UUID : 5ce49b71:e6083c2a:121b9ac2:cb675771
 Events : 0.1

Number   Major   Minor   RaidDevice State
   0       8       0        0      faulty spare rebuilding   /dev/sda
   1       8      16        1      active sync   /dev/sdb
   2       8      32        2      active sync   /dev/sdc
   3       8      48        3      active sync   /dev/sdd

This output shows that RAID5 /dev/md0 is in degraded mode, correct?

How should I rebuild this RAID5 so that I can measure I/O performance
while the RAID5 MD array is rebuilding?

Thanks,
Yogesh
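
[Editor's sketch, not from the thread: the usual way to trigger and watch a
rebuild, assuming the hot-remove discussed above eventually succeeds. Device
names are taken from the thread; the steps themselves are an addition.]

# fail, remove and re-add the member; re-adding starts the recovery
mdadm /dev/md0 --fail /dev/sda
mdadm /dev/md0 --remove /dev/sda
mdadm /dev/md0 --add /dev/sda
# watch rebuild progress and speed while the I/O benchmark runs
cat /proc/mdstat
# the rebuild rate is bounded by these sysctls
cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max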






-Original Message-
From: Neil Brown [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, April 11, 2006 1:01 PM
To: Yogesh Pahilwan
Cc: linux-raid@vger.kernel.org
Subject: RE: Problem in creating RAID5 MD array with kernel 2.6.15

On Tuesday April 11, [EMAIL PROTECTED] wrote:
 Hi Neil,
 
 I have set --bitmap-chunk=1024 and RAID5 gets created successfully.

Good.

 
 But why I will have to set --bitmap-chunk for big size devices such as
500GB
 each in my case?
 
 What is the default value of --bitmap-chunk?

4, which is probably too low.

For every 2048 chunks, md potentially needs to allocate one page.
md also needs to allocate a table to hold all these pages.

At a chunk size of 4K, your 500GB would use 125million chunks.
That's 64000 pages - but these are only allocated on demand, and we can
survive failure.
However the table would need 4 bytes per page, or 250K
Allocating a 250K stable is unlikely to succeed due to memory
fragmentation.

With 1024K chunks you only need 1K, which is easy.

You could safely go down to 256K chunks but I'm not sure it would gain
much.

I have put a note on my mdadm todo list to choose a more sensible
default chunk size which limits the number of chunks to 2million.

NeilBrown



RE: Problem in creating RAID5 MD array with kernel 2.6.15

2006-04-11 Thread Neil Brown
On Tuesday April 11, [EMAIL PROTECTED] wrote:
 Hi Neil,
 
 Actually I want to calculate the performance of a RAID5 MD array in rebuild
 state.
 
 For doing this I do the following steps:
 
 # mdadm -f /dev/md0 /dev/sda
 mdadm: set /dev/sda faulty in /dev/md0
 
 # mdadm -r /dev/md0 /dev/sda
 mdadm: hot remove failed for /dev/sda: Device or resource busy

Hmmm... That shouldn't happen.  I think you have found a bug :-(

However I cannot trivially reproduce it.

 - Can you reproduce this behaviour (mdadm -r failing)?
 - If so, can you list the steps?
 - Can you reproduce it on 2.6.16?
 - Can you reproduce it without using a bitmap?

If you cannot reproduce it, please tell me as much as possible about
what led up to this situation.  Did you add/fail other drives? Did you
create or mount a filesystem, etc.?

Thanks,
NeilBrown


 
 # tail -f /var/log/messages shows 
 
 Apr 11 01:48:11 localhost kernel:  1raid5: Disk failure on sda, disabling
 device. Operation continuing on 3 devices
 Apr 11 01:48:24 localhost ntpd[3540]: synchronized to LOCAL(0), stratum 10
 Apr 11 01:48:24 localhost ntpd[3540]: kernel time sync disabled 0041
 Apr 11 01:48:26 localhost kernel: md: cannot remove active disk sda from md0
 ...
 Apr 11 01:49:26 localhost ntpd[3540]: synchronized to 10.8.0.8, stratum 3
 Apr 11 01:50:51 localhost kernel: md: cannot remove active disk sda from md0
 ...
 Apr 11 01:51:58 localhost kernel: md: cannot remove active disk sda from md0
 ...
 Apr 11 01:54:16 localhost kernel: md: cannot remove active disk sda from md0
 ...
 Apr 11 01:57:11 localhost kernel: md: cannot remove active disk sda from md0
 .
 
 I am not getting why I am not able to hot remove /dev/sda from /dev/md0?
 
 
 # mdadm -D /dev/md0
 /dev/md0:
 Version : 00.90.03
   Creation Time : Tue Apr 11 01:47:20 2006
  Raid Level : raid5
  Array Size : 1465159488 (1397.29 GiB 1500.32 GB)
 Device Size : 488386496 (465.76 GiB 500.11 GB)
Raid Devices : 4
   Total Devices : 4
 Preferred Minor : 0
 Persistence : Superblock is persistent
 
   Intent Bitmap : /tmp/bitmap.txt
 
 Update Time : Tue Apr 11 01:47:20 2006
   State : clean, degraded
  Active Devices : 3
 Working Devices : 3
  Failed Devices : 1
   Spare Devices : 0
 
  Layout : left-symmetric
  Chunk Size : 64K
 
UUID : 5ce49b71:e6083c2a:121b9ac2:cb675771
  Events : 0.1
 
 Number   Major   Minor   RaidDevice State
0   800  faulty spare rebuilding   /dev/sda
1   8   161  active sync   /dev/sdb
2   8   322  active sync   /dev/sdc
3   8   483  active sync   /dev/sdd
 
 This output shows that RAID5 /dev/md0 is in the degraded mode?
 
 How should I rebuild this RAID5 so that I can calculate I/O performance
 while rebuilding RAID5 MD Array?
 
 Thanks,
 Yogesh
 
 
 
 
 
 
 -Original Message-
 From: Neil Brown [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, April 11, 2006 1:01 PM
 To: Yogesh Pahilwan
 Cc: linux-raid@vger.kernel.org
 Subject: RE: Problem in creating RAID5 MD array with kernel 2.6.15
 
 On Tuesday April 11, [EMAIL PROTECTED] wrote:
  Hi Neil,
  
  I have set --bitmap-chunk=1024 and RAID5 gets created successfully.
 
 Good.
 
  
  But why I will have to set --bitmap-chunk for big size devices such as
 500GB
  each in my case?
  
  What is the default value of --bitmap-chunk?
 
 4, which is probably too low.
 
 For every 2048 chunks, md potentially needs to allocate one page.
 md also needs to allocate a table to hold all these pages.
 
 At a chunk size of 4K, your 500GB would use 125million chunks.
 That's 64000 pages - but these are only allocated on demand, and we can
 survive failure.
 However the table would need 4 bytes per page, or 250K
 Allocating a 250K stable is unlikely to succeed due to memory
 fragmentation.
 
 With 1024K chunks you only need 1K, which is easy.
 
 You could safely go down to 256K chunks but I'm not sure it would gain
 much.
 
 I have put a note on my mdadm todo list to choose a more sensible
 default chunk size which limits the number of chunks to 2million.
 
 NeilBrown


Questions about: Where to find algorithms for RAID5 / RAID6

2006-04-11 Thread Maurice Hilarius
Good day.

I am looking for some information, and hope the readers of this list
might be able to point me in the right direction:

Here is the scenario:
In RAID5 (or RAID6), when a file is written, some parity data is
created (by some form of XOR process, I assume), and then that parity data
is written to disk.

I am looking for the algorithm that is used to create that parity
data and to decide where to place it on the disks.

Any help on this is deeply appreciated.
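
[Editor's sketch: a tiny shell illustration of the XOR idea, not the kernel's
actual code; the in-kernel RAID5 implementation lives in drivers/md/raid5.c,
and the RAID6 code in the raid6 files under drivers/md/.]

# RAID5 parity P is the bytewise XOR of the data chunks in a stripe;
# any one lost chunk can be rebuilt by XOR-ing the survivors.
D0=0x5a; D1=0xc3; D2=0x0f                 # three toy "data chunks"
P=$(( D0 ^ D1 ^ D2 ))                     # parity written to the remaining disk
printf 'parity     = 0x%02x\n' "$P"
printf 'rebuilt D1 = 0x%02x\n' $(( D0 ^ D2 ^ P ))   # recovers 0xc3

Placement of P simply rotates from stripe to stripe; md's default is the
left-symmetric layout shown in the mdadm -D output elsewhere in this digest.
RAID6 adds a second syndrome (Q) computed over a Galois field rather than by
plain XOR.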

-- 

With our best regards,


Maurice W. Hilarius    Telephone: 01-780-456-9771
Hard Data Ltd.         FAX:       01-780-456-9772
11060 - 166 Avenue     email: [EMAIL PROTECTED]
Edmonton, AB, Canada   http://www.harddata.com/
   T5X 1Y3




RE: Problem in creating RAID5 MD array with kernel 2.6.15

2006-04-11 Thread Yogesh Pahilwan
Hi Neil,

When I try the following steps on kernel 2.6.16, I get the same errors
as on kernel 2.6.15.

# mdadm -Cv /dev/md0 --assume-clean --force --bitmap=/tmp/bitmap.txt
--bitmap-chunk=1024 -l5 -n4 /dev/sd{a,b,c,d}

Array gets created successfully.

# mdadm /dev/md0 -f /dev/sda
mdadm: set /dev/sda faulty in /dev/md0

# mdadm /dev/md0 -r /dev/sda
mdadm: hot remove failed for /dev/sda: Device or resource busy

# cat /proc/mdstat shows
Personalities : [raid5] [raid4]
md0 : active raid5 sdd[3] sdc[2] sdb[1] sda[0](F)
  1465159488 blocks level 5, 64k chunk, algorithm 2 [4/3] [_UUU]
  bitmap: 0/233 pages [0KB], 1024KB chunk, file: /tmp/bitmap.txt

unused devices: <none>

# mdadm -D /dev/md0 shows

/dev/md0:
Version : 00.90.03
  Creation Time : Tue Apr 11 03:39:28 2006
 Raid Level : raid5
 Array Size : 1465159488 (1397.29 GiB 1500.32 GB)
Device Size : 488386496 (465.76 GiB 500.11 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
Persistence : Superblock is persistent

  Intent Bitmap : /tmp/bitmap.txt

Update Time : Tue Apr 11 03:39:28 2006
  State : clean, degraded
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0

 Layout : left-symmetric
 Chunk Size : 64K

   UUID : ee25e1fa:d1ee78d5:d63869c6:dd7cff82
 Events : 0.1

Number   Major   Minor   RaidDevice State
   0       8       0        0      faulty spare rebuilding   /dev/sda
   1       8      16        1      active sync   /dev/sdb
   2       8      32        2      active sync   /dev/sdc
   3       8      48        3      active sync   /dev/sdd


# tail -f /var/log/messages shows

Apr 11 03:40:47 localhost kernel: <4>md: cannot remove active disk sda from
md0 ...
Apr 11 03:41:29 localhost kernel: raid5: Disk failure on sda, disabling
device. Operation continuing on 3 devices
Apr 11 03:41:35 localhost kernel: md: cannot remove active disk sda from md0
...


-Original Message-
From: Neil Brown [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, April 11, 2006 1:52 PM
To: Yogesh Pahilwan
Cc: linux-raid@vger.kernel.org
Subject: RE: Problem in creating RAID5 MD array with kernel 2.6.15

On Tuesday April 11, [EMAIL PROTECTED] wrote:
 Hi Neil,
 
 Actually I want to calculate the performance of a RAID5 MD array in
rebuild
 state.
 
 For doing this I do the following steps:
 
 # mdadm -f /dev/md0 /dev/sda
 mdadm: set /dev/sda faulty in /dev/md0
 
 # mdadm -r /dev/md0 /dev/sda
 mdadm: hot remove failed for /dev/sda: Device or resource busy

Hmmm That shouldn't happen.  I think you have found a bug :-(

However I cannot trivially reproduce it.

 - Can you reproduce this behaviour (mdadm -r failing) ?
-  If so, can you list the steps?
-  Can you reproduce on 2.6.16?
-  Can you reproduce it without use a bitmap?

If you cannot reproduce it, please tell me as much as possible about
what led up to this situation.  Do you add/fail other drives? Did you
create or mount a filesystem, etc.

Thanks,
NeilBrown


 
 # tail -f /var/log/messages shows 
 
 Apr 11 01:48:11 localhost kernel:  1raid5: Disk failure on sda,
disabling
 device. Operation continuing on 3 devices
 Apr 11 01:48:24 localhost ntpd[3540]: synchronized to LOCAL(0), stratum 10
 Apr 11 01:48:24 localhost ntpd[3540]: kernel time sync disabled 0041
 Apr 11 01:48:26 localhost kernel: md: cannot remove active disk sda from
md0
 ...
 Apr 11 01:49:26 localhost ntpd[3540]: synchronized to 10.8.0.8, stratum 3
 Apr 11 01:50:51 localhost kernel: md: cannot remove active disk sda from
md0
 ...
 Apr 11 01:51:58 localhost kernel: md: cannot remove active disk sda from
md0
 ...
 Apr 11 01:54:16 localhost kernel: md: cannot remove active disk sda from
md0
 ...
 Apr 11 01:57:11 localhost kernel: md: cannot remove active disk sda from
md0
 .
 
 I am not getting why I am not able to hot remove /dev/sda from /dev/md0?
 
 
 # mdadm -D /dev/md0
 /dev/md0:
 Version : 00.90.03
   Creation Time : Tue Apr 11 01:47:20 2006
  Raid Level : raid5
  Array Size : 1465159488 (1397.29 GiB 1500.32 GB)
 Device Size : 488386496 (465.76 GiB 500.11 GB)
Raid Devices : 4
   Total Devices : 4
 Preferred Minor : 0
 Persistence : Superblock is persistent
 
   Intent Bitmap : /tmp/bitmap.txt
 
 Update Time : Tue Apr 11 01:47:20 2006
   State : clean, degraded
  Active Devices : 3
 Working Devices : 3
  Failed Devices : 1
   Spare Devices : 0
 
  Layout : left-symmetric
  Chunk Size : 64K
 
UUID : 5ce49b71:e6083c2a:121b9ac2:cb675771
  Events : 0.1
 
 Number   Major   Minor   RaidDevice State
0   800  faulty spare rebuilding   /dev/sda
1   8   161  active sync   /dev/sdb
2   8   322  active sync   /dev/sdc
3   8   483  active sync   /dev/sdd
 
 This output shows that RAID5 /dev/md0 is in the degraded mode?
 
 How 

RE: Problem in creating RAID5 MD array with kernel 2.6.15

2006-04-11 Thread Neil Brown
On Tuesday April 11, [EMAIL PROTECTED] wrote:
 Hi Neil,
 
 When I try the following steps on kernel 2.6.16 I am getting the same errors
 as on kernel 2.6.15.
 
 # mdadm -Cv /dev/md0 --assume-clean --force --bitmap=/tmp/bitmap.txt
 --bitmap-chunk=1024 -l5 -n4 /dev/sd{a,b,c,d}
 
 Array gets created successfully.
 
 # mdadm /dev/md0 -f /dev/sda
 mdadm: set /dev/sda faulty in /dev/md0
 
 # mdadm /dev/md0 -r /dev/sda
 mdadm: hot remove failed for /dev/sda: Device or resource busy

Thanks for testing.
If I do exactly the same sequence it works perfectly.  So there must
be some important difference between your setup and mine.

- What happens if you leave off the --bitmap=/tmp/bitmap.txt 
--bitmap-chunk=1024
  Does it then work?

- What filesystem is on /tmp?

- Can you give me complete /var/log/messages from before you create
  the array until this error?

Thanks,
NeilBrown


RE: Problem in creating RAID5 MD array with kernel 2.6.15

2006-04-11 Thread Yogesh Pahilwan
Hi Neil,

Can you provide me with details of your setup?
Is there any kernel configuration option that I need to change and rebuild my
kernel with?

Thanks,
Yogesh


-Original Message-
From: Neil Brown [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, April 11, 2006 3:54 PM
To: Yogesh Pahilwan
Cc: linux-raid@vger.kernel.org
Subject: RE: Problem in creating RAID5 MD array with kernel 2.6.15

On Tuesday April 11, [EMAIL PROTECTED] wrote:
 Hi Neil,
 
 When I try the following steps on kernel 2.6.16 I am getting the same
errors
 as on kernel 2.6.15.
 
 # mdadm -Cv /dev/md0 --assume-clean --force --bitmap=/tmp/bitmap.txt
 --bitmap-chunk=1024 -l5 -n4 /dev/sd{a,b,c,d}
 
 Array gets created successfully.
 
 # mdadm /dev/md0 -f /dev/sda
 mdadm: set /dev/sda faulty in /dev/md0
 
 # mdadm /dev/md0 -r /dev/sda
 mdadm: hot remove failed for /dev/sda: Device or resource busy

Thanks for testing.
If I do exactly the same sequence it works perfectly.  So there must
be some important different between your setup and mine.

- What happens if you leave off the --bitmap=/tmp/bitmap.txt
--bitmap-chunk=1024
  Does it then work?

- What filesystem is on /tmp

- Can you give me complete /var/log/messages from before you create
  the array until this error?

Thanks,
NeilBrown



Re: mdadm + raid1 of 2 disks and now need to add more

2006-04-11 Thread Andy Smith
On Tue, Apr 11, 2006 at 04:41:30PM +0200, Shai wrote:
 I have two SCSI disks on raid1.
 Since I have lots of reads from that raid, I want to add two more
 disks to this raid so that read will be faster.
 
 How should I add the new disks?

Is this possible with md currently?

Create a RAID-10 on the two new disks, specifying one disk missing
from each mirror.

Then copy the data over and add the two existing disks, letting it
resync?
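
[Editor's sketch of how that suggestion could look with mdadm. /dev/md1,
/dev/sdc and /dev/sdd (new disks) and /dev/sda, /dev/sdb (old RAID1 members)
are placeholder names, not from the thread.]

# create a 4-device RAID-10 with one half of each mirror pair missing
mdadm --create /dev/md1 --level=10 --raid-devices=4 \
      /dev/sdc missing /dev/sdd missing
# copy the data from the old RAID1 onto /dev/md1, stop the old array, then:
mdadm /dev/md1 --add /dev/sda
mdadm /dev/md1 --add /dev/sdb
# the two added disks resync into the missing slots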




Re: mdadm + raid1 of 2 disks and now need to add more

2006-04-11 Thread Laurent CARON

Andy Smith wrote:

On Tue, Apr 11, 2006 at 04:41:30PM +0200, Shai wrote:

I have two SCSI disks on raid1.
Since I have lots of reads from that raid, I want to add two more
disks to this raid so that read will be faster.

How should I add the new disks?


Is this possible with md currently:

Create a RAID-10 on the two new disks specifying one disk missing
from each mirror.

Then copy data over and add the two existing disks letting it
resync?


Why not grow the array with 2 more disks?


Re: mdadm + raid1 of 2 disks and now need to add more

2006-04-11 Thread Andy Smith
On Tue, Apr 11, 2006 at 07:25:58PM +0200, Laurent CARON wrote:
 Andy Smith wrote:
 On Tue, Apr 11, 2006 at 04:41:30PM +0200, Shai wrote:
 I have two SCSI disks on raid1.
 Since I have lots of reads from that raid, I want to add two more
 disks to this raid so that read will be faster.
 
 How should I add the new disks?
 
 Is this possible with md currently:
 
 Create a RAID-10 on the two new disks specifying one disk missing
 from each mirror.
 
 Then copy data over and add the two existing disks letting it
 resync?
 
 Why not growing the array with 2 more disks?

Well I guess a RAID-1 of 4 disks would be slightly more redundant
than a 4-disk RAID-10, but it would have half the capacity, and the
read performance would be very similar, no?





Re: linear writes to raid5

2006-04-11 Thread Alex Tomas
 Mark Hahn (MH) writes:

 MH don't you mean _3_ chunk-sized writes?  if so, are you actually
 MH asking about the case when you issue an aligned two-stripe write?
 MH (which might get broken into 6 64K writes, not sure, rather than 
 MH three 2-chunk writes...)

actually, yes. I'm talking about 3 requests: 2 of data and one of parity.


Re: linear writes to raid5

2006-04-11 Thread Alex Tomas
 Neil Brown (NB) writes:

 NB The raid5 code attempts to do this already, though I'm not sure how
 NB successful it is.  I think it is fairly successful, but not completely
 NB successful. 

Hmm, could you tell me which code I should look at?


 NB There is a trade-off that raid5 has to make.  Waiting longer can mean
 NB more blocks on the same stripe, and so less reads.  But waiting longer
 NB can also increase latency which might not be good.

yes, I agree.

 NB The thing to would be to put some tracing in to find out exactly what
 NB is happening for some sample workloads, and then see if anything can
 NB be improved.

well, the simplest case I tried was this:

mdadm -C /dev/md0 --level=5 --chunk=8 --raid-disks=3 ...
then open /dev/md0 with O_DIRECT and send a write of 16K.
it ended up doing a few writes and one read. The sequence was:
1) serving the first 4K of the request - the stripe is put onto the delayed list
2) serving the 2nd 4KB -- again onto the delayed list
3) serving the 3rd 4KB -- we get a fully up-to-date stripe, time to compute the parity;
   3 writes are issued for stripe #0
4) raid5_unplug_device() is called because of those 3 writes;
   it activates delayed stripe #4
5) raid5d() finds stripe #4 and issues a READ
...

I tend to think this isn't the most optimal way. Couldn't we take the current
request into account somehow? Something like keeping delayed stripes off the
queue as long as current requests are still being served AND the stripe cache
isn't full.

Another similar case is when you have two processes writing to very
different stripes, and the low-level requests they make from handle_stripe()
cause delayed stripes to get activated.

thanks, Alex


mdadm does not want to create partitions

2006-04-11 Thread Anton Petrusevich
Hi,

I need help with a RAID-0 array. I would not ask here if I had been able to
find information on what to do in my case. OK, the problem: our datacenter had
a power failure, and after that I am trying to bring up all services on one of
our servers that uses a partitioned RAID-0 array for Oracle tablespaces. It was
partitioned and fdisk sees it:
# fdisk -l /dev/md1

Disk /dev/md1: 366.5 GB, 366555955200 bytes
2 heads, 4 sectors/track, 89491200 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/md1p1              1    15869142    63476566   83  Linux
/dev/md1p2       15869143    31738284    63476568   83  Linux
/dev/md1p3       31738285    47607426    63476568   83  Linux
/dev/md1p4       47607427    89491200   167535096    5  Extended
/dev/md1p5       47607427    63476568    63476566   83  Linux
/dev/md1p6       63476569    79345710    63476566   83  Linux
/dev/md1p7       79345711    80566414     4882814   83  Linux
/dev/md1p8       80566415    89491200    35699142   83  Linux

but, I have:

# mdadm -A /dev/md1 --auto=part8
mdadm: that --auto option not compatable with device named /dev/md1

It can be assembled only as non-partitioned; --auto=part or mdp do not work
either. I need my partitions back, but I don't understand what I can do. The
kernel is from RHEL-4, 2.6.9-22.EL.roothugemem (custom build for RedHat 9). It
was working before the power failure.
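
[Editor's note, not from the thread: an untested sketch of the usual workaround.
mdadm only accepts --auto=part for an array whose name looks partitionable,
e.g. /dev/md_dN; the member devices below are placeholders.]

mdadm -A /dev/md_d1 --auto=part8 /dev/sda1 /dev/sdb1
# the partitions should then reappear as /dev/md_d1p1 ... /dev/md_d1p8
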
-- 
Anton Petrusevich


Re: md/mdadm fails to properly run on 2.6.15 after upgrading from 2.6.11

2006-04-11 Thread dean gaudet


On Mon, 10 Apr 2006, Marc L. de Bruin wrote:

 dean gaudet wrote:
  On Mon, 10 Apr 2006, Marc L. de Bruin wrote:
  
   However, all preferred minors are correct, meaning that the output is in
   sync with what I expected it to be from /etc/mdadm/mdadm.conf.
   
   Any other ideas? Just adding /etc/mdadm/mdadm.conf to the initrd does not
   seem
   to work, since mdrun seems to ignore it?!
 
  it seems to me mdrun /dev is about the worst thing possible to use in an
  initrd.
 
 :-)
 
 I guess I'll have to change to yaird asap then. I can't think of any other
 solid solution...

yeah i've been using yaird... it's not perfect -- take a look at
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=351183 for a patch i
use to improve the ability of a yaird initrd to boot when you've moved
devices or a device has failed.

-dean


forcing a read on a known bad block

2006-04-11 Thread dean gaudet
hey Neil...

i've been wanting to test out the reconstruct-on-read-error code... and
i've had two chances to do so, but haven't been able to force md to read the
appropriate block to trigger the code.

i had two disks with SMART Current_Pending_Sector > 0 (which indicates a
pending read error) and i did SMART long self-tests to find out where the
bad block was (it should show the LBA in the SMART error log)...

one disk was in a raid1 -- and so it was kind of random which of the two 
disks would be read from if i tried to seek to that LBA and read... in 
theory with O_DIRECT i should have been able to randomly get the right 
disk, but that seems a bit clunky.  unfortunately i didn't think of the 
O_DIRECT trick until after i'd given up and decided to just resync the 
whole disk proactively.
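
[Editor's sketch: the O_DIRECT read mentioned above can be done from the shell
with a dd that supports iflag=direct; the LBA and device are placeholders, and
any adjustment for partition or superblock offsets is left out.]

LBA=123456789        # sector number reported in the SMART error log (placeholder)
# O_DIRECT bypasses the page cache, so the read really goes to one of the mirrors
dd if=/dev/md0 of=/dev/null bs=512 skip=$LBA count=8 iflag=direct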

the other disk was in a raid5 ... 5 disk raid5, so 20% chance of the bad 
block being in parity.  i copied the kernel code to be sure, and sure 
enough the bad block was in parity... just bad luck :)  so i can't force a 
read there any way that i know of...

anyhow this made me wonder if there's some other existing trick to force 
such reads/reconstructions to occur... or perhaps this might be a useful 
future feature.

on the raid5 disk i actually tried reading the LBA directly from the 
component device and it didn't trigger the read error, so now i'm a bit 
skeptical of the SMART log and/or my computation of the seek offset in the 
partition... but the above question is still interesting.

-dean


Re: mdadm + raid1 of 2 disks and now need to add more

2006-04-11 Thread Ming Zhang
On Tue, 2006-04-11 at 20:32 +, Andy Smith wrote:
 On Tue, Apr 11, 2006 at 07:25:58PM +0200, Laurent CARON wrote:
  Andy Smith wrote:
  On Tue, Apr 11, 2006 at 04:41:30PM +0200, Shai wrote:
  I have two SCSI disks on raid1.
  Since I have lots of reads from that raid, I want to add two more
  disks to this raid so that read will be faster.
  
  How should I add the new disks?
  
  Is this possible with md currently:
  
  Create a RAID-10 on the two new disks specifying one disk missing
  from each mirror.
  
  Then copy data over and add the two existing disks letting it
  resync?
  
  Why not growing the array with 2 more disks?
 
 Well I guess a RAID-1 of 4 disks would be slightly more redundant
 than a 4 disk RAID-10, but it would have half the capacity, and the
 read performance would be very similar, no?

raid1 of 4 will give you read performance like 1 disk;
raid10 of 4 can give you read performance like 2 disks aggregated.


 



Re: forcing a read on a known bad block

2006-04-11 Thread Patrik Jonsson
hi dean,

dean gaudet wrote:
 
 the other disk was in a raid5 ... 5 disk raid5, so 20% chance of the bad 
 block being in parity.  i copied the kernel code to be sure, and sure 
 enough the bad block was in parity... just bad luck :)  so i can't force a 
 read there any way that i know of...

well, for raid5 you can use 'echo repair > /sys/block/mdX/md/sync_action'

This does a 'simulated reconstruction' and has triggered this for me in
the past. (For some reason 'check' instead of 'repair' did not, even
though it should have tried to read all the blocks then, too...)
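
[Editor's sketch, for reference: the sysfs interface mentioned above, with mdX
as a placeholder device name.]

echo check  > /sys/block/mdX/md/sync_action   # read-only scrub of the whole array
echo repair > /sys/block/mdX/md/sync_action   # scrub and rewrite any mismatches found
cat /sys/block/mdX/md/sync_action             # shows the currently running action
cat /sys/block/mdX/md/mismatch_cnt            # mismatches seen by the last check/repair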

That said, I have one disk in an 8-disk raid5 that says 'current pending
sector 1', and another that says 'offline uncorrectable 1', and they
have been doing so for months. Neither SMART extended tests nor full
raid5 resyncs have either failed or fixed this, so I don't know what's
up with that...

cheers,

/Patrik




Re: [RAID] forcing a read on a known bad block

2006-04-11 Thread Julian Cowley

On Tue, 11 Apr 2006, dean gaudet wrote:

anyhow this made me wonder if there's some other existing trick to force
such reads/reconstructions to occur... or perhaps this might be a useful
future feature.


For testing RAID, what would be really nice is if there were a virtual
disk device where one could simulate bad sectors (read or write),
non-responsive disks, etc.  It would be virtual in the same sort of way
that /dev/full simulates a full disk.

This would be an ideal project for Xen.


RE: mdadm + raid1 of 2 disks and now need to add more

2006-04-11 Thread Guy


} -Original Message-
} From: [EMAIL PROTECTED] [mailto:linux-raid-
} [EMAIL PROTECTED] On Behalf Of Ming Zhang
} Sent: Tuesday, April 11, 2006 6:13 PM
} To: Andy Smith
} Cc: linux-raid@vger.kernel.org
} Subject: Re: mdadm + raid1 of 2 disks and now need to add more
} 
} On Tue, 2006-04-11 at 20:32 +, Andy Smith wrote:
}  On Tue, Apr 11, 2006 at 07:25:58PM +0200, Laurent CARON wrote:
}   Andy Smith wrote:
}   On Tue, Apr 11, 2006 at 04:41:30PM +0200, Shai wrote:
}   I have two SCSI disks on raid1.
}   Since I have lots of reads from that raid, I want to add two more
}   disks to this raid so that read will be faster.
}   
}   How should I add the new disks?
}   
}   Is this possible with md currently:
}   
}   Create a RAID-10 on the two new disks specifying one disk missing
}   from each mirror.
}   
}   Then copy data over and add the two existing disks letting it
}   resync?
}  
}   Why not growing the array with 2 more disks?
} 
}  Well I guess a RAID-1 of 4 disks would be slightly more redundant
}  than a 4 disk RAID-10, but it would have half the capacity, and the
}  read performance would be very similar, no?
} 
} raid1 of 4 will give u read performance like 1 disk;
} raid10 of 4 can give u read performance like aggregated 2 disks.

I know a RAID1 of 4 disks will give you read performance like 4 disks,
unless your test or application is single threaded!  Not likely, I hope!
At least with the 2.4.31 kernel.

A RAID1 of 4 disks will still function with any 3 disks failed.
A real good idea for remote systems.

Also, I think a raid10 of 4 can give you read performance like 2 to 4 disks.
This depends more on the application, IMO.
You would have twice the space, though maybe that is not needed.





Re: accessing mirrired lvm on shared storage

2006-04-11 Thread Neil Brown
On Friday April 7, [EMAIL PROTECTED] wrote:
 Unfortunately md lacks the ability to mark an array as
 used/busy/you_name_it. Sometime ago I asked on this list for such an
 enhancement (see thread with subject Question: array locking,
 possible). Although I managed (with great help from a few people on
 this list) to attract Neil's attention, I couldn't find enough
 arguments to convince him to put this topic on his TO-DO list.
 Neil, you see the constantly growing number of potential users of this
 feature? ;-)

I don't think that just marking an array "don't mount" is really a
useful solution.  And if it was, it would be something done in 'mdadm'
rather than in 'md'.

What you really want is cluster wide locking using DLM or similar.
That way when the node which has active use of the array fails,
another node can pick up automatically.
Then we could put a flag in the superblock which says 'shared', and md
would need a special request to assemble such an array.

One thing that is on my todo list is supporting shared raid1, so that
several nodes in the cluster can assemble the same raid1 and access it
- providing that the clients all do proper mutual exclusion as
e.g. OCFS does.

Your desire to have only-assembled-once would be trivial to include in
that.

NeilBrown



Re: [RAID] forcing a read on a known bad block

2006-04-11 Thread Luca Berra

On Tue, Apr 11, 2006 at 12:37:53PM -1000, Julian Cowley wrote:

On Tue, 11 Apr 2006, dean gaudet wrote:

anyhow this made me wonder if there's some other existing trick to force
such reads/reconstructions to occur... or perhaps this might be a useful
future feature.


For testing RAID, what would be really nice is if there were a virtual
disk device where one could simulate bad sectors (read or write),
non-responsive disks, etc.  It would be virtual in the same sort way
that /dev/full simulates a full disk.


either use the MD faulty personality, or the device-mapper error
target.
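
[Editor's sketch of the second suggestion: a device-mapper table whose middle
section returns I/O errors while the rest maps linearly onto a real disk.
/dev/sdX and the offsets are placeholders.]

SECTORS=$(blockdev --getsz /dev/sdX)    # backing device size in 512-byte sectors
dmsetup create flaky <<EOF
0 1000000 linear /dev/sdX 0
1000000 8 error
1000008 $((SECTORS - 1000008)) linear /dev/sdX 1000008
EOF
# use /dev/mapper/flaky as an md component; reads/writes of sectors
# 1000000-1000007 will fail, simulating a small bad region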

L.

--
Luca Berra -- [EMAIL PROTECTED]
   Communication Media  Services S.r.l.
/\
\ / ASCII RIBBON CAMPAIGN
 XAGAINST HTML MAIL
/ \