Re: stopped array, but /sys/block/mdN still exists.

2008-01-03 Thread Neil Brown
On Thursday January 3, [EMAIL PROTECTED] wrote:
 
 So what happens if I try to _use_ that /sys entry? For instance run a 
 script which reads data, or sets the stripe_cache_size higher, or 
 whatever? Do I get back status, ignored, or system issues?

Try it:-)

The stripe_cache_size attribute will disappear (it is easy to remove
attributes, and stripe_cache_size is only meaningful for certain raid
levels).
Other attributes will return 0 or some equivalent, though I think
chunk_size will have the old value.
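
A minimal userspace sketch of "try it" (the md0 name and the set of
attributes below are illustrative assumptions, not from this thread):

#include <stdio.h>

int main(void)
{
	const char *attrs[] = { "level", "raid_disks", "chunk_size",
				"stripe_cache_size" };
	char path[128], buf[64];

	for (unsigned i = 0; i < sizeof(attrs) / sizeof(attrs[0]); i++) {
		snprintf(path, sizeof(path), "/sys/block/md0/md/%s", attrs[i]);
		FILE *f = fopen(path, "r");

		if (!f) {
			/* e.g. stripe_cache_size once the attribute is removed */
			printf("%-18s: <attribute gone>\n", attrs[i]);
			continue;
		}
		if (fgets(buf, sizeof(buf), f))
			printf("%-18s: %s", attrs[i], buf);
		fclose(f);
	}
	return 0;
}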

NeilBrown



Re: PROBLEM: RAID5 reshape data corruption

2008-01-03 Thread Neil Brown
On Monday December 31, [EMAIL PROTECTED] wrote:
 Ok, since my previous thread didn't seem to attract much attention,
 let me try again.

Thank you for your report and your patience.

 An interrupted RAID5 reshape will cause the md device in question to
 contain one corrupt chunk per stripe if resumed in the wrong manner.
 A testcase can be found at http://www.nagilum.de/md/ .
 The first testcase can be initialized with start.sh; the real test
 can then be run with test.sh. The first testcase also uses dm-crypt
 and xfs to show the corruption.

It looks like this can be fixed with the patch:

Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/raid5.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c	2008-01-04 09:20:54.000000000 +1100
+++ ./drivers/md/raid5.c	2008-01-04 09:21:05.000000000 +1100
@@ -2865,7 +2865,7 @@ static void handle_stripe5(struct stripe
 		md_done_sync(conf->mddev, STRIPE_SECTORS, 1);
 	}
 
-	if (s.expanding && s.locked == 0)
+	if (s.expanding && s.locked == 0 && s.req_compute == 0)
 		handle_stripe_expansion(conf, sh, NULL);
 
 	if (sh->ops.count)


With this patch in place, the v2 test only reports errors after the end
of the original array, as you would expect (the new space is
initialised to 0).

 I'm not just interested in a simple behaviour fix I'm also interested
 in what actually happens and if possible a repair program for that
 kind of data corruption.

What happens is that when a reshape runs while a device is missing,
the data on that device should be computed from the other data devices
and parity.  However, because of the above bug, the data is copied into
the new layout before that compute is complete.  This means that the
data that was on the missing device is lost beyond recovery.
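
(For illustration only, not kernel code: the reconstruction described
above is a plain XOR of the corresponding blocks on every surviving
device, data and parity alike.  Names and signature below are made up.)

#include <stddef.h>

/*
 * Recompute the block that lived on the missing RAID5 device by XOR-ing
 * the matching blocks from all surviving devices (the other data blocks
 * plus the parity block).
 */
static void raid5_recompute_missing(unsigned char *dst,
				    const unsigned char **surviving,
				    int nsurviving, size_t len)
{
	size_t i;
	int d;

	for (i = 0; i < len; i++)
		dst[i] = surviving[0][i];

	for (d = 1; d < nsurviving; d++)
		for (i = 0; i < len; i++)
			dst[i] ^= surviving[d][i];
}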

I'm really sorry about that, but there is nothing that can be done to
recover the lost data.

NeilBrown


[PATCH] md: Fix data corruption when a degraded raid5 array is reshaped.

2008-01-03 Thread NeilBrown
This patch fixes a fairly serious bug in md/raid5 in 2.6.23 and 24-rc.
It would be great if it could get into 23.13 and 24.final.
Thanks.
NeilBrown

### Comments for Changeset

We currently do not wait for the block from the missing device
to be computed from parity before copying data to the new stripe
layout.

The change in the raid6 code is not technically needed as we
don't delay data block recovery in the same way for raid6 yet.
But making the change now is safer long-term.

This bug exists in 2.6.23 and 2.6.24-rc

Cc: [EMAIL PROTECTED]
Cc: Dan Williams [EMAIL PROTECTED]
Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/raid5.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c	2008-01-04 09:42:05.000000000 +1100
+++ ./drivers/md/raid5.c	2008-01-04 09:42:27.000000000 +1100
@@ -2865,7 +2865,7 @@ static void handle_stripe5(struct stripe
 		md_done_sync(conf->mddev, STRIPE_SECTORS, 1);
 	}
 
-	if (s.expanding && s.locked == 0)
+	if (s.expanding && s.locked == 0 && s.req_compute == 0)
 		handle_stripe_expansion(conf, sh, NULL);
 
 	if (sh->ops.count)
@@ -3067,7 +3067,7 @@ static void handle_stripe6(struct stripe
 		md_done_sync(conf->mddev, STRIPE_SECTORS, 1);
 	}
 
-	if (s.expanding && s.locked == 0)
+	if (s.expanding && s.locked == 0 && s.req_compute == 0)
 		handle_stripe_expansion(conf, sh, r6s);
 
 	spin_unlock(&sh->lock);


Re: [PATCH] md: Fix data corruption when a degraded raid5 array is reshaped.

2008-01-03 Thread Dan Williams
On Thu, 2008-01-03 at 15:46 -0700, NeilBrown wrote:
 This patch fixes a fairly serious bug in md/raid5 in 2.6.23 and 24-rc.
 It would be great if it could get into 23.13 and 24.final.
 Thanks.
 NeilBrown
 
 ### Comments for Changeset
 
 We currently do not wait for the block from the missing device
 to be computed from parity before copying data to the new stripe
 layout.
 
 The change in the raid6 code is not technically needed as we
 don't delay data block recovery in the same way for raid6 yet.
 But making the change now is safer long-term.
 
 This bug exists in 2.6.23 and 2.6.24-rc
 
 Cc: [EMAIL PROTECTED]
 Cc: Dan Williams [EMAIL PROTECTED]
 Signed-off-by: Neil Brown [EMAIL PROTECTED]
 
Acked-by: Dan Williams [EMAIL PROTECTED]





Re: [PATCH] md: Fix data corruption when a degraded raid5 array is reshaped.

2008-01-03 Thread Dan Williams
On Thu, 2008-01-03 at 16:00 -0700, Williams, Dan J wrote:
 On Thu, 2008-01-03 at 15:46 -0700, NeilBrown wrote:
  This patch fixes a fairly serious bug in md/raid5 in 2.6.23 and
 24-rc.
  It would be great if it could get into 23.13 and 24.final.
  Thanks.
  NeilBrown
 
  ### Comments for Changeset
 
  We currently do not wait for the block from the missing device
  to be computed from parity before copying data to the new stripe
  layout.
 
  The change in the raid6 code is not technically needed as we
  don't delay data block recovery in the same way for raid6 yet.
  But making the change now is safer long-term.
 
  This bug exists in 2.6.23 and 2.6.24-rc
 
  Cc: [EMAIL PROTECTED]
  Cc: Dan Williams [EMAIL PROTECTED]
  Signed-off-by: Neil Brown [EMAIL PROTECTED]
 
 Acked-by: Dan Williams [EMAIL PROTECTED]
 

On closer look the safer test is:

!test_bit(STRIPE_OP_COMPUTE_BLK, &sh->ops.pending).

The 'req_compute' field only indicates that a 'compute_block' operation
was requested during this pass through handle_stripe so that we can
issue a linked chain of asynchronous operations.
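
To make the distinction concrete (field and macro names as quoted in
this thread; the helper itself is hypothetical, a sketch as if it sat
in drivers/md/raid5.c, for illustration only):

/*
 * s->req_compute       -- set while handle_stripe() builds its chain of
 *                         async operations on the *current* pass; it only
 *                         says "a compute was requested just now".
 * sh->ops.pending bit  -- set when STRIPE_OP_COMPUTE_BLK is queued and
 *                         cleared only once the async compute completes,
 *                         so it still guards later passes.
 */
static int expansion_copy_is_safe(struct stripe_head *sh,
				  struct stripe_head_state *s)
{
	return s->expanding && s->locked == 0 &&
	       !test_bit(STRIPE_OP_COMPUTE_BLK, &sh->ops.pending);
}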

---

From: Neil Brown [EMAIL PROTECTED]

md: Fix data corruption when a degraded raid5 array is reshaped.

We currently do not wait for the block from the missing device
to be computed from parity before copying data to the new stripe
layout.

The change in the raid6 code is not technically needed as we
don't delay data block recovery in the same way for raid6 yet.
But making the change now is safer long-term.

This bug exists in 2.6.23 and 2.6.24-rc

Cc: [EMAIL PROTECTED]
Signed-off-by: Dan Williams [EMAIL PROTECTED]
---

 drivers/md/raid5.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index a5aad8c..e8c8157 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2865,7 +2865,8 @@ static void handle_stripe5(struct stripe_head *sh)
 		md_done_sync(conf->mddev, STRIPE_SECTORS, 1);
 	}
 
-	if (s.expanding && s.locked == 0)
+	if (s.expanding && s.locked == 0 &&
+	    !test_bit(STRIPE_OP_COMPUTE_BLK, &sh->ops.pending))
 		handle_stripe_expansion(conf, sh, NULL);
 
 	if (sh->ops.count)
@@ -3067,7 +3068,8 @@ static void handle_stripe6(struct stripe_head *sh, struct page *tmp_page)
 		md_done_sync(conf->mddev, STRIPE_SECTORS, 1);
 	}
 
-	if (s.expanding && s.locked == 0)
+	if (s.expanding && s.locked == 0 &&
+	    !test_bit(STRIPE_OP_COMPUTE_BLK, &sh->ops.pending))
 		handle_stripe_expansion(conf, sh, r6s);
 
 	spin_unlock(&sh->lock);



md raid1 in active-active arrangement

2008-01-03 Thread Clayton Bell

Is it reasonable for a raid1 mirror to be momentarily active on two servers at 
the same time?

This question has come about from Xen virtual host live migrations.


Consider the case with shared storage between two servers:

Server A: /dev/md0 is raid1 to devices /dev/mapper/wwidX and /dev/mapper/wwidY
Server B: /dev/md0 is raid1 to devices /dev/mapper/wwidX and /dev/mapper/wwidY

/dev/md0 is assigned to the Xen virtual host (i.e. it becomes /dev/sda within the 
virtual host).


When live migrating the virtual host from Server A to Server B, /dev/md0 must 
be active on both server A and B at the same time, at least momentarily.

Is md going to cope with this kind of setup?  Under what conditions will it 
fail dismally?

To me it would seem as though the migration can only occur when:
1. the mirrors are in sync
2. Server A and B use their own/separate external bitmap file for /dev/md0

Even if the time during which both /dev/md0's are active is minimized to less 
than a second, what data corruption could possibly occur?


Feedback much appreciated.  I hope the linux-raid group finds this situation at 
least a little interesting.

Thank you

Clayton



Re: [PATCH] md: Fix data corruption when a degraded raid5 array is reshaped.

2008-01-03 Thread Neil Brown
On Thursday January 3, [EMAIL PROTECTED] wrote:
 
 On closer look the safer test is:
 
   !test_bit(STRIPE_OP_COMPUTE_BLK, &sh->ops.pending).
 
 The 'req_compute' field only indicates that a 'compute_block' operation
 was requested during this pass through handle_stripe so that we can
 issue a linked chain of asynchronous operations.
 
 ---
 
 From: Neil Brown [EMAIL PROTECTED]

Technically that should probably be
  From: Dan Williams [EMAIL PROTECTED]

now, and then I add
  Acked-by: NeilBrown [EMAIL PROTECTED]

because I completely agree with your improvement.

We should keep an eye out for when Andrew commits this and make sure
the right patch goes in...

Thanks,
NeilBrown
