Re: RAID1 and data safety?

2005-03-22 Thread Molle Bestefich
Neil Brown wrote:
> > Is there any way to tell MD to do verify-on-write and
> > read-from-all-disks on a RAID1 array?
>
> No.
> I would have thought that modern disk drives did some sort of
> verify-on-write, else how would they detect write errors, and they are
> certainly in the best place to do verify-on-write.

Really?  My guess was that they wouldn't, since verify-on-write would
cost performance.
And that's why read errors only crop up at read time.

> Doing it at the md level would be problematic as you would have to
> ensure that you really were reading from the media and not from some
> cache somewhere in the data path.  I doubt it would be a mechanism
> that would actually increase confidence in the safety of the data.

Hmm.  Could hack around it by reading/writing blocks larger than the cache.  Ugly.
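
For what it's worth, a minimal userspace sketch of that kind of hack
(invented names and sizes, not md code).  It uses O_DIRECT for the
read-back, which defeats the OS page cache - but not the drive's own
cache, which is exactly the objection above:

/* Write a block, flush, then read it back via O_DIRECT and compare.
 * O_DIRECT bypasses the kernel page cache only; the drive may still
 * serve the read from its internal cache. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLKSZ 4096   /* must be a multiple of the sector size for O_DIRECT */

static int verify_write(const char *path, off_t off, const void *buf)
{
    void *rb;
    if (posix_memalign(&rb, BLKSZ, BLKSZ))   /* O_DIRECT needs alignment */
        return -1;

    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0) { free(rb); return -1; }

    int ok = pread(fd, rb, BLKSZ, off) == BLKSZ &&
             memcmp(rb, buf, BLKSZ) == 0;
    close(fd);
    free(rb);
    return ok ? 0 : -1;
}

int main(void)
{
    const char *path = "verify-demo.dat";    /* hypothetical scratch file */
    void *buf;
    if (posix_memalign(&buf, BLKSZ, BLKSZ)) return 1;
    memset(buf, 0xab, BLKSZ);

    int fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd < 0) return 1;
    if (pwrite(fd, buf, BLKSZ, 0) != BLKSZ) return 1;
    fsync(fd);                               /* flush the OS cache first */
    close(fd);

    printf("verify: %s\n", verify_write(path, 0, buf) == 0 ? "ok" : "MISMATCH");
    free(buf);
    return 0;
}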

> Imagine a filesystem that could access multiple devices, and where,
> when it kept index information, it didn't just keep one block address,
> but rather kept two block addresses, each on a different device, plus a
> strong checksum of the data block.  This would allow much the same
> robustness as read-from-all-drives with much lower overhead.

As in: if the checksum fails, try loading the data block [again]
from the other device?
Not sure why checksumming X data blocks should be cheaper
performance-wise than comparing X data blocks, but I can see the
point that you only have to load the data once and check the
checksum.  Not quite the same security, but almost.
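
To make the trade-off concrete, a toy sketch (invented code, nothing to
do with Neil's filesystem): both checks catch a corrupted copy, but the
compare needs two block reads where the checksum check needs one block
read plus a few stored bytes:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLK 4096

/* Toy 32-bit FNV-1a, standing in for a real strong checksum. */
static uint32_t csum(const uint8_t *p, size_t n)
{
    uint32_t c = 2166136261u;
    while (n--) c = (c ^ *p++) * 16777619u;
    return c;
}

/* read-from-all-disks style: both copies read, then compared */
static int verify_by_compare(const uint8_t *copy_a, const uint8_t *copy_b)
{
    return memcmp(copy_a, copy_b, BLK) == 0;
}

/* checksum style: one copy read, checked against the stored checksum */
static int verify_by_csum(const uint8_t *copy, uint32_t stored)
{
    return csum(copy, BLK) == stored;
}

int main(void)
{
    static uint8_t a[BLK], b[BLK];
    memset(a, 7, BLK); memset(b, 7, BLK);
    uint32_t stored = csum(a, BLK);

    b[123] ^= 1;                              /* corrupt the second copy */
    printf("compare a/b: %s\n", verify_by_compare(a, b) ? "ok" : "mismatch");
    printf("csum b:      %s\n", verify_by_csum(b, stored) ? "ok" : "bad copy");
    return 0;
}

On a checksum failure the filesystem would then read the block from the
second address and retry - so the second read happens only on the error
path, instead of on every read.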

> In summary:
>  - you cannot do it now.
>  - I don't think md is at the right level to solve this sort of problem.
>    I think a filesystem could do it much better. (I'm working on a
>    filesystem ... slowly...)
>  - read-from-all-disks might get implemented one day. verify-on-write
>    is much less likely.
>
> > Apologies if the answer is in the docs.
>
> It isn't.  But it is in the list archives now.

Thanks! :-)

(Guess I'll drop the idea for the time being...)


Dell + Adaptec SATA RAID controller...

2005-03-22 Thread Gordon Henderson

As part of a (Dell) server purchase, a client was given a free Dell
PowerEdge 750 (Celeron) box with 2 x 120GB SATA drives... Opening the lid
(as you do :)  revealed that the motherboard has on-board SATA, but Dell
had also plugged in an Adaptec 6-port SATA RAID card and connected the 2
drives to that.

Now I'm wondering why Dell have the capacity to give away free servers (I
know someone else who got a free server out of them a while back) and why
they'd put in a (presumably) expensive RAID controller...

Maybe the mobo controllers are knackered in some way?

Anyway, I'm tempted to just remove the Adaptec card and give the on-board
controllers a go using s/w RAID, which I know and love...

Anyone got any comments either way?

Cheers,

Gordon


Re: [PATCH 1/2] md bitmap bug fixes

2005-03-22 Thread Luca Berra
On Mon, Mar 21, 2005 at 02:58:56PM -0500, Paul Clements wrote:
> Luca Berra wrote:
> > On Mon, Mar 21, 2005 at 11:07:06AM -0500, Paul Clements wrote:
> > > All I'm saying is that in a split-brain scenario, typical cluster
> > > frameworks will make two (or more) systems active at the same time.
> > This I sincerely hope not.
> Perhaps my choice of wording was not the best? I probably should have
> said, there is no foolproof way to guarantee that two systems are not
> active. Software fails, human beings make mistakes, and surely even
> STONITH devices can be misconfigured or can fail (or cannot be used for
> one reason or another).

Well, careful use of an arbitrator node, possibly in a different
location, helps avoid split-brain, and STONITH is a requirement.

> At any rate, this is all irrelevant given the second part of that email
> reply that I gave. You still have to do the bitmap combining, regardless
> of whether two systems were active at the same time or not.
I still maintain that doing data replication with md over nbd is a
painful and not very useful exercise.
If we want to do data replication, access to the replicated device
should be controlled by the data replication process (*); md does not
guarantee this.

(*) I.e., my requirements could be that having a replicated transaction
is more important than completing the transaction itself, so I might
want to return a disk error in case the replica fails.
Or, on the contrary, I might want data availability above all else;
maybe the data does not change much.
Or something in between: data availability above replication, but
data validity over availability. This is probably the most common
scenario, and the most difficult to implement correctly.

In any case it must be possible to control exactly which steps should be
taken automatically in case of failure, and in case of rollback, with
the sane default being to die rather than modify any data when in doubt.
L.
--
Luca Berra -- [EMAIL PROTECTED]
   Communication Media & Services S.r.l.
 /"\
 \ /  ASCII RIBBON CAMPAIGN
  X   AGAINST HTML MAIL
 / \


Re: [PATCH 1/2] md bitmap bug fixes

2005-03-22 Thread Peter T. Breuer
Luca Berra [EMAIL PROTECTED] wrote:
> If we want to do data replication, access to the replicated device
> should be controlled by the data replication process (*); md does not
> guarantee this.

Well, if one writes to the md device, then md does guarantee this - but
I find it hard to parse the statement. Can you elaborate a little in
order to reduce my possible confusion?


> (*) I.e., my requirements could be that having a replicated transaction
> is more important than completing the transaction itself, so I might
> want to return a disk error in case the replica fails.

Oh - I see. We did half of all the replications possible. That's an
interesting idea, and it is trivial to modify md to return an error if
not all the replications succeeded. The bitmap knows right now. No
reason not to call end_io(...,0) instead of end_io(...,1) if you want
it that way.
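
Sketched in userspace (invented names, not md's actual interface), the
policy knob might look like the following; the end_io(...,0)-on-partial-
failure behaviour Luca asks for is the REPLICATION_FIRST case:

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-array policy, matching Luca's three scenarios. */
enum replica_policy {
    REPLICATION_FIRST,   /* error unless every replica was written  */
    AVAILABILITY_FIRST,  /* success if any replica was written      */
    VALIDITY_FIRST,      /* success only if the local disk was
                            written; a failed replica just degrades */
};

/* Status to report to the writer (0 = ok, -1 = I/O error), given the
 * outcome of the local write and the remote (nbd) write. */
static int write_status(enum replica_policy p, bool local_ok, bool remote_ok)
{
    switch (p) {
    case REPLICATION_FIRST:  return (local_ok && remote_ok) ? 0 : -1;
    case AVAILABILITY_FIRST: return (local_ok || remote_ok) ? 0 : -1;
    case VALIDITY_FIRST:     return local_ok ? 0 : -1;
    }
    return -1;
}

int main(void)
{
    /* Replica write failed: the strict policy surfaces it, the lax
     * one hides it from the caller. */
    printf("replication-first:  %d\n", write_status(REPLICATION_FIRST, true, false));
    printf("availability-first: %d\n", write_status(AVAILABILITY_FIRST, true, false));
    return 0;
}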

> Or, on the contrary, I might want data availability above all else;
> maybe the data does not change much.
> Or something in between: data availability above replication, but
> data validity over availability. This is probably the most common
> scenario, and the most difficult to implement correctly.
>
> In any case it must be possible to control exactly which steps should be
> taken automatically in case of failure, and in case of rollback, with
> the sane default being to die rather than modify any data when in doubt.

Well, if you want to be more exact about it, I am sure your wishes can
be accommodated. It's not a bad idea to be able to tailor the policy.

Peter



Re: Strangeness when booting raid1: md5 already running?

2005-03-22 Thread Ruth Ivimey-Cook
On Tue, 22 Mar 2005, Neil Brown wrote:
> On Monday March 21, [EMAIL PROTECTED] wrote:
> > ...repeated several times
> > md: export_rdev(hda9)
> > md: ... autorun DONE.
> > EXT3-fs: INFO: recovery required on readonly filesystem.
> > EXT3-fs: write access will be enabled during recovery.
> > EXT3-fs: recovery complete.
> > Now this was the first time the string md5 appears in the log. And
> > indeed, it appears that hda9 has been kicked out of the array:
> So was md5 actually running (what did /proc/mdstat show? What about
> mdadm -D /dev/md5?).

mdstat and mdadm both report that md5 is running degraded - with hdc9 and
hda9 removed.

> which wasn't part of an array but couldn't be added to one.  Nothing
> particularly interesting.

Ok. I'll just re-add the drive and see what happens.
Thanks
Ruth
--
Ruth Ivimey-Cook
Software engineer and technical writer.


AW: RAID1 and data safety?

2005-03-22 Thread Schuett Thomas EXT
Neil Brown wrote:
> > Is there any way to tell MD to [...] and
> > read-from-all-disks on a RAID1 array?

> Not sure why checksumming X data blocks should be cheaper
> performance-wise than comparing X data blocks, but I can see the
> point that you only have to load the data once and check the
> checksum.  Not quite the same security, but almost.

Still, if there is different data on the two disks due to a previous
power failure, the comparison could really be the better choice,
couldn't it?




Questions regarding readonly/readwrite semantics

2005-03-22 Thread Mario Holbe
Hello,

In the beginning I had just one simple question :) ...
Is there any way to start RAIDs in readonly mode during autodetection
at system boot?

I was thinking about some two-stage boot mechanism that first brings up
all RAIDs readonly (during autodetection, for example) and then, in a
second stage, sets them readwrite (in an early init script, for
example). Such a mechanism would give system administrators a chance to
intervene before an automatic resync or similar takes place (by booting
into emergency mode, for example).

Since I found no way to tell the autodetection to start RAIDs in
readonly mode, I was thinking about dropping autodetection altogether
and starting the RAIDs via mdadm or something like that. However,
looking at man mdadm, it seems impossible there too to start RAIDs in
readonly mode (since --readonly and --readwrite are defined in Misc
mode only, while I would need them in Assemble mode, wouldn't I?). So
my second question came up :) ...
Is there any way to start RAIDs in readonly mode at all?

And then while looking at --readonly and --readwrite semantics, some
more questions came up :) ...

I was experimenting in emergency mode with two RAID1 arrays: md0 and md4.
md0: initially readwrite, mounted readonly as /
md4: initially readwrite, not mounted
I'm running 2.4.27 built from Debian's kernel-source-2.4.27.
I'm booting with ro on the kernel command line, so the root device is
mounted readonly initially.

# mdadm --readonly /dev/md0
failed to set readonly: EBUSY

This is somewhat understandable. However, it would be nice to have a
way to force it. Furthermore, since the device is (or should be?)
opened readonly, it should be possible to set it readonly, too.

# mdadm --readwrite /dev/md0
failed to set writable: EBUSY

Huh, why does that fail? It *is* writable already!

# mdadm --readonly /dev/md4

Works. Of course.

# mount -o ro /dev/md4 /usr
# mdadm --readwrite /dev/md4

Works. Why does it work? If setting an already readwrite device to
readwrite fails, *this* one should fail more than ever!

# mdadm --readonly /dev/md4
failed to set readonly: EBUSY

Expected. However, since this device *must* be opened readonly (since
it *was* readonly at mount time), it should definitely be possible to
set it back to readonly.

Well, the whole readonly/readwrite semantics seem somehow inconsistent
to me. Setting a mounted and already readwrite device to readwrite
fails, while setting a mounted but readonly device to readwrite works.

And then a last question came up, too:

# blockdev --setro /dev/md0
md: blockdev(pid 3446) used obsolete MD ioctl, upgrade your software to use new ictls.
BLKROSET: Invalid argument

Is there any reason why md does not support the standard block
device readonly/readwrite ioctls?
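
(For reference, roughly what blockdev --setro does under the hood - the
generic BLKROSET/BLKROGET block ioctls from <linux/fs.h>.  A standalone
sketch, with the device path taken from argv; on the md of this kernel
it fails exactly as the output above shows:)

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* BLKROSET, BLKROGET */

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s /dev/mdX\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    int ro = 1;                        /* 1 = set readonly, 0 = readwrite */
    if (ioctl(fd, BLKROSET, &ro) < 0)
        perror("BLKROSET");            /* fails on md here, as seen above */

    if (ioctl(fd, BLKROGET, &ro) == 0)
        printf("%s is now %s\n", argv[1], ro ? "readonly" : "readwrite");

    close(fd);
    return 0;
}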


regards
   Mario
-- 
Independence Day: Fortunately, the alien computer operating system works just
fine with the laptop. This proves an important point which Apple enthusiasts
have known for years. While the evil empire of Microsoft may dominate the
computers of Earth people, more advanced life forms clearly prefer Mac's.



Re: [PATCH 1/2] md bitmap bug fixes

2005-03-22 Thread Peter T. Breuer
Paul Clements [EMAIL PROTECTED] wrote:
>  system A
>  [raid1]
>   / \
> [disk][nbd] -- system B
>
> 2) you're writing, say, block 10 to the raid1 when A crashes (block 10
> is dirty in the bitmap, and you don't know whether it got written to the
> disk on A or B, neither, or both)

Let me offer an example based on this scenario.  Block 10 is sent to
both, and B's bitmap is dirtied for it, but the data itself never
arrives.  At the same time block 10 is sent to A, the bitmap is
dirtied for it, the data sent, and (miraculously) the bitmap on A is
cleared for the received data (I don't know why or how - nobody has yet
specified the algorithm with enough precision for me to say).

At this point B's bitmap is dirty for block 10, and A's is not. A has
received the data for block 10, and B has not.

> 3) something (i.e., your cluster framework) notices that A is gone and
> brings up a new raid1, with an empty bitmap, on system B:

Now, this looks wrong, because to sync A from B we will later need to
copy block 10 from B to A in order to undo the extra write already
done on A, and A's bitmap is not marked dirty for block 10, only B's is,
so we cannot zero B's bitmap because that would lose the information
about block 10.

 --

I've been thinking about this in more general terms, and it seems to me
that the algorithms offered (and, as I say, I have not seen enough
detail to be sure) may in general be insufficiently pessimistic.

That is, they may clear the bitmap too soon (as in the thought
experiment above).  Or they may not dirty the bitmaps soon enough.

I believe that you are aiming for algorithms in which the _combined_
bitmaps are sufficiently pessimistic, but the individual bitmaps
are not necessarily so.

But it appears to me as though it may not be much trouble to ensure that
_each_ bitmap is sufficiently pessimistic on its own with respect to
clearing.  Just clear _each_ bitmap only when _both_ writes have been
done.
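
As a toy model of that rule (invented structures, not the md bitmap
code): the dirty bit is set before any data moves, and cleared only
once acknowledgements from *both* mirrors are in:

#include <stdbool.h>
#include <stdio.h>

#define NBLOCKS 16

struct block_state {
    bool dirty;        /* set before the writes are issued */
    bool local_done;   /* ack from the local disk          */
    bool remote_done;  /* ack from the remote (nbd) mirror */
};

static struct block_state bitmap[NBLOCKS];

static void start_write(int b)
{
    bitmap[b].dirty = true;            /* dirty BEFORE any data moves */
    bitmap[b].local_done = bitmap[b].remote_done = false;
}

/* Completion handler: clear the bit only when both writes are done. */
static void write_done(int b, bool remote)
{
    if (remote) bitmap[b].remote_done = true;
    else        bitmap[b].local_done = true;
    if (bitmap[b].local_done && bitmap[b].remote_done)
        bitmap[b].dirty = false;
}

int main(void)
{
    start_write(10);
    write_done(10, false);             /* local disk acks first...    */
    printf("block 10 dirty: %d\n", bitmap[10].dirty);   /* still 1    */
    write_done(10, true);              /* ...then the remote acks     */
    printf("block 10 dirty: %d\n", bitmap[10].dirty);   /* now 0      */
    return 0;
}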

 --

Can this plan fail to be pessimistic enough with respect to dirtying
the bitmaps in the first place?

What if block 10 is sent to A, which is to say the bitmap on A is
dirtied, and the data sent, and received on A. Can B _not_ have its
bitmap dirtied for block 10?

Well, yes, if A dies before sending out the bitmap dirty to B, but after
sending out the bitmap dirty AND the data to A.  That's normally not
possible: surely we send out all bitmap dirties before sending out any
data. But can we wait for those to complete before starting on the data
writes? If B times out, we will have to go ahead and dirty A's bitmap
on its own, and thereafter keep it dirty and never clear it.

So this corresponds to A continuing to work after losing contact with B.

Now, if A dies after that, and for some reason we start using B, then B
will need eventually to have its block 10 sent to A when we resync A
from B.

But we never should have switched to B in the first place! B was
expelled from the array.  But A maybe died before saying so to anyone.

Well, plainly A should not have gone on to write anything in the array
after expelling B until it was able to write in its (A's) superblock
that B had been expelled.

Then, later, on recovery with a sync from B to A (even though it is the
wrong direction), A will either say in its sb that B has not been
expelled AND contain no extra writes to be undone from B, or A will say
that B has been expelled, and its bitmap will say which writes have
been done that were not done on B, and we can happily decide to sync
from B, or sync from A.

So it looks like there are indeed several admin foul-ups and crossed
wires which could give us reason to sync in the wrong direction, and
then we will want to know what the recipient has in its bitmap. But
we will be able to see that that is the situation.

In all other cases, it is sufficient to know just the bitmap on the
master.

The particular dubious situation outlined here is

1) A loses contact with B and continues working without B in the array,
  so B is out of date.

2) A dies, and B is recovered, becoming used as the master.

3) When A is recovered, we choose to sync A from B, not B from A.

In that case we need to look at bitmaps on both sides. But note that one
bitmap per array (on the local side) would suffice in this case. The
array's node location shifts during the process outlined, giving two
bitmaps to make use of eventually.


Peter



[PATCH md ] md: allow degraded raid1 array to resync after an unclean shutdown.

2005-03-22 Thread NeilBrown
The following is (I think) appropriate for 2.4.30.  The bug it fixes
can result in data corruption in a fairly unusual circumstance (having
a 3 drive raid1 array running in degraded mode, and suffering a system
crash). 

### Comments for Changeset

If a raid1 array has more than two devices, and not all are working,
then it will not resync after an unclean shutdown (as it will think
that it should reconstruct a failed drive, and will find there aren't
any spares...)

This patch fixes the problem.

Problem found by Mario Holbe [EMAIL PROTECTED] (thanks!)

Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/raid1.c |   13 ++++++++-----
 1 files changed, 8 insertions(+), 5 deletions(-)

diff ./drivers/md/raid1.c~current~ ./drivers/md/raid1.c
--- ./drivers/md/raid1.c~current~   2005-03-23 11:28:56.000000000 +1100
+++ ./drivers/md/raid1.c    2005-03-23 11:38:41.000000000 +1100
@@ -891,6 +891,8 @@ static int raid1_diskop(mddev_t *mddev, 
        mdp_disk_t *failed_desc, *spare_desc, *added_desc;
        mdk_rdev_t *spare_rdev, *failed_rdev;
 
+       if (conf->resync_mirrors)
+               return 1; /* Cannot do any diskops during a resync */
 
        switch (state) {
        case DISKOP_SPARE_ACTIVE:
@@ -1333,6 +1335,8 @@ static void raid1syncd (void *data)
 
        up(mddev->recovery_sem);
        raid1_shrink_buffers(conf);
+
+       md_recover_arrays(); /* incase we are degraded and a spare is available */
 }
 
 /*
@@ -1741,10 +1745,6 @@ static int raid1_run (mddev_t *mddev)
        conf->last_used = j;
 
 
-       if (conf->working_disks != sb->raid_disks) {
-               printk(KERN_ALERT "raid1: md%d, not all disks are operational -- trying to recover array\n", mdidx(mddev));
-               start_recovery = 1;
-       }
 
        {
                const char * name = "raid1d";
@@ -1756,7 +1756,7 @@ static int raid1_run (mddev_t *mddev)
                }
        }
 
-       if (!start_recovery && !(sb->state & (1 << MD_SB_CLEAN)) &&
+       if (!(sb->state & (1 << MD_SB_CLEAN)) &&
            (conf->working_disks > 1)) {
                const char * name = "raid1syncd";
 
@@ -1769,6 +1769,9 @@ static int raid1_run (mddev_t *mddev)
                printk(START_RESYNC, mdidx(mddev));
                conf->resync_mirrors = 1;
                md_wakeup_thread(conf->resync_thread);
+       } else if (conf->working_disks != sb->raid_disks) {
+               printk(KERN_ALERT "raid1: md%d, not all disks are operational -- trying to recover array\n", mdidx(mddev));
+               start_recovery = 1;
        }
 
        /*


[PATCH md 4 of 12] Minor code rearrangement in bitmap_init_from_disk

2005-03-22 Thread NeilBrown


Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/bitmap.c |   16 ++++++++--------
 1 files changed, 8 insertions(+), 8 deletions(-)

diff ./drivers/md/bitmap.c~current~ ./drivers/md/bitmap.c
--- ./drivers/md/bitmap.c~current~  2005-03-22 17:13:16.000000000 +1100
+++ ./drivers/md/bitmap.c   2005-03-22 17:19:19.000000000 +1100
@@ -782,7 +782,9 @@ static int bitmap_init_from_disk(struct 
                        "recovery\n", bmname(bitmap));
 
        bytes = (chunks + 7) / 8;
-       num_pages = (bytes + PAGE_SIZE - 1) / PAGE_SIZE;
+
+       num_pages = (bytes + sizeof(bitmap_super_t) + PAGE_SIZE - 1) / PAGE_SIZE + 1;
+
        if (i_size_read(file->f_mapping->host) < bytes + sizeof(bitmap_super_t)) {
                printk(KERN_INFO "%s: bitmap file too short %lu < %lu\n",
                        bmname(bitmap),
@@ -790,18 +792,16 @@ static int bitmap_init_from_disk(struct 
                        bytes + sizeof(bitmap_super_t));
                goto out;
        }
-       num_pages++;
+
+       ret = -ENOMEM;
+
        bitmap->filemap = kmalloc(sizeof(struct page *) * num_pages, GFP_KERNEL);
-       if (!bitmap->filemap) {
-               ret = -ENOMEM;
+       if (!bitmap->filemap)
                goto out;
-       }
 
        bitmap->filemap_attr = kmalloc(sizeof(long) * num_pages, GFP_KERNEL);
-       if (!bitmap->filemap_attr) {
-               ret = -ENOMEM;
+       if (!bitmap->filemap_attr)
                goto out;
-       }
 
        memset(bitmap->filemap_attr, 0, sizeof(long) * num_pages);
 


[PATCH md 2 of 12] Enable the bitmap write-back daemon and wait for it.

2005-03-22 Thread NeilBrown

Currently we don't wait for updates to the bitmap to be
flushed to disk properly.  The infrastructure is all there,
but it isn't being used...

A separate kernel thread (bitmap_writeback_daemon) is needed to
wait for each page as we cannot get callbacks when a page write
completes.
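
(A userspace analogy of the pattern, with pthreads and invented names:
writers queue pages they cannot wait on themselves, and a separate
daemon thread dequeues and waits on each - the same division of labour
as bitmap_writeback_daemon and md_wakeup_thread below:)

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct page_item { int page_no; struct page_item *next; };

static struct page_item *queue;        /* cf. bitmap->complete_pages */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t kick = PTHREAD_COND_INITIALIZER;
static int done;

/* Writer side: queue the page and wake the daemon. */
static void queue_page(int page_no)
{
    struct page_item *it = malloc(sizeof *it);
    it->page_no = page_no;
    pthread_mutex_lock(&lock);
    it->next = queue;
    queue = it;
    pthread_cond_signal(&kick);        /* cf. md_wakeup_thread() */
    pthread_mutex_unlock(&lock);
}

/* Daemon side: drain the list, "waiting" on each page in turn. */
static void *writeback_daemon(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    while (!done || queue) {
        while (queue) {
            struct page_item *it = queue;
            queue = it->next;
            pthread_mutex_unlock(&lock);
            printf("daemon: waited on page %d\n", it->page_no);
            free(it);
            pthread_mutex_lock(&lock);
        }
        if (!done)
            pthread_cond_wait(&kick, &lock);
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t d;
    pthread_create(&d, NULL, writeback_daemon, NULL);
    queue_page(1);
    queue_page(2);
    pthread_mutex_lock(&lock);
    done = 1;
    pthread_cond_signal(&kick);
    pthread_mutex_unlock(&lock);
    pthread_join(d, NULL);
    return 0;
}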

Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/bitmap.c |  119 ++
 ./include/linux/raid/bitmap.h |   13 
 2 files changed, 55 insertions(+), 77 deletions(-)

diff ./drivers/md/bitmap.c~current~ ./drivers/md/bitmap.c
--- ./drivers/md/bitmap.c~current~  2005-03-22 17:11:04.000000000 +1100
+++ ./drivers/md/bitmap.c   2005-03-22 17:12:09.000000000 +1100
@@ -261,30 +261,33 @@ char *file_path(struct file *file, char 
 /*
  * write out a page
  */
-static int write_page(struct page *page, int wait)
+static int write_page(struct bitmap *bitmap, struct page *page, int wait)
 {
        int ret = -ENOMEM;
 
        lock_page(page);
 
-       if (page->mapping == NULL)
-               goto unlock_out;
-       else if (i_size_read(page->mapping->host) < page->index << PAGE_SHIFT) {
-               ret = -ENOENT;
-               goto unlock_out;
-       }
-
        ret = page->mapping->a_ops->prepare_write(NULL, page, 0, PAGE_SIZE);
        if (!ret)
                ret = page->mapping->a_ops->commit_write(NULL, page, 0,
                        PAGE_SIZE);
        if (ret) {
-unlock_out:
                unlock_page(page);
                return ret;
        }
 
        set_page_dirty(page); /* force it to be written out */
+
+       if (!wait) {
+               /* add to list to be waited for by daemon */
+               struct page_list *item = mempool_alloc(bitmap->write_pool, GFP_NOIO);
+               item->page = page;
+               page_cache_get(page);
+               spin_lock(&bitmap->write_lock);
+               list_add(&item->list, &bitmap->complete_pages);
+               spin_unlock(&bitmap->write_lock);
+               md_wakeup_thread(bitmap->writeback_daemon);
+       }
        return write_one_page(page, wait);
 }
 
@@ -343,14 +346,13 @@ int bitmap_update_sb(struct bitmap *bitm
                spin_unlock_irqrestore(&bitmap->lock, flags);
                return 0;
        }
-       page_cache_get(bitmap->sb_page);
        spin_unlock_irqrestore(&bitmap->lock, flags);
        sb = (bitmap_super_t *)kmap(bitmap->sb_page);
        sb->events = cpu_to_le64(bitmap->mddev->events);
        if (!bitmap->mddev->degraded)
                sb->events_cleared = cpu_to_le64(bitmap->mddev->events);
        kunmap(bitmap->sb_page);
-       return write_page(bitmap->sb_page, 0);
+       return write_page(bitmap, bitmap->sb_page, 0);
 }
 
 /* print out the bitmap file superblock */
@@ -556,10 +558,10 @@ static void bitmap_file_unmap(struct bit
 static void bitmap_stop_daemons(struct bitmap *bitmap);
 
 /* dequeue the next item in a page list -- don't call from irq context */
-static struct page_list *dequeue_page(struct bitmap *bitmap,
-                       struct list_head *head)
+static struct page_list *dequeue_page(struct bitmap *bitmap)
 {
        struct page_list *item = NULL;
+       struct list_head *head = &bitmap->complete_pages;
 
        spin_lock(&bitmap->write_lock);
        if (list_empty(head))
@@ -573,23 +575,15 @@ out:
 
 static void drain_write_queues(struct bitmap *bitmap)
 {
-       struct list_head *queues[] = { &bitmap->complete_pages, NULL };
-       struct list_head *head;
        struct page_list *item;
-       int i;
 
-       for (i = 0; queues[i]; i++) {
-               head = queues[i];
-               while ((item = dequeue_page(bitmap, head))) {
-                       page_cache_release(item->page);
-                       mempool_free(item, bitmap->write_pool);
-               }
+       while ((item = dequeue_page(bitmap))) {
+               /* don't bother to wait */
+               page_cache_release(item->page);
+               mempool_free(item, bitmap->write_pool);
        }
 
-       spin_lock(&bitmap->write_lock);
-       bitmap->writes_pending = 0; /* make sure waiters continue */
        wake_up(&bitmap->write_wait);
-       spin_unlock(&bitmap->write_lock);
 }
 
 static void bitmap_file_put(struct bitmap *bitmap)
@@ -734,13 +728,13 @@ int bitmap_unplug(struct bitmap *bitmap)
                spin_unlock_irqrestore(&bitmap->lock, flags);
 
                if (attr & (BITMAP_PAGE_DIRTY | BITMAP_PAGE_NEEDWRITE))
-                       if (write_page(page, 0))
+                       if (write_page(bitmap, page, 0))
                                return 1;
        }
        if (wait) { /* if any writes were performed, we need to wait on them */
                spin_lock_irq(&bitmap->write_lock);
                wait_event_lock_irq(bitmap->write_wait,
-                       bitmap->writes_pending == 0, bitmap->write_lock,
+                       list_empty(&bitmap->complete_pages), bitmap->write_lock,

[PATCH md 8 of 12] A couple of tidyups relating to the bitmap file.

2005-03-22 Thread NeilBrown

1/ When initialising from disk, it is a BUG if there is nowhere
   to init from.
2/ Use seq_path to print the bitmap file path in /proc/mdstat.


Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/bitmap.c |    8 +-------
 ./drivers/md/md.c     |   11 +++++------
 2 files changed, 6 insertions(+), 13 deletions(-)

diff ./drivers/md/bitmap.c~current~ ./drivers/md/bitmap.c
--- ./drivers/md/bitmap.c~current~  2005-03-22 17:20:50.000000000 +1100
+++ ./drivers/md/bitmap.c   2005-03-22 17:21:47.000000000 +1100
@@ -764,13 +764,7 @@ static int bitmap_init_from_disk(struct 
        chunks = bitmap->chunks;
        file = bitmap->file;
 
-       if (!file) { /* no file, dirty all the in-memory bits */
-               printk(KERN_INFO "%s: no bitmap file, doing full recovery\n",
-                       bmname(bitmap));
-               bitmap_set_memory_bits(bitmap, 0,
-                       chunks << CHUNK_BLOCK_SHIFT(bitmap), 1);
-               return 0;
-       }
+       BUG_ON(!file);
 
 #if INJECT_FAULTS_3
        outofdate = 1;

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~  2005-03-22 17:20:16.000000000 +1100
+++ ./drivers/md/md.c   2005-03-22 17:21:30.000000000 +1100
@@ -3259,10 +3259,8 @@ static int md_seq_show(struct seq_file *
        seq_printf(seq, "\n       ");
 
        if ((bitmap = mddev->bitmap)) {
-               char *buf, *path;
                unsigned long chunk_kb;
                unsigned long flags;
-               buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
                spin_lock_irqsave(&bitmap->lock, flags);
                chunk_kb = bitmap->chunksize >> 10;
                seq_printf(seq, "bitmap: %lu/%lu pages [%luKB], ",
@@ -3273,13 +3271,14 @@ static int md_seq_show(struct seq_file *
                         << (PAGE_SHIFT - 10),
                        chunk_kb ? chunk_kb : bitmap->chunksize,
                        chunk_kb ? "KB" : "B");
-               if (bitmap->file && buf) {
-                       path = file_path(bitmap->file, buf, PAGE_SIZE);
-                       seq_printf(seq, ", file: %s", path ? path : "");
+               if (bitmap->file) {
+                       seq_printf(seq, ", file: ");
+                       seq_path(seq, bitmap->file->f_vfsmnt,
+                                bitmap->file->f_dentry, " \t\n");
                }
+
                seq_printf(seq, "\n");
                spin_unlock_irqrestore(&bitmap->lock, flags);
-               kfree(buf);
        }
 
        seq_printf(seq, "\n");


[PATCH md 0 of 12] Introduction

2005-03-22 Thread NeilBrown

Here are 12 patches for the bitmap write-intent logging in md in
2.6.12-rc1-mm1.  With this, it is getting quite close to being really
usable (though there are a couple of issues that I haven't resolved
yet).

Andrew: Are you happy to keep collecting these as a list of patches
(bugs followed by bug-fixes :-), or would it be easier if I merged all
the bug fixes into earlier patches and just resent a small number of
add-functionality patches??

NeilBrown

[PATCH md 1 of 12] Check return value of write_page, rather than ignore it
[PATCH md 2 of 12] Enable the bitmap write-back daemon and wait for it.
[PATCH md 3 of 12] Improve debug-printing of bitmap superblock.
[PATCH md 4 of 12] Minor code rearrangement in bitmap_init_from_disk
[PATCH md 5 of 12] Print correct pid for newly created bitmap-writeback-daemon.
[PATCH md 6 of 12] Call bitmap_daemon_work regularly
[PATCH md 7 of 12] Don't skip bitmap pages due to lack of bit that we just 
cleared.
[PATCH md 8 of 12] A couple of tidyups relating to the bitmap file.
[PATCH md 9 of 12] Make sure md bitmap is cleared on a clean start.
[PATCH md 10 of 12] Fix bug when raid1 attempts a partial reconstruct.
[PATCH md 11 of 12] Allow md to update multiple superblocks in parallel.
[PATCH md 12 of 12] Allow md intent bitmap to be stored near the superblock.


[PATCH md 5 of 12] Print correct pid for newly created bitmap-writeback-daemon.

2005-03-22 Thread NeilBrown

The debugging message printed the wrong pid, which didn't help
with removing bugs...

Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/bitmap.c |    2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

diff ./drivers/md/bitmap.c~current~ ./drivers/md/bitmap.c
--- ./drivers/md/bitmap.c~current~  2005-03-22 17:19:19.000000000 +1100
+++ ./drivers/md/bitmap.c   2005-03-22 17:20:08.000000000 +1100
@@ -1107,7 +1107,7 @@ static int bitmap_start_daemon(struct bi
        md_wakeup_thread(daemon); /* start it running */
 
        PRINTK("%s: %s daemon (pid %d) started...\n",
-               bmname(bitmap), name, bitmap->daemon->tsk->pid);
+               bmname(bitmap), name, daemon->tsk->pid);
 out_unlock:
        spin_unlock_irqrestore(&bitmap->lock, flags);
        return 0;


[PATCH md 3 of 12] Improve debug-printing of bitmap superblock.

2005-03-22 Thread NeilBrown

- report sync_size properly - need /2 to convert sectors to KB
- move everything over 2 spaces to allow proper spelling of
  "events cleared".


Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/bitmap.c |   20 ++++++++++----------
 1 files changed, 10 insertions(+), 10 deletions(-)

diff ./drivers/md/bitmap.c~current~ ./drivers/md/bitmap.c
--- ./drivers/md/bitmap.c~current~  2005-03-22 17:12:09.000000000 +1100
+++ ./drivers/md/bitmap.c   2005-03-22 17:13:16.000000000 +1100
@@ -364,22 +364,22 @@ void bitmap_print_sb(struct bitmap *bitm
                return;
        sb = (bitmap_super_t *)kmap(bitmap->sb_page);
        printk(KERN_DEBUG "%s: bitmap file superblock:\n", bmname(bitmap));
-       printk(KERN_DEBUG "       magic: %08x\n", le32_to_cpu(sb->magic));
-       printk(KERN_DEBUG "     version: %d\n", le32_to_cpu(sb->version));
-       printk(KERN_DEBUG "        uuid: %08x.%08x.%08x.%08x\n",
+       printk(KERN_DEBUG "         magic: %08x\n", le32_to_cpu(sb->magic));
+       printk(KERN_DEBUG "       version: %d\n", le32_to_cpu(sb->version));
+       printk(KERN_DEBUG "          uuid: %08x.%08x.%08x.%08x\n",
                *(__u32 *)(sb->uuid+0),
                *(__u32 *)(sb->uuid+4),
                *(__u32 *)(sb->uuid+8),
                *(__u32 *)(sb->uuid+12));
-       printk(KERN_DEBUG "      events: %llu\n",
+       printk(KERN_DEBUG "        events: %llu\n",
                (unsigned long long) le64_to_cpu(sb->events));
-       printk(KERN_DEBUG "events_clred: %llu\n",
+       printk(KERN_DEBUG "events cleared: %llu\n",
                (unsigned long long) le64_to_cpu(sb->events_cleared));
-       printk(KERN_DEBUG "       state: %08x\n", le32_to_cpu(sb->state));
-       printk(KERN_DEBUG "   chunksize: %d B\n", le32_to_cpu(sb->chunksize));
-       printk(KERN_DEBUG "daemon sleep: %ds\n", le32_to_cpu(sb->daemon_sleep));
-       printk(KERN_DEBUG "   sync size: %llu KB\n",
-               (unsigned long long)le64_to_cpu(sb->sync_size));
+       printk(KERN_DEBUG "         state: %08x\n", le32_to_cpu(sb->state));
+       printk(KERN_DEBUG "     chunksize: %d B\n", le32_to_cpu(sb->chunksize));
+       printk(KERN_DEBUG "  daemon sleep: %ds\n", le32_to_cpu(sb->daemon_sleep));
+       printk(KERN_DEBUG "     sync size: %llu KB\n",
+               (unsigned long long)le64_to_cpu(sb->sync_size)/2);
        kunmap(bitmap->sb_page);
 }
 


[PATCH md 7 of 12] Don't skip bitmap pages due to lack of bit that we just cleared.

2005-03-22 Thread NeilBrown

When looking for pages that need cleaning, we skip pages that
don't have BITMAP_PAGE_CLEAN set.  But if it is the 'current'
page we will have cleared that bit ourselves, so skipping it is wrong.
So: move the 'skip this page' test inside the 'if (page != lastpage)'
branch.

Also fold the call of file_page_offset into the one place where
the value (bit) is used.

Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/bitmap.c |   35 +++++++++++++++++------------------
 1 files changed, 17 insertions(+), 18 deletions(-)

diff ./drivers/md/bitmap.c~current~ ./drivers/md/bitmap.c
--- ./drivers/md/bitmap.c~current~  2005-03-22 17:20:08.000000000 +1100
+++ ./drivers/md/bitmap.c   2005-03-22 17:20:50.000000000 +1100
@@ -908,7 +908,7 @@ static bitmap_counter_t *bitmap_get_coun
 
 int bitmap_daemon_work(struct bitmap *bitmap)
 {
-       unsigned long bit, j;
+       unsigned long j;
        unsigned long flags;
        struct page *page = NULL, *lastpage = NULL;
        int err = 0;
@@ -931,24 +931,23 @@ int bitmap_daemon_work(struct bitmap *bi
                }
 
                page = filemap_get_page(bitmap, j);
-               /* skip this page unless it's marked as needing cleaning */
-               if (!((attr=get_page_attr(bitmap, page)) & BITMAP_PAGE_CLEAN)) {
-                       if (attr & BITMAP_PAGE_NEEDWRITE) {
-                               page_cache_get(page);
-                               clear_page_attr(bitmap, page, BITMAP_PAGE_NEEDWRITE);
-                       }
-                       spin_unlock_irqrestore(&bitmap->lock, flags);
-                       if (attr & BITMAP_PAGE_NEEDWRITE) {
-                               if (write_page(bitmap, page, 0))
-                                       bitmap_file_kick(bitmap);
-                               page_cache_release(page);
-                       }
-                       continue;
-               }
-
-               bit = file_page_offset(j);
 
                if (page != lastpage) {
+                       /* skip this page unless it's marked as needing cleaning */
+                       if (!((attr=get_page_attr(bitmap, page)) & BITMAP_PAGE_CLEAN)) {
+                               if (attr & BITMAP_PAGE_NEEDWRITE) {
+                                       page_cache_get(page);
+                                       clear_page_attr(bitmap, page, BITMAP_PAGE_NEEDWRITE);
+                               }
+                               spin_unlock_irqrestore(&bitmap->lock, flags);
+                               if (attr & BITMAP_PAGE_NEEDWRITE) {
+                                       if (write_page(bitmap, page, 0))
+                                               bitmap_file_kick(bitmap);
+                                       page_cache_release(page);
+                               }
+                               continue;
+                       }
+
                        /* grab the new page, sync and release the old */
                        page_cache_get(page);
                        if (lastpage != NULL) {
@@ -990,7 +989,7 @@ int bitmap_daemon_work(struct bitmap *bi
                                          -1);
 
                                /* clear the bit */
-                               clear_bit(bit, page_address(page));
+                               clear_bit(file_page_offset(j), page_address(page));
                        }
                }
                spin_unlock_irqrestore(&bitmap->lock, flags);