Re: RAID1 and data safety?
Neil Brown wrote:
>> Is there any way to tell MD to do verify-on-write and read-from-all-disks
>> on a RAID1 array?
>
> No. I would have thought that modern disk drives did some sort of
> verify-on-write, else how would they detect write errors, and they are
> certainly in the best place to do verify-on-write.

Really? My guess was that they wouldn't, because it would lead to lower
performance. And that's why read errors crop up at read time.

> Doing it at the md level would be problematic, as you would have to ensure
> that you really were reading from the media and not from some cache
> somewhere in the data path. I doubt it would be a mechanism that would
> actually increase confidence in the safety of the data.

Hmm. Could hack it by reading/writing blocks larger than the cache. Ugly.

> Imagine a filesystem that could access multiple devices, and where, when
> it kept index information, it didn't just keep one block address, but
> rather kept two block addresses, each on different devices, and a strong
> checksum of the data block. This would allow much the same robustness as
> read-from-all-drives at much lower overhead.

As in, if the checksum fails, try loading the data block [again] from the
other device? Not sure why a checksum of X data blocks should be cheaper
performance-wise than a comparison between X data blocks, but I can see the
point in that you only have to load the data once and check the checksum.
Not quite the same security, but almost.

> In summary:
>  - you cannot do it now.
>  - I don't think md is at the right level to solve these sorts of
>    problems. I think a filesystem could do it much better. (I'm working
>    on a filesystem, slowly...)
>  - read-from-all-disks might get implemented one day. verify-on-write is
>    much less likely.
>
>> Apologies if the answer is in the docs.
>
> It isn't. But it is in the list archives now.

Thanks! :-) (Guess I'll drop the idea for the time being...)
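The two recovery strategies being compared above can be sketched as follows. This is a toy model in Python, not md or filesystem code; all the names are invented for illustration:

```python
import hashlib

def read_with_checksum(copies, expected_digest):
    """Checksum approach: read ONE copy, verify it against the stored
    checksum, and fall back to the other copy only on a mismatch
    (one read in the common case)."""
    for data in copies:
        if hashlib.sha256(data).hexdigest() == expected_digest:
            return data
    raise IOError("no copy matches the stored checksum")

def read_and_compare(copies):
    """read-from-all-disks approach: read EVERY copy and compare them
    (X reads every time, and no way to tell which copy is the good one)."""
    first = copies[0]
    if any(c != first for c in copies[1:]):
        raise IOError("mirror copies disagree")
    return first

good = b"block 10 payload"
digest = hashlib.sha256(good).hexdigest()
# One mirror is silently corrupt: the checksum scheme still identifies and
# returns the good data, while plain comparison can only report a mismatch.
assert read_with_checksum([b"garbage", good], digest) == good
```

This also illustrates the "not quite the same security" point: the comparison detects any divergence between the mirrors, while the checksum scheme additionally knows which copy is correct, at the cost of storing and maintaining the checksum.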
- To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Dell + Adaptec SATA RAID controller...
As part of a (Dell) server purchase, a client was given a free Dell PowerEdge 750 (Celeron) box with 2 x 120GB SATA drives... Opening the lid (as you do :) revealed that the motherboard has on-board SATA, but Dell had also plugged in an Adaptec 6-port SATA RAID card, and connected the 2 drives to that.

Now I'm wondering why Dell have the capacity to give away free servers (I know someone else who got a free server out of them a while back) and why they'd put in a (presumably) expensive RAID controller... Maybe the mobo controllers are knackered in some way?

Anyway, I'm tempted to just remove the Adaptec card and give the on-board controllers a go using s/w RAID, which I know and love... Anyone got any comments either way?

Cheers,

Gordon
Re: [PATCH 1/2] md bitmap bug fixes
On Mon, Mar 21, 2005 at 02:58:56PM -0500, Paul Clements wrote:
> Luca Berra wrote:
>> On Mon, Mar 21, 2005 at 11:07:06AM -0500, Paul Clements wrote:
>>> All I'm saying is that in a split-brain scenario, typical cluster
>>> frameworks will make two (or more) systems active at the same time.
>> This I sincerely hope not.
> Perhaps my choice of wording was not the best? I probably should have
> said: there is no foolproof way to guarantee that two systems are not
> active. Software fails, human beings make mistakes, and surely even
> STONITH devices can be misconfigured or can fail (or cannot be used for
> one reason or another).

Well, careful use of an arbitrator node, possibly in a different location,
helps avoid split-brain, and STONITH is a requirement.

> At any rate, this is all irrelevant given the second part of that email
> reply that I gave. You still have to do the bitmap combining, regardless
> of whether two systems were active at the same time or not.

I still maintain that doing data replication with md over nbd is a painful
and not very useful exercise. If we want to do data replication, access to
the replicated device should be controlled by the data replication
process (*); md does not guarantee this.

(*) i.e. my requirements could be that having a replicated transaction is
more important than completing the transaction itself, so I might want to
return a disk error in case the replica fails. Or, to the contrary, I might
want data availability above all else; maybe the data does not change much.
Or something in between: data availability above replication, but data
validity above availability. This is probably the most common scenario, and
the most difficult to implement correctly. In any case it must be possible
to control exactly which steps should be done automatically in case of
failure - and in case of rollback, with the sane default being to die
rather than modify any data when in doubt.

L.

--
Luca Berra -- [EMAIL PROTECTED]
Communication Media Services S.r.l.
 /"\
 \ /  ASCII RIBBON CAMPAIGN
  X   AGAINST HTML MAIL
 / \
Re: [PATCH 1/2] md bitmap bug fixes
Luca Berra [EMAIL PROTECTED] wrote:
> If we want to do data replication, access to the replicated device should
> be controlled by the data replication process (*), md does not guarantee
> this.

Well, if one writes to the md device, then md does guarantee this - but I
find it hard to parse the statement. Can you elaborate a little in order to
reduce my possible confusion?

> (*) i.e. my requirements could be that having a replicated transaction is
> more important than completing the transaction itself, so i might want to
> return a disk error in case replica fails.

Oh - I see: the case where we did only half of all the replications
possible. That's an interesting idea, and it is trivial to modify md to
return an error if not all the replications succeeded. The bitmap knows
right now. No reason not to call end_io(...,0) instead of end_io(...,1) if
you want it that way.

> or to the contrary i might want data availability above all else, maybe
> data does not change much. or something in between, data availability
> above replication, but data validity over availability. this is probably
> the most common scenario, and the more difficult to implement correctly.
> In any case it must be possible to control exactly which steps should be
> automatically done in case of failure. and in case of rollback, with the
> sane default would be die rather than modify any data, in case of doubt.

Well, if you want to be more exact about it, I am sure your wishes can be
accommodated. It's not a bad idea to be able to tailor the policy.

Peter
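The policy choices being discussed - fail the write unless every replica succeeded, or succeed if any copy made it - can be modelled roughly like this. This is a toy model, not md's actual end_io interface; the function and policy names are invented:

```python
def complete_write(results, policy):
    """Decide the overall status of a mirrored write from the per-device
    outcomes.

    results -- list of booleans, one per mirror (True = copy was written)
    policy  -- "replication-first":  every copy must succeed
               "availability-first": any one successful copy is enough
    """
    if policy == "replication-first":
        return all(results)      # like reporting an error on any failed replica
    elif policy == "availability-first":
        return any(results)
    raise ValueError("unknown policy: %s" % policy)

# A two-mirror write where the remote (nbd) copy failed:
outcome = [True, False]
assert complete_write(outcome, "availability-first") is True
assert complete_write(outcome, "replication-first") is False
```

The "something in between" policies from the thread would slot in as further branches, which is the point about making the failure handling configurable.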
Re: Strangeness when booting raid1: md5 already running?
On Tue, 22 Mar 2005, Neil Brown wrote:
> On Monday March 21, [EMAIL PROTECTED] wrote:
>> ...repeated several times
>> md: export_rdev(hda9)
>> md: ... autorun DONE.
>> EXT3-fs: INFO: recovery required on readonly filesystem.
>> EXT3-fs: write access will be enabled during recovery.
>> EXT3-fs: recovery complete.
>>
>> Now this was the first time the string md5 appears in the log. And
>> indeed, it appears that hda9 has been kicked out of the array:
>
> So was md5 actually running (what did /proc/mdstat show? What about
> mdadm -D /dev/md5?).

mdstat and mdadm both report that md5 is running degraded - with hdc9, and
hda9 removed: the disk which wasn't part of an array but couldn't be added
to one. Nothing particularly interesting.

Ok. I'll just re-add the drive and see what happens.

Thanks

Ruth

--
Ruth Ivimey-Cook
Software engineer and technical writer.
AW: RAID1 and data safety?
Neil Brown wrote:
>>> Is there any way to tell MD to [...] and read-from-all-disks on a RAID1
>>> array?
>
> Not sure why a checksum of X data blocks should be cheaper
> performance-wise than a comparison between X data blocks, but I can see
> the point in that you only have to load the data once and check the
> checksum. Not quite the same security, but almost.

Still, if there is different data on the two disks due to a previous power
failure, the comparison could really be the better choice, couldn't it?
Questions regarding readonly/readwrite semantics
Hello,

in the beginning I had just one simple question :) ... Is there any way to
start RAIDs in readonly mode during autodetection at system boot?

I was thinking about some two-stage boot mechanism that first brings up all
RAIDs readonly (during autodetection, for example) and then in a second
stage sets them to readwrite (in an early init script, for example). Such a
mechanism would give system administrators a chance to intervene before
automatic resyncs or suchlike take place (by booting into emergency mode,
for example).

Since I found no way to tell the autodetection to start RAIDs in readonly
mode, I was thinking about dropping autodetection altogether and starting
the RAIDs via mdadm or something like that. However, looking at man mdadm,
it seems to be impossible there too (since --readonly and --readwrite are
defined for Misc mode only, while I would need them in Assemble mode,
wouldn't I?). So my second question came up :) ... Is there any way to
start RAIDs in readonly mode at all?

And then, while looking at the --readonly and --readwrite semantics, some
more questions came up :) ... I was experimenting in emergency mode with
two RAID1 arrays, md0 and md4:

  md0: initially readwrite, mounted as / readonly
  md4: initially readwrite, not mounted

I'm running 2.4.27 built from Debian's kernel-source-2.4.27, booting with
ro on the kernel command line, so the root device is mounted readonly
initially.

# mdadm --readonly /dev/md0
failed to set readonly: EBUSY

This is somehow understandable. However, it would be nice to have a way to
force it. Furthermore, since the device is (should be?) opened readonly, it
should be possible to set it readonly, too.

# mdadm --readwrite /dev/md0
failed to set writable: EBUSY

Huh, why does that fail? It *is* writable already!

# mdadm --readonly /dev/md4

Works. Of course.

# mount -o ro /dev/md4 /usr
# mdadm --readwrite /dev/md4

Works. Why does it work? If setting an already-readwrite device to
readwrite fails, *this* one should fail more than ever!

# mdadm --readonly /dev/md4
failed to set readonly: EBUSY

Expected. However, since this device *must* be opened readonly (since it
*was* readonly at mount time), it should definitely be possible to set it
back to readonly.

Well, the whole readonly/readwrite semantics seem somehow inconsistent to
me: setting a mounted and already-readwrite device to readwrite fails,
while setting a mounted but readonly device to readwrite works.

And a last question came up then, too:

# blockdev --setro /dev/md0
md: blockdev(pid 3446) used obsolete MD ioctl, upgrade your software to use new ioctls.
BLKROSET: Invalid argument

Is there any reason why md does not support the standard block device
readonly/readwrite ioctls?

regards
   Mario
--
Independence Day: Fortunately, the alien computer operating system works
just fine with the laptop. This proves an important point which Apple
enthusiasts have known for years. While the evil empire of Microsoft may
dominate the computers of Earth people, more advanced life forms clearly
prefer Macs.
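For what it's worth, the observed behaviour can be summarised as a small state machine. This is only a model of what the experiments above report, NOT the kernel's actual logic, and the function name is invented:

```python
def set_mode(open_count, current_mode, requested_mode):
    """Toy model of the 2.4 md --readonly/--readwrite transitions observed
    above. Returns the new mode or raises OSError (EBUSY)."""
    if requested_mode == "readonly":
        # Any opener at all makes --readonly fail, even a readonly mount.
        if open_count > 0:
            raise OSError("failed to set readonly: EBUSY")
        return "readonly"
    else:  # requested_mode == "readwrite"
        # --readwrite fails only if the array is already readwrite AND open;
        # a readonly array can be flipped readwrite even while mounted,
        # which is the inconsistency being complained about.
        if current_mode == "readwrite" and open_count > 0:
            raise OSError("failed to set writable: EBUSY")
        return "readwrite"

# md4 readonly + mounted ro: --readwrite succeeds (the surprising case).
assert set_mode(1, "readonly", "readwrite") == "readwrite"
```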
Re: [PATCH 1/2] md bitmap bug fixes
Paul Clements [EMAIL PROTECTED] wrote:
>     system A [raid1]
>      /       \
>  [disk]     [nbd] -- system B
>
> 2) you're writing, say, block 10 to the raid1 when A crashes (block 10 is
> dirty in the bitmap, and you don't know whether it got written to the
> disk on A or B, neither, or both)

Let me offer an example based on this scenario. Block 10 is sent to both,
and B's bitmap is dirtied for it, but the data itself never arrives. At the
same time block 10 is sent to A, and the bitmap is dirtied for it, the data
sent, and (miraculously) the bitmap on A is cleared for the received data
(I don't know why or how - nobody has yet specified the algorithm with
enough precision for me to say).

At this point B's bitmap is dirty for block 10, and A's is not. A has
received the data for block 10, and B has not.

> 3) something (i.e., your cluster framework) notices that A is gone and
> brings up a new raid1, with an empty bitmap, on system B:

Now, this looks wrong, because to sync A from B we will later need to copy
block 10 from B to A in order to undo the extra write already done on A,
and A's bitmap is not marked dirty for block 10, only B's is, so we cannot
zero B's bitmap, because that would lose the information about block 10.

--

I've been thinking about this in more general terms, and it seems to me
that the algorithms offered (and I say I have not seen enough detail to be
sure) may be in general insufficiently pessimistic. That is, they may clear
the bitmap too soon (as in the thought experiment above). Or they may not
dirty the bitmaps soon enough.

I believe that you are aiming for algorithms in which the _combined_
bitmaps are sufficiently pessimistic, but the individual bitmaps are not
necessarily so. But it appears to me as though it may not be much trouble
to ensure that _each_ bitmap is sufficiently pessimistic on its own with
respect to clearing: just clear _each_ bitmap only when _both_ writes have
been done.

--

Can this plan fail to be pessimistic enough with respect to dirtying the
bitmaps in the first place? What if block 10 is sent to A - which is to say
the bitmap on A is dirtied, and the data sent, and received on A - can B
_not_ have its bitmap dirtied for block 10? Well, yes, if A dies before
sending out the bitmap dirty to B, but after sending out the bitmap dirty
AND the data to A. That's normally not possible: we normally surely send
out all bitmap dirties before sending out any data. But can we wait for
these to complete before starting on the data writes? If B times out, we
will have to go ahead and dirty A's bitmap on its own, and thereafter
always dirty and never clear it. So this corresponds to A continuing to
work after losing contact with B.

Now, if A dies after that, and for some reason we start using B, then B
will eventually need to have its block 10 sent to A when we resync A from
B. But we never should have switched to B in the first place! B was
expelled from the array. But A maybe died before saying so to anyone.

Well, plainly A should not have gone on to write anything in the array
after expelling B until it was able to record in its (A's) superblock that
B had been expelled. Then, later, on recovery with a sync from B to A (even
though it is the wrong direction), A will either say in its sb that B has
not been expelled AND contain no extra writes to be undone from B, or A
will say that B has been expelled, and its bitmap will say which writes
have been done that were not done on B, and we can happily decide to sync
from B, or sync from A.

So it looks like there are indeed several admin foul-ups and crossed wires
which could give us reason to sync in the wrong direction, and then we will
want to know what the recipient has in its bitmap. But we will be able to
see that that is the situation. In all other cases, it is sufficient to
know just the bitmap on the master.

The particular dubious situation outlined here is:

1) A loses contact with B and continues working without B in the array, so
   B is out of date.
2) A dies, and B is recovered, becoming used as the master.
3) When A is recovered, we choose to sync A from B, not B from A.

In that case we need to look at the bitmaps on both sides. But note that
one bitmap per array (on the local side) would suffice in this case: the
array node location shifts during the process outlined, giving two bitmaps
to make use of eventually.

Peter
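The "sufficiently pessimistic" rule being argued for - clear a bit only once both writes are known complete, and combine both sides' bitmaps before deciding what to resync - can be sketched like this. A toy model in Python, not the md bitmap code; the names are invented:

```python
def write_block(block, bitmap_a, bitmap_b, write_a, write_b):
    """Mirror one block, clearing the dirty bits only when BOTH copies are
    known to be on disk (the pessimistic clearing rule)."""
    bitmap_a.add(block)          # dirty both bitmaps before any data is sent
    bitmap_b.add(block)
    ok_a, ok_b = write_a(block), write_b(block)
    if ok_a and ok_b:            # never clear after only one success
        bitmap_a.discard(block)
        bitmap_b.discard(block)
    return ok_a and ok_b

def blocks_to_resync(bitmap_a, bitmap_b):
    """A block must be resynced if EITHER side still has it dirty -
    combining the bitmaps by OR is what preserves the block-10 case."""
    return bitmap_a | bitmap_b

a, b = set(), set()
# The nbd link to B fails mid-write: block 10 reaches A only.
write_block(10, a, b, write_a=lambda blk: True, write_b=lambda blk: False)
assert blocks_to_resync(a, b) == {10}
```

With this rule, each bitmap on its own is pessimistic with respect to clearing, so zeroing the wrong bitmap can no longer lose the record of an extra write.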
[PATCH md ] md: allow degraded raid1 array to resync after an unclean shutdown.
The following is (I think) appropriate for 2.4.30. The bug it fixes can result in data corruption in a fairly unusual circumstance (having a 3-drive raid1 array running in degraded mode, and suffering a system crash).

### Comments for Changeset

If a raid1 array has more than two devices, and not all are working, then it will not resync after an unclean shutdown (as it will think that it should reconstruct a failed drive, and will find there aren't any spares...). This patch fixes the problem.

Problem found by Mario Holbe [EMAIL PROTECTED] (thanks!)

Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/raid1.c |   13 ++++++++-----
 1 files changed, 8 insertions(+), 5 deletions(-)

diff ./drivers/md/raid1.c~current~ ./drivers/md/raid1.c
--- ./drivers/md/raid1.c~current~	2005-03-23 11:28:56.000000000 +1100
+++ ./drivers/md/raid1.c	2005-03-23 11:38:41.000000000 +1100
@@ -891,6 +891,8 @@ static int raid1_diskop(mddev_t *mddev,
 	mdp_disk_t *failed_desc, *spare_desc, *added_desc;
 	mdk_rdev_t *spare_rdev, *failed_rdev;
 
+	if (conf->resync_mirrors)
+		return 1; /* Cannot do any diskops during a resync */
 	switch (state) {
 	case DISKOP_SPARE_ACTIVE:
@@ -1333,6 +1335,8 @@ static void raid1syncd (void *data)
 	up(&mddev->recovery_sem);
 	raid1_shrink_buffers(conf);
+
+	md_recover_arrays(); /* in case we are degraded and a spare is available */
 }
 
 /*
@@ -1741,10 +1745,6 @@ static int raid1_run (mddev_t *mddev)
 
 	conf->last_used = j;
 
-	if (conf->working_disks != sb->raid_disks) {
-		printk(KERN_ALERT "raid1: md%d, not all disks are operational -- trying to recover array\n", mdidx(mddev));
-		start_recovery = 1;
-	}
 
 	{
 		const char * name = "raid1d";
@@ -1756,7 +1756,7 @@ static int raid1_run (mddev_t *mddev)
 	}
 
-	if (!start_recovery && !(sb->state & (1 << MD_SB_CLEAN)) &&
+	if (!(sb->state & (1 << MD_SB_CLEAN)) &&
 	    (conf->working_disks > 1)) {
 		const char * name = "raid1syncd";
@@ -1769,6 +1769,9 @@ static int raid1_run (mddev_t *mddev)
 		printk(START_RESYNC, mdidx(mddev));
 		conf->resync_mirrors = 1;
 		md_wakeup_thread(conf->resync_thread);
+	} else if (conf->working_disks != sb->raid_disks) {
+		printk(KERN_ALERT "raid1: md%d, not all disks are operational -- trying to recover array\n", mdidx(mddev));
+		start_recovery = 1;
 	}
 
 	/*
[PATCH md 4 of 12] Minor code rearrangement in bitmap_init_from_disk
Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/bitmap.c |   16 ++++++++--------
 1 files changed, 8 insertions(+), 8 deletions(-)

diff ./drivers/md/bitmap.c~current~ ./drivers/md/bitmap.c
--- ./drivers/md/bitmap.c~current~	2005-03-22 17:13:16.000000000 +1100
+++ ./drivers/md/bitmap.c	2005-03-22 17:19:19.000000000 +1100
@@ -782,7 +782,9 @@ static int bitmap_init_from_disk(struct
 			"recovery\n", bmname(bitmap));
 
 	bytes = (chunks + 7) / 8;
-	num_pages = (bytes + PAGE_SIZE - 1) / PAGE_SIZE;
+
+	num_pages = (bytes + sizeof(bitmap_super_t) + PAGE_SIZE - 1) / PAGE_SIZE + 1;
+
 	if (i_size_read(file->f_mapping->host) < bytes + sizeof(bitmap_super_t)) {
 		printk(KERN_INFO "%s: bitmap file too short %lu < %lu\n",
 			bmname(bitmap),
@@ -790,18 +792,16 @@ static int bitmap_init_from_disk(struct
 			bytes + sizeof(bitmap_super_t));
 		goto out;
 	}
-	num_pages++;
+
+	ret = -ENOMEM;
+
 	bitmap->filemap = kmalloc(sizeof(struct page *) * num_pages, GFP_KERNEL);
-	if (!bitmap->filemap) {
-		ret = -ENOMEM;
+	if (!bitmap->filemap)
 		goto out;
-	}
 
 	bitmap->filemap_attr = kmalloc(sizeof(long) * num_pages, GFP_KERNEL);
-	if (!bitmap->filemap_attr) {
-		ret = -ENOMEM;
+	if (!bitmap->filemap_attr)
 		goto out;
-	}
 
 	memset(bitmap->filemap_attr, 0, sizeof(long) * num_pages);
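The sizing arithmetic this patch rearranges is easy to check in isolation. A sketch of the calculation only; 4096-byte pages and a 256-byte bitmap superblock are assumed purely for illustration (the real value is sizeof(bitmap_super_t)):

```python
PAGE_SIZE = 4096
SIZEOF_BITMAP_SUPER = 256   # illustrative stand-in for sizeof(bitmap_super_t)

def bitmap_file_pages(chunks):
    """Pages needed to map a bitmap file: one bit per chunk, rounded up to
    whole bytes, plus the superblock, rounded up to whole pages, plus one -
    mirroring the num_pages expression in the patch above."""
    bytes_needed = (chunks + 7) // 8
    return (bytes_needed + SIZEOF_BITMAP_SUPER + PAGE_SIZE - 1) // PAGE_SIZE + 1

# 100000 chunks -> 12500 bitmap bytes + 256 superblock bytes -> 4 pages + 1
assert bitmap_file_pages(100000) == 5
```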
[PATCH md 2 of 12] Enable the bitmap write-back daemon and wait for it.
Currently we don't wait for updates to the bitmap to be flushed to disk properly. The infrastructure is all there, but it isn't being used...

A separate kernel thread (bitmap_writeback_daemon) is needed to wait for each page, as we cannot get callbacks when a page write completes.

Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/bitmap.c         |  119 ++++++++++++++++++------------------
 ./include/linux/raid/bitmap.h |   13 ----
 2 files changed, 55 insertions(+), 77 deletions(-)

diff ./drivers/md/bitmap.c~current~ ./drivers/md/bitmap.c
--- ./drivers/md/bitmap.c~current~	2005-03-22 17:11:04.000000000 +1100
+++ ./drivers/md/bitmap.c	2005-03-22 17:12:09.000000000 +1100
@@ -261,30 +261,33 @@ char *file_path(struct file *file, char
 /*
  * write out a page
  */
-static int write_page(struct page *page, int wait)
+static int write_page(struct bitmap *bitmap, struct page *page, int wait)
 {
 	int ret = -ENOMEM;
 
 	lock_page(page);
 
-	if (page->mapping == NULL)
-		goto unlock_out;
-	else if (i_size_read(page->mapping->host) < page->index << PAGE_SHIFT) {
-		ret = -ENOENT;
-		goto unlock_out;
-	}
-
 	ret = page->mapping->a_ops->prepare_write(NULL, page, 0, PAGE_SIZE);
 	if (!ret)
 		ret = page->mapping->a_ops->commit_write(NULL, page, 0, PAGE_SIZE);
 
 	if (ret) {
-unlock_out:
 		unlock_page(page);
 		return ret;
 	}
 
 	set_page_dirty(page); /* force it to be written out */
+
+	if (!wait) {
+		/* add to list to be waited for by daemon */
+		struct page_list *item = mempool_alloc(bitmap->write_pool, GFP_NOIO);
+		item->page = page;
+		page_cache_get(page);
+		spin_lock(&bitmap->write_lock);
+		list_add(&item->list, &bitmap->complete_pages);
+		spin_unlock(&bitmap->write_lock);
+		md_wakeup_thread(bitmap->writeback_daemon);
+	}
 	return write_one_page(page, wait);
 }
@@ -343,14 +346,13 @@ int bitmap_update_sb(struct bitmap *bitm
 		spin_unlock_irqrestore(&bitmap->lock, flags);
 		return 0;
 	}
-	page_cache_get(bitmap->sb_page);
 	spin_unlock_irqrestore(&bitmap->lock, flags);
 	sb = (bitmap_super_t *)kmap(bitmap->sb_page);
 	sb->events = cpu_to_le64(bitmap->mddev->events);
 	if (!bitmap->mddev->degraded)
 		sb->events_cleared = cpu_to_le64(bitmap->mddev->events);
 	kunmap(bitmap->sb_page);
-	return write_page(bitmap->sb_page, 0);
+	return write_page(bitmap, bitmap->sb_page, 0);
 }
 
 /* print out the bitmap file superblock */
@@ -556,10 +558,10 @@ static void bitmap_file_unmap(struct bit
 static void bitmap_stop_daemons(struct bitmap *bitmap);
 
 /* dequeue the next item in a page list -- don't call from irq context */
-static struct page_list *dequeue_page(struct bitmap *bitmap,
-			struct list_head *head)
+static struct page_list *dequeue_page(struct bitmap *bitmap)
 {
 	struct page_list *item = NULL;
+	struct list_head *head = &bitmap->complete_pages;
 
 	spin_lock(&bitmap->write_lock);
 	if (list_empty(head))
@@ -573,23 +575,15 @@ out:
 
 static void drain_write_queues(struct bitmap *bitmap)
 {
-	struct list_head *queues[] = { &bitmap->complete_pages, NULL };
-	struct list_head *head;
 	struct page_list *item;
-	int i;
 
-	for (i = 0; queues[i]; i++) {
-		head = queues[i];
-		while ((item = dequeue_page(bitmap, head))) {
-			page_cache_release(item->page);
-			mempool_free(item, bitmap->write_pool);
-		}
+	while ((item = dequeue_page(bitmap))) {
+		/* don't bother to wait */
+		page_cache_release(item->page);
+		mempool_free(item, bitmap->write_pool);
 	}
 
-	spin_lock(&bitmap->write_lock);
-	bitmap->writes_pending = 0; /* make sure waiters continue */
 	wake_up(&bitmap->write_wait);
-	spin_unlock(&bitmap->write_lock);
 }
 
 static void bitmap_file_put(struct bitmap *bitmap)
@@ -734,13 +728,13 @@ int bitmap_unplug(struct bitmap *bitmap)
 		spin_unlock_irqrestore(&bitmap->lock, flags);
 		if (attr & (BITMAP_PAGE_DIRTY | BITMAP_PAGE_NEEDWRITE))
-			if (write_page(page, 0))
+			if (write_page(bitmap, page, 0))
 				return 1;
 	}
 	if (wait) { /* if any writes were performed, we need to wait on them */
 		spin_lock_irq(&bitmap->write_lock);
 		wait_event_lock_irq(bitmap->write_wait,
-			bitmap->writes_pending == 0, bitmap->write_lock,
+			list_empty(&bitmap->complete_pages), bitmap->write_lock,
[PATCH md 8 of 12] A couple of tidyups relating to the bitmap file.
1/ When initing from disk, it is a BUG if there is nowhere to init from.
2/ Use seq_path to print the path in /proc/mdstat.

Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/bitmap.c |    8 +-------
 ./drivers/md/md.c     |   11 +++++------
 2 files changed, 6 insertions(+), 13 deletions(-)

diff ./drivers/md/bitmap.c~current~ ./drivers/md/bitmap.c
--- ./drivers/md/bitmap.c~current~	2005-03-22 17:20:50.000000000 +1100
+++ ./drivers/md/bitmap.c	2005-03-22 17:21:47.000000000 +1100
@@ -764,13 +764,7 @@ static int bitmap_init_from_disk(struct
 	chunks = bitmap->chunks;
 	file = bitmap->file;
 
-	if (!file) { /* no file, dirty all the in-memory bits */
-		printk(KERN_INFO "%s: no bitmap file, doing full recovery\n",
-			bmname(bitmap));
-		bitmap_set_memory_bits(bitmap, 0,
-			chunks << CHUNK_BLOCK_SHIFT(bitmap), 1);
-		return 0;
-	}
+	BUG_ON(!file);
 
 #if INJECT_FAULTS_3
 	outofdate = 1;

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~	2005-03-22 17:20:16.000000000 +1100
+++ ./drivers/md/md.c	2005-03-22 17:21:30.000000000 +1100
@@ -3259,10 +3259,8 @@ static int md_seq_show(struct seq_file *
 		seq_printf(seq, "\n       ");
 
 		if ((bitmap = mddev->bitmap)) {
-			char *buf, *path;
 			unsigned long chunk_kb;
 			unsigned long flags;
-			buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
 
 			spin_lock_irqsave(&bitmap->lock, flags);
 			chunk_kb = bitmap->chunksize >> 10;
 			seq_printf(seq, "bitmap: %lu/%lu pages [%luKB], "
@@ -3273,13 +3271,14 @@ static int md_seq_show(struct seq_file *
 				<< (PAGE_SHIFT - 10),
 				chunk_kb ? chunk_kb : bitmap->chunksize,
 				chunk_kb ? "KB" : "B");
-			if (bitmap->file && buf) {
-				path = file_path(bitmap->file, buf, PAGE_SIZE);
-				seq_printf(seq, ", file: %s", path ? path : "");
+			if (bitmap->file) {
+				seq_printf(seq, ", file: ");
+				seq_path(seq, bitmap->file->f_vfsmnt,
+					 bitmap->file->f_dentry, " \t\n");
 			}
+
 			seq_printf(seq, "\n");
 			spin_unlock_irqrestore(&bitmap->lock, flags);
-			kfree(buf);
 		}
 
 		seq_printf(seq, "\n");
[PATCH md 0 of 12] Introduction
Here are 12 patches for the bitmap write-intent logging in md in 2.6.12-rc1-mm1. With this, it is getting quite close to being really usable (though there are a couple of issues that I haven't resolved yet).

Andrew: Are you happy to keep collecting these as a list of patches (bugs followed by bug-fixes :-), or would it be easier if I merged all the bug fixes into earlier patches and just resent a small number of add-functionality patches?

NeilBrown

 [PATCH md 1 of 12] Check return value of write_page, rather than ignore it
 [PATCH md 2 of 12] Enable the bitmap write-back daemon and wait for it.
 [PATCH md 3 of 12] Improve debug-printing of bitmap superblock.
 [PATCH md 4 of 12] Minor code rearrangement in bitmap_init_from_disk
 [PATCH md 5 of 12] Print correct pid for newly created bitmap-writeback-daemon.
 [PATCH md 6 of 12] Call bitmap_daemon_work regularly
 [PATCH md 7 of 12] Don't skip bitmap pages due to lack of bit that we just cleared.
 [PATCH md 8 of 12] A couple of tidyups relating to the bitmap file.
 [PATCH md 9 of 12] Make sure md bitmap is cleared on a clean start.
 [PATCH md 10 of 12] Fix bug when raid1 attempts a partial reconstruct.
 [PATCH md 11 of 12] Allow md to update multiple superblocks in parallel.
 [PATCH md 12 of 12] Allow md intent bitmap to be stored near the superblock.
[PATCH md 5 of 12] Print correct pid for newly created bitmap-writeback-daemon.
The debugging message printed the wrong pid, which didn't help remove bugs.

Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/bitmap.c |    2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

diff ./drivers/md/bitmap.c~current~ ./drivers/md/bitmap.c
--- ./drivers/md/bitmap.c~current~	2005-03-22 17:19:19.000000000 +1100
+++ ./drivers/md/bitmap.c	2005-03-22 17:20:08.000000000 +1100
@@ -1107,7 +1107,7 @@ static int bitmap_start_daemon(struct bi
 	md_wakeup_thread(daemon); /* start it running */
 
 	PRINTK("%s: %s daemon (pid %d) started...\n",
-		bmname(bitmap), name, bitmap->daemon->tsk->pid);
+		bmname(bitmap), name, daemon->tsk->pid);
 
 out_unlock:
 	spin_unlock_irqrestore(&bitmap->lock, flags);
 	return 0;
[PATCH md 3 of 12] Improve debug-printing of bitmap superblock.
- report sync_size properly - need /2 to convert sectors to KB
- move everything over 2 spaces to allow proper spelling of "events cleared".

Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/bitmap.c |   20 ++++++++++----------
 1 files changed, 10 insertions(+), 10 deletions(-)

diff ./drivers/md/bitmap.c~current~ ./drivers/md/bitmap.c
--- ./drivers/md/bitmap.c~current~	2005-03-22 17:12:09.000000000 +1100
+++ ./drivers/md/bitmap.c	2005-03-22 17:13:16.000000000 +1100
@@ -364,22 +364,22 @@ void bitmap_print_sb(struct bitmap *bitm
 		return;
 	sb = (bitmap_super_t *)kmap(bitmap->sb_page);
 	printk(KERN_DEBUG "%s: bitmap file superblock:\n", bmname(bitmap));
-	printk(KERN_DEBUG "        magic: %08x\n", le32_to_cpu(sb->magic));
-	printk(KERN_DEBUG "      version: %d\n", le32_to_cpu(sb->version));
-	printk(KERN_DEBUG "         uuid: %08x.%08x.%08x.%08x\n",
+	printk(KERN_DEBUG "          magic: %08x\n", le32_to_cpu(sb->magic));
+	printk(KERN_DEBUG "        version: %d\n", le32_to_cpu(sb->version));
+	printk(KERN_DEBUG "           uuid: %08x.%08x.%08x.%08x\n",
 		*(__u32 *)(sb->uuid+0),
 		*(__u32 *)(sb->uuid+4),
 		*(__u32 *)(sb->uuid+8),
 		*(__u32 *)(sb->uuid+12));
-	printk(KERN_DEBUG "       events: %llu\n",
+	printk(KERN_DEBUG "         events: %llu\n",
 		(unsigned long long) le64_to_cpu(sb->events));
-	printk(KERN_DEBUG " events_clred: %llu\n",
+	printk(KERN_DEBUG " events cleared: %llu\n",
 		(unsigned long long) le64_to_cpu(sb->events_cleared));
-	printk(KERN_DEBUG "        state: %08x\n", le32_to_cpu(sb->state));
-	printk(KERN_DEBUG "    chunksize: %d B\n", le32_to_cpu(sb->chunksize));
-	printk(KERN_DEBUG " daemon sleep: %ds\n", le32_to_cpu(sb->daemon_sleep));
-	printk(KERN_DEBUG "    sync size: %llu KB\n",
-		(unsigned long long)le64_to_cpu(sb->sync_size));
+	printk(KERN_DEBUG "          state: %08x\n", le32_to_cpu(sb->state));
+	printk(KERN_DEBUG "      chunksize: %d B\n", le32_to_cpu(sb->chunksize));
+	printk(KERN_DEBUG "   daemon sleep: %ds\n", le32_to_cpu(sb->daemon_sleep));
+	printk(KERN_DEBUG "      sync size: %llu KB\n",
+		(unsigned long long)le64_to_cpu(sb->sync_size)/2);
 	kunmap(bitmap->sb_page);
 }
[PATCH md 7 of 12] Don't skip bitmap pages due to lack of bit that we just cleared.
When looking for pages that need cleaning we skip pages that don't have BITMAP_PAGE_CLEAN set. But if it is the 'current' page we will have cleared that bit ourselves, so skipping it is wrong.

So: move the 'skip this page' test inside 'if page != lastpage'.

Also fold the call of file_page_offset into the one place where the value (bit) is used.

Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/bitmap.c |   35 +++++++++++++++++------------------
 1 files changed, 17 insertions(+), 18 deletions(-)

diff ./drivers/md/bitmap.c~current~ ./drivers/md/bitmap.c
--- ./drivers/md/bitmap.c~current~	2005-03-22 17:20:08.000000000 +1100
+++ ./drivers/md/bitmap.c	2005-03-22 17:20:50.000000000 +1100
@@ -908,7 +908,7 @@ static bitmap_counter_t *bitmap_get_coun
 
 int bitmap_daemon_work(struct bitmap *bitmap)
 {
-	unsigned long bit, j;
+	unsigned long j;
 	unsigned long flags;
 	struct page *page = NULL, *lastpage = NULL;
 	int err = 0;
@@ -931,24 +931,23 @@ int bitmap_daemon_work(struct bitmap *bi
 		}
 
 		page = filemap_get_page(bitmap, j);
-		/* skip this page unless it's marked as needing cleaning */
-		if (!((attr=get_page_attr(bitmap, page)) & BITMAP_PAGE_CLEAN)) {
-			if (attr & BITMAP_PAGE_NEEDWRITE) {
-				page_cache_get(page);
-				clear_page_attr(bitmap, page, BITMAP_PAGE_NEEDWRITE);
-			}
-			spin_unlock_irqrestore(&bitmap->lock, flags);
-			if (attr & BITMAP_PAGE_NEEDWRITE) {
-				if (write_page(bitmap, page, 0))
-					bitmap_file_kick(bitmap);
-				page_cache_release(page);
-			}
-			continue;
-		}
-
-		bit = file_page_offset(j);
 
 		if (page != lastpage) {
+			/* skip this page unless it's marked as needing cleaning */
+			if (!((attr=get_page_attr(bitmap, page)) & BITMAP_PAGE_CLEAN)) {
+				if (attr & BITMAP_PAGE_NEEDWRITE) {
+					page_cache_get(page);
+					clear_page_attr(bitmap, page, BITMAP_PAGE_NEEDWRITE);
+				}
+				spin_unlock_irqrestore(&bitmap->lock, flags);
+				if (attr & BITMAP_PAGE_NEEDWRITE) {
+					if (write_page(bitmap, page, 0))
+						bitmap_file_kick(bitmap);
+					page_cache_release(page);
+				}
+				continue;
+			}
+
 			/* grab the new page, sync and release the old */
 			page_cache_get(page);
 			if (lastpage != NULL) {
@@ -990,7 +989,7 @@ int bitmap_daemon_work(struct bitmap *bi
 						  -1);
 
 				/* clear the bit */
-				clear_bit(bit, page_address(page));
+				clear_bit(file_page_offset(j), page_address(page));
 			}
 		}
 		spin_unlock_irqrestore(&bitmap->lock, flags);