Re: I/O errors on Hammer volume

Matthew Dillon Fri, 16 Apr 2010 15:06:00 -0700

    I think I found the smoking gun, but I can't be sure until I see
    the hammer show output from Francois.


    I noticed that the records with bad CRCs were typically around
    elements 30-32 in the B-Tree node (out of ~64 elements), that is,
    in the middle of the node.

    This implies a race between the reblocker/rebalancer and a node split
    during an insertion, or a race between the reblocker and the rebalancer.
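
    To make the split race concrete, here is a simplified model, not
    HAMMER's actual B-Tree code: a full node is split at its midpoint,
    the upper half moves to a new sibling, and a tracked cursor that
    pointed into the upper half is relocated.  All model_* names and the
    node size are hypothetical.  For a ~64-element node the split
    boundary falls at element 32, right in the region where the bad
    records were found.

```c
/*
 * Hypothetical, simplified model of a B-Tree node split (not HAMMER's
 * hammer_btree.c).  The upper half of a full node moves to a new
 * sibling; a tracked cursor pointing into that half is relocated.
 */
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_NODE_ELMS 64          /* hypothetical; "~64" per the text */

struct model_node {
    int      count;
    uint64_t keys[MODEL_NODE_ELMS];
};

struct model_cursor {
    struct model_node *node;
    int                index;
};

/*
 * Split 'left' at its midpoint into the empty node 'right' and fix up
 * one tracked cursor.  Returns the split point.
 */
static int
model_split(struct model_node *left, struct model_node *right,
            struct model_cursor *cursor)
{
    int split;
    int moved;

    assert(left->count <= MODEL_NODE_ELMS && right->count == 0);
    split = left->count / 2;
    moved = left->count - split;

    /* Move the upper half of the elements into the sibling. */
    memcpy(right->keys, &left->keys[split],
           (size_t)moved * sizeof(left->keys[0]));
    right->count = moved;
    left->count = split;

    /* A cursor on a moved element now lives in the sibling node. */
    if (cursor->node == left && cursor->index >= split) {
        cursor->node = right;
        cursor->index -= split;
    }
    return split;
}
```

    Code that cached only the original (node, index) pair across an
    unlock, instead of rechecking the tracked cursor after relocking,
    would be looking at the wrong node after such a split.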

    I am testing a fix now.  I am not 100% sure that this was the issue,
    but a lot of things point to it:

    * In both Jan's and Francois's cases the inodes that got corrupted
      were in areas of the filesystem under a heavy write/create/delete
      load.

    * The corrupted records appear to nearly always be in the middle of
      the B-Tree node, which implies a race against an insertion or a
      rebalancing operation while the reblocker is running.

    * I also found a bug in the reblocker that was exposed by recent work
      (the work itself was not buggy, it just exposed a bug that already
      existed) whereby the reblocker may reblock an element after relocking
      the node without properly checking that the element is still valid.
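
    A hypothetical model of why the element must be rechecked after
    relocking (again, not HAMMER code): deleting an element shifts later
    elements left but leaves the vacated last slot with its old contents,
    so a cursor left at index == count still finds a plausible-looking
    but stale element there.  The checked accessor is in the spirit of
    the KKASSERT(cursor.index < cursor.node->ondisk->count) that the
    patch below adds; model_delete() and model_element() are
    illustrative names only.

```c
/*
 * Hypothetical model (not HAMMER code).  Deletion shifts later elements
 * left but does not clear the vacated slot, so a slot at index >= count
 * still holds an old element.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_NODE_ELMS 64          /* hypothetical size */

struct model_node {
    int      count;
    uint64_t keys[MODEL_NODE_ELMS];
};

/* Delete the element at pos; the last slot keeps its old contents. */
static void
model_delete(struct model_node *node, int pos)
{
    assert(pos >= 0 && pos < node->count);
    memmove(&node->keys[pos], &node->keys[pos + 1],
            (size_t)(node->count - pos - 1) * sizeof(node->keys[0]));
    node->count--;
}

/*
 * Checked access: refuse to return an element at or past the end of
 * the node.  A false return means the caller must retest its position.
 */
static bool
model_element(const struct model_node *node, int index, uint64_t *key)
{
    if (index < 0 || index >= node->count)
        return false;
    *key = node->keys[index];
    return true;
}
```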

    Jan, I think you can test this with your psql test, after you reformat
    that volume for real and start fresh.  You should be able to test this
    by running a continuous hammer reblock operation on the data while you
    are running the database test and see if corruption ultimately occurs.

    I have included my proposed patch/fix below but please do not apply
    it yet.  I want to try to reproduce the corruption here to actually
    test whether this fixes the issue or not.

    Once we fix the issue I'll have to work up a procedure to repair any
    broken filesystems.  Locating breakage is really easy: the hammer
    show and hammer checkmap commands can be used.  Fixing it, short
    of copying the data off the filesystem, may be more difficult.

    Jan, I am convinced that it is NOT a problem with the age of the
    hard drive or the IDE interface.

                                                -Matt


diff --git a/sys/vfs/hammer/hammer_reblock.c b/sys/vfs/hammer/hammer_reblock.c
index 76ea6a8..c6cb937 100644
--- a/sys/vfs/hammer/hammer_reblock.c
+++ b/sys/vfs/hammer/hammer_reblock.c
@@ -130,6 +130,7 @@ retry:
                /*
                 * Internal or Leaf node
                 */
+               KKASSERT(cursor.index < cursor.node->ondisk->count);
                elm = &cursor.node->ondisk->elms[cursor.index];
                reblock->key_cur.obj_id = elm->base.obj_id;
                reblock->key_cur.localization = elm->base.localization;
@@ -144,6 +145,10 @@ retry:
                 * If there is insufficient free space it may be due to
                 * reserved bigblocks, which flushing might fix.
                 *
+                * We must force a retest in case the unlocked cursor is
+                * moved to the end of the leaf, or moved to an internal
+                * node.
+                *
                 * WARNING: See warnings in hammer_unlock_cursor() function.
                 */
                if (hammer_checkspace(trans->hmp, slop)) {
@@ -152,10 +157,11 @@ retry:
                                break;
                        }
                        hammer_unlock_cursor(&cursor);
+                       cursor.flags |= HAMMER_CURSOR_RETEST;
                        hammer_flusher_wait(trans->hmp, seq);
                        hammer_lock_cursor(&cursor);
                        seq = hammer_flusher_async(trans->hmp, NULL);
-                       continue;
+                       goto skip;
                }
 
                /*
@@ -198,11 +204,10 @@ retry:
                        bwillwrite(HAMMER_XBUFSIZE);
                        hammer_lock_cursor(&cursor);
                }
-
+skip:
                if (error == 0) {
                        error = hammer_btree_iterate(&cursor);
                }
-
        }
        if (error == ENOENT)
                error = 0;
