I think I found the smoking gun but I can't be sure until I see the show output from Francois.
I noticed that all the bad CRC'd records were typically around element 30-32 in the B-Tree node (out of ~64 elements). That is, the middle of the node. This implies a race between the reblocker/rebalancer and a node split during an insertion, or a race between the reblocker and the rebalancer.

I am testing a fix now and I am not 100% sure that this was the issue, but there are a lot of things pointing to it:

* In both Jan's and Francois's cases the inodes that got corrupted were in areas of the filesystem under a heavy write/create/delete load.

* The corrupted records appear to nearly always be in the middle of the B-Tree node, which implies a race against an insertion or a rebalancing operation while the reblocker is running.

* And I found a bug in the reblocker that was exposed by recent work (the work itself was not buggy, it just exposed the bug that already existed) whereby the reblocker may reblock an element after relocking the node but without properly checking that the element is still valid.

Jan, I think you can test this with your psql test, after you reformat that volume for real and start fresh. You should be able to test this by running a continuous hammer reblock operation on the data while you are running the database test and see if corruption ultimately occurs.

I have included my proposed patch/fix below but please do not apply it yet. I want to try to reproduce the corruption here to actually test whether this fixes the issue or not.

Once we fix the issue I'll have to work up a procedure to fix any broken filesystems. Locating breakage is really easy, the hammer show and hammer checkmap commands can be used. Fixing it, short of copying off the filesystem, may be more difficult.

Jan, I am convinced that it is NOT a problem with the age of the hard drive or IDE interface.
-Matt

diff --git a/sys/vfs/hammer/hammer_reblock.c b/sys/vfs/hammer/hammer_reblock.c
index 76ea6a8..c6cb937 100644
--- a/sys/vfs/hammer/hammer_reblock.c
+++ b/sys/vfs/hammer/hammer_reblock.c
@@ -130,6 +130,7 @@ retry:
 		/*
 		 * Internal or Leaf node
 		 */
+		KKASSERT(cursor.index < cursor.node->ondisk->count);
 		elm = &cursor.node->ondisk->elms[cursor.index];
 		reblock->key_cur.obj_id = elm->base.obj_id;
 		reblock->key_cur.localization = elm->base.localization;
@@ -144,6 +145,10 @@ retry:
 		 * If there is insufficient free space it may be due to
 		 * reserved bigblocks, which flushing might fix.
 		 *
+		 * We must force a retest in case the unlocked cursor is
+		 * moved to the end of the leaf, or moved to an internal
+		 * node.
+		 *
 		 * WARNING: See warnings in hammer_unlock_cursor() function.
 		 */
 		if (hammer_checkspace(trans->hmp, slop)) {
@@ -152,10 +157,11 @@ retry:
 				break;
 			}
 			hammer_unlock_cursor(&cursor);
+			cursor.flags |= HAMMER_CURSOR_RETEST;
 			hammer_flusher_wait(trans->hmp, seq);
 			hammer_lock_cursor(&cursor);
 			seq = hammer_flusher_async(trans->hmp, NULL);
-			continue;
+			goto skip;
 		}
 
 		/*
@@ -198,11 +204,10 @@ retry:
 			bwillwrite(HAMMER_XBUFSIZE);
 			hammer_lock_cursor(&cursor);
 		}
-
+skip:
 		if (error == 0) {
 			error = hammer_btree_iterate(&cursor);
 		}
-
 	}
 	if (error == ENOENT)
 		error = 0;
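
To illustrate the failure mode the patch is aimed at, here is a minimal user-space sketch in C. It is not HAMMER code: every name prefixed with toy_ or TOY_ is hypothetical, the "unlock" and "relock" are only comments, and the concurrent change is simulated by a plain function call. The real retest logic lives in the kernel cursor and iteration code and is not reproduced here. The sketch only shows why an index that was valid before an unlock has to be re-checked against the node's element count afterwards.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_ELMS      64
#define TOY_CURSOR_RETEST 0x1	/* hypothetical stand-in for a retest flag */

struct toy_node {
	int count;			/* number of currently valid elements */
	int elms[TOY_MAX_ELMS];
};

struct toy_cursor {
	struct toy_node *node;
	int index;
	int flags;
};

/* Simulates another thread shrinking the node while the cursor is unlocked. */
static void
toy_concurrent_shrink(struct toy_node *node, int new_count)
{
	if (new_count < node->count)
		node->count = new_count;
}

/*
 * Returns true if the caller may use the element at cursor->index.
 * With the retest flag set, the flag is consumed and the index is
 * revalidated.  Without it, the index is trusted blindly (the buggy path).
 */
static bool
toy_cursor_elm_valid(struct toy_cursor *c)
{
	if (c->flags & TOY_CURSOR_RETEST) {
		c->flags &= ~TOY_CURSOR_RETEST;
		return (c->index < c->node->count);
	}
	return (true);
}

/*
 * One simulated pass: position the cursor, "unlock", let the node shrink,
 * "relock", then fetch the element.  Returns -1 if the retest rejected the
 * stale index, otherwise whatever is stored at that slot (which is stale
 * data when index >= count and no retest was requested).
 */
static int
toy_reblock_step(struct toy_node *node, int start_index, int shrink_to,
    bool force_retest)
{
	struct toy_cursor c = { node, start_index, 0 };

	assert(c.index < c.node->count);	/* valid while still locked */

	/* "unlock": the node may change underneath the cursor */
	toy_concurrent_shrink(node, shrink_to);
	if (force_retest)
		c.flags |= TOY_CURSOR_RETEST;
	/* "relock" */

	if (!toy_cursor_elm_valid(&c))
		return (-1);
	return (node->elms[c.index]);
}
```

With a 64-element node shrunk to 32 elements while the cursor sits at index 40, toy_reblock_step() returns -1 when the retest is forced and returns the leftover slot contents when it is not, which is the toy equivalent of reblocking an element that is no longer valid.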