On Fri, Dec 15, 2006 at 01:32:51PM +0100, Bas van Schaik wrote:
> Hi all,
>
> After browsing through the debian-kernel mailing list archive, I found
> out that no one is reporting the latest EXT3 problems in the
> vanilla kernel. The last report of EXT3 problems on the debian-kernel
> list had to do with JBD; the current problems (as posted on the Linux
> Kernel mailing list) are much worse, I think.
> You might want to check these URLs/subjects of discussion on LKML:
>
> "2.6.18-mm2: ext3 BUG?"
> http://lkml.org/lkml/2006/10/5/353
> Seems unresolved

Fixed in some 2.6.18.x release, and it only affects filesystems with a 1k block size.

>
> "2.6.19 file content corruption on ext3"
> http://lkml.org/lkml/2006/12/7/163
> Has to do with 2.6.19, but might have its roots in 2.6.18

That is new 2.6.19 code.

> "Debugging I/O errors?"
> http://lkml.org/lkml/2006/10/20/93
> Source unknown, but more people seem to have the same problem.
>
>
> These issues got my attention, because I'm having those (or similar)
> problems myself, on two different machines (clusters, actually) with
> completely different hardware and disks. I'll explain.
>
> I'm maintaining two clusters, with machines running Debian Stable
> with Etch kernels to get AoE (ATA over Ethernet) support.
> Machines in these clusters "export" their hard disks using AoE (check out
> the "vblade" package), and one machine imports them using the kernel
> "aoe" module. On top of those imported devices, multiple RAID5 arrays
> are created, LVM runs on top of the RAID, and ext3 sits on the LVM LV.
>
> After a few days, I get EXT3 errors like this:
>
> > EXT3-fs: mounted filesystem with ordered data mode.
> > EXT3-fs error (device loop0): ext3_free_blocks_sb: bit already cleared for
> > block 412186
> > Aborting journal on device loop0.
> > EXT3-fs error (device loop0) in ext3_free_blocks_sb: Journal has aborted
> > EXT3-fs error (device loop0) in ext3_reserve_inode_write: Journal has
> > aborted
> > EXT3-fs error (device loop0) in ext3_truncate: Journal has aborted
> > EXT3-fs error (device loop0) in ext3_reserve_inode_write: Journal has
> > aborted
> > EXT3-fs error (device loop0) in ext3_orphan_del: Journal has aborted
> > EXT3-fs error (device loop0) in ext3_reserve_inode_write: Journal has
> > aborted
> > EXT3-fs error (device loop0) in ext3_delete_inode: Journal has aborted
> > __journal_remove_journal_head: freeing b_committed_data
> > __journal_remove_journal_head: freeing b_committed_data
> (...)
> > __journal_remove_journal_head: freeing b_committed_data
> > ext3_abort called.
> > EXT3-fs error (device loop0): ext3_journal_start_sb: Detected aborted
> > journal
> > Remounting filesystem read-only
> > __journal_remove_journal_head: freeing b_committed_data
>
> FSCK'ing the filesystem fixes those errors, but after a few days (or
> weeks, depending on the fs load) the corruptions appear again. It might
> be worth telling you that there are no other suspicious messages in my logs.

Please inform ext3-devel: [email protected]

> This seems to be related to the problem described here:
> http://myrddin.org/2006/02/14/ext3-nastiness/
>
> and here:
> http://www.debian-administration.org/users/Utumno/weblog/16
>
>
> I don't know if I need to file a bug on this; for now I just want to
> hear your thoughts. FYI:
>
> Kernel information for cluster 1:
>
> > [EMAIL PROTECTED]:~# uname -a
> > Linux infinity 2.6.17-2-686 #1 SMP Wed Sep 13 16:34:10 UTC 2006 i686
> > GNU/Linux
>
> And cluster 2:
>
> > dust:~# uname -a
> > Linux dust 2.6.18-3-686 #1 SMP Thu Nov 23 20:49:23 UTC 2006 i686 GNU/Linux
>
> Thanks for your replies!
>
> Best regards,
>
> -- 
> Bas van Schaik

best regards

-- 
maks

