Hi Bernd,

On Tue, Feb 22, 2011 at 12:49:00AM +0100, Bernd Schubert wrote:
> Hello Dejan,
>
> On 02/21/2011 05:43 PM, Dejan Muhamedagic wrote:
> >
> > No. ext3 is a filesystem with a journal, so it is considered
> > that it can recover without fsck. Otherwise, there's a parameter
> > called run_fsck, check the meta data: crm ra info Filesystem.
> >
>
> no, not if it writes "Warning: mounting a filesystem with errors". In
> that case extX has recorded an error either in its super block or in the
> journal. We had a long discussion about that on the ext4 list back in
> October and in the end upstream e2fsprogs accepted a patch for e2fsck
> that allows playing back the journal only. After journal playback a
> possible error will always be recorded in the superblock, and from there
> on a script can read it using dumpe2fs. The Filesystem agent should be
> rewritten to refuse to mount if the superblock has an error. Using the
> new e2fsck option "-E journal_only" is a bit more tricky, as only the
> most recent e2fsprogs/e2fsck version has it.
>
> http://kerneltrap.org/mailarchive/linux-ext4/2010/10/22/6885813
>
> http://git.kernel.org/?p=fs/ext2/e2fsprogs.git;a=commit;h=71873b17307993c08b38b97c9551bed231e6048c
>
> Below is what I added to the DDN lustre_server agent:
>
> > # check if the superblock knows about filesystem errors
> > # return 0 if not, 1 if errors have been recorded
> > check_sb_fs_errors()
> > {
> > 	with_error=`dumpe2fs -h $DEVICE 2>/dev/null |
> > 		grep "Filesystem state:" | grep "error"`
> > 	if [ -n "$with_error" ]; then
> > 		ocf_log err "$DEVICE : $with_error (run e2fsck)"
> > 		return 1
> > 	fi
> > 	return 0
> > }

This looks reasonable. Does the error state really require fsck ...

> (As I left DDN end of November and as the "e2fsck -E journal_only"
> option was not accepted upstream at that time, that part is not
> implemented yet in that RA).

or would this option help in that case? Do you want to prepare a
patch for Filesystem?

> > BTW, it is very unusual (and suspicious) that the filesystem
> > starts having errors just like that, while the system's running.
> > You should find what caused the corruption.
>
> Well, extX even recorded an error in the journal and subsequently in the
> super-block if an IO error came up. Unfortunately, there does not seem
> to be a single expensive raid unit out there that does not bring up
> errors. Although I have to admit that FC and IB HBAs and fabric also
> play their part in that issue.
> And of course, no filesystem is free of bugs. Which is why until now
> extX suggests frequent fscks.

Hmpf. OK, must say that I expected it to be more robust.

Cheers,

Dejan

> Cheers,
> Bernd

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
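
For reference, the sequence discussed in this thread (replay the journal only, then read the superblock state and refuse to mount if an error is recorded) could be sketched roughly as below. This is only a sketch under assumptions, not code from the Filesystem or lustre_server agents: the names sb_has_errors and check_fs_before_mount and the E2FSCK/DUMPE2FS override variables are hypothetical, the -p flag is added here so e2fsck can run without a terminal, and treating an e2fsck exit status of 8 or above as "could not replay" (which would presumably also cover an e2fsck too old to know journal_only) is an assumption as well.

```shell
#!/bin/sh
# Commands can be overridden (e.g. for testing); these variables are
# hypothetical and not part of any existing resource agent.
: "${E2FSCK:=e2fsck}"
: "${DUMPE2FS:=dumpe2fs}"

# Succeeds (0) if the superblock state line mentions an error,
# e.g. "Filesystem state:         clean with errors".
# Note: if dumpe2fs itself fails, no state line is seen and this
# reports "no errors".
sb_has_errors()
{
	$DUMPE2FS -h "$1" 2>/dev/null | grep '^Filesystem state:' | grep -q 'error'
}

# Returns 0 if the device looks safe to mount, 1 if the superblock
# records errors (full e2fsck needed), 2 if the journal replay failed.
check_fs_before_mount()
{
	dev=$1

	# Replay the journal only, so that errors recorded in the journal
	# end up in the superblock. Needs a recent e2fsprogs.
	$E2FSCK -p -E journal_only "$dev" >/dev/null 2>&1
	rc=$?
	# Assumption: exit status >= 8 means e2fsck could not do its job
	# (operational or usage error).
	if [ "$rc" -ge 8 ]; then
		echo "$dev: journal replay failed (e2fsck exit $rc)" >&2
		return 2
	fi

	if sb_has_errors "$dev"; then
		echo "$dev: superblock records errors (run e2fsck)" >&2
		return 1
	fi
	return 0
}
```

In a resource agent the echo calls would presumably become ocf_log err, and the start action would call something like "check_fs_before_mount $DEVICE || return $OCF_ERR_GENERIC" before mounting.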
