Hello Dejan,

On 02/21/2011 05:43 PM, Dejan Muhamedagic wrote:
> 
> No. ext3 is a filesystem with a journal, so it is considered
> that it can recover without fsck. Otherwise, there's a parameter
> called run_fsck, check the meta data: crm ra info Filesystem.
> 

No, not if it writes "Warning: mounting a filesystem with errors". In
that case extX has recorded an error either in its superblock or in the
journal. We had a long discussion about that on the ext4 list back in
October, and in the end upstream e2fsprogs accepted a patch for e2fsck
that allows playing back the journal only. After journal playback, any
error will always be recorded in the superblock, and from there a
script can read it using dumpe2fs. The Filesystem agent should be
rewritten to refuse to mount if the superblock has an error. Using the
new e2fsck option "-E journal_only" is a bit more tricky, as only the
most recent e2fsprogs/e2fsck version has it.

http://kerneltrap.org/mailarchive/linux-ext4/2010/10/22/6885813

http://git.kernel.org/?p=fs/ext2/e2fsprogs.git;a=commit;h=71873b17307993c08b38b97c9551bed231e6048c

Below is what I added to the DDN lustre_server agent:

> # check if the superblock knows about filesystem errors
> # return 0 if not, 1 if errors have been recorded
> check_sb_fs_errors()
> {
>         with_error=`dumpe2fs -h "$DEVICE" 2>/dev/null | grep "Filesystem state:" | grep "error"`
>         if [ -n "$with_error" ]; then
>                 ocf_log err "$DEVICE : $with_error (run e2fsck)"
>                 return 1
>         fi
>         return 0
> }

(As I left DDN at the end of November, and the "e2fsck -E journal_only"
option had not been accepted upstream at that time, that part is not
implemented yet in that RA.)
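
For what it is worth, the missing journal-only step could look roughly
like the sketch below. It is not code from the lustre_server RA. The
function name and the E2FSCK/DUMPE2FS variables are made up for
illustration, and I am assuming that "-p" can be combined with
"-E journal_only" and that exit codes of 4 and above (as documented in
e2fsck(8)) are the ones worth treating as a failed replay.

```shell
# Sketch only: E2FSCK and DUMPE2FS are hypothetical override variables so
# that the logic can be exercised without a real block device.
E2FSCK=${E2FSCK:-e2fsck}
DUMPE2FS=${DUMPE2FS:-dumpe2fs}

# Replay the journal only, then ask the superblock whether errors remain.
# Returns 0 if the device looks safe to mount, 1 otherwise.
replay_journal_and_check()
{
        dev=$1

        # "-E journal_only" needs a recent e2fsprogs; -p avoids prompts
        # (assumption: both options can be used together).
        $E2FSCK -p -E journal_only "$dev" >/dev/null 2>&1
        rc=$?
        # e2fsck(8): 4 = errors left uncorrected, 8 = operational error, ...
        if [ "$rc" -ge 4 ]; then
                echo "$dev: journal replay failed (e2fsck rc=$rc)" >&2
                return 1
        fi

        # After the replay, any recorded error is visible in the superblock
        state=`$DUMPE2FS -h "$dev" 2>/dev/null | grep "^Filesystem state:"`
        if [ -z "$state" ]; then
                echo "$dev: could not read the filesystem state" >&2
                return 1
        fi
        case "$state" in
                *error*)
                        echo "$dev: $state (run e2fsck)" >&2
                        return 1
                        ;;
        esac
        return 0
}
```

An agent's start action could call replay_journal_and_check "$DEVICE"
and refuse to mount on a non-zero return.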


> BTW, it is very unusual (and suspicious) that the filesystem
> starts having errors just like that, while the system's running.
> You should find what caused the corruption.

Well, extX even records an error in the journal, and subsequently in the
superblock, if an IO error comes up. Unfortunately, there does not seem
to be a single expensive RAID unit out there that does not produce
errors, although I have to admit that FC and IB HBAs and the fabric also
play their part in that issue.
And of course, no filesystem is free of bugs, which is why extX still
suggests frequent fscks.
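
As an illustration of those suggested fscks: the superblock keeps a
mount counter and a maximum mount count, both printed by "dumpe2fs -h".
A minimal sketch of a check on them follows. The function name is made
up, and it only looks at the mount count, not at the time-based check
interval.

```shell
# Sketch only (hypothetical helper): reads "dumpe2fs -h" output on stdin.
# Returns 0 if the mount count has reached the maximum mount count,
# 1 otherwise (including when the limit is disabled, i.e. not positive).
fsck_due_by_mount_count()
{
        awk -F: '
                /^Mount count:/         { count = $2 + 0 }
                /^Maximum mount count:/ { max = $2 + 0 }
                END { exit !(max > 0 && count >= max) }
        '
}
```

Usage would be along the lines of:
dumpe2fs -h "$DEVICE" 2>/dev/null | fsck_due_by_mount_count && echo "fsck due"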

Cheers,
Bernd
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
