Hi Bernd,

On Tue, Feb 22, 2011 at 12:49:00AM +0100, Bernd Schubert wrote:
> Hello Dejan,
> 
> 
> On 02/21/2011 05:43 PM, Dejan Muhamedagic wrote:
> > 
> > No. ext3 is a filesystem with a journal, so it is considered
> > that it can recover without fsck. Otherwise, there's a parameter
> > called run_fsck; check the metadata: crm ra info Filesystem.
> > 
> 
> No, not if it writes "Warning: mounting a filesystem with errors". In
> that case extX has recorded an error either in its superblock or in the
> journal. We had a long discussion about that on the ext4 list back in
> October, and in the end upstream e2fsprogs accepted a patch for e2fsck
> that allows replaying only the journal. After journal playback, a
> possible error is always recorded in the superblock, and from there on a
> script can read it using dumpe2fs. The Filesystem agent should be
> rewritten to refuse to mount if the superblock has an error recorded.
> Using the new e2fsck option "-E journal_only" is a bit more tricky, as
> only the most recent e2fsprogs/e2fsck version has it.
> 
> http://kerneltrap.org/mailarchive/linux-ext4/2010/10/22/6885813
> 
> http://git.kernel.org/?p=fs/ext2/e2fsprogs.git;a=commit;h=71873b17307993c08b38b97c9551bed231e6048c
> 
> Below is what I added to the DDN lustre_server agent:
> 
> > # check if the superblock knows about filesystem errors
> > # return 0 if not, 1 if errors have been recorded
> > check_sb_fs_errors()
> > {
> >         local with_error
> >         with_error=$(dumpe2fs -h "$DEVICE" 2>/dev/null | grep "^Filesystem state:" | grep "error")
> >         if [ -n "$with_error" ]; then
> >                 ocf_log err "$DEVICE : $with_error (run e2fsck)"
> >                 return 1
> >         fi
> >         return 0
> > }
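The grep pipeline in that function can be exercised without a real device. This is a minimal sketch. It assumes that `dumpe2fs -h` reports one of `clean`, `not clean`, `clean with errors` or `not clean with errors` on its `Filesystem state:` line. `sb_state_ok` is a hypothetical name; it reads the header from stdin and keeps the same return convention as check_sb_fs_errors.

```shell
#!/bin/sh
# Hypothetical helper: reads `dumpe2fs -h` output on stdin.
# Returns 0 if no error is recorded, 1 otherwise (same convention
# as check_sb_fs_errors).
sb_state_ok()
{
        if grep -q '^Filesystem state:.*error'; then
                return 1
        fi
        return 0
}

# On a real system the input would come from:
#   dumpe2fs -h "$DEVICE" 2>/dev/null | sb_state_ok
# Here, canned header lines stand in for a device.
for state in "clean" "not clean" "clean with errors" "not clean with errors"; do
        if printf 'Filesystem state:         %s\n' "$state" | sb_state_ok; then
                echo "$state: ok"
        else
                echo "$state: errors recorded"
        fi
done
```

A plain `not clean` state passes this check. Only the error flag that follows "with" is tested.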

This looks reasonable. Does the error state really require fsck ...

> (As I left DDN at the end of November, and as the "e2fsck -E journal_only"
> option had not been accepted upstream at that time, that part is not yet
> implemented in that RA.)

or would this option help in that case? Do you want to prepare a
patch for Filesystem?
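A patch along these lines might look roughly like the sketch below. This is not code from the Filesystem agent. `ext_precheck`, `E2FSCK_JOURNAL_ONLY` and the `E2FSCK`/`DUMPE2FS` overrides are hypothetical names. The sketch also assumes that an e2fsck exit code of 4 or higher means the journal replay did not complete. Only recent e2fsprogs understand `-E journal_only`, so the replay step stays off unless it is explicitly enabled.

```shell
#!/bin/sh
# Sketch of a pre-mount check for ext3/ext4 (hypothetical, not the
# actual Filesystem RA). DEVICE is assumed to be set by the agent.

# Fallbacks so the sketch also runs outside an OCF environment.
: "${OCF_SUCCESS:=0}" "${OCF_ERR_GENERIC:=1}"
: "${E2FSCK:=e2fsck}" "${DUMPE2FS:=dumpe2fs}"
if ! command -v ocf_log >/dev/null 2>&1; then
        ocf_log() { shift; echo "$*" >&2; }
fi

# Refuse to mount if the superblock records filesystem errors.
ext_precheck()
{
        local rc state

        # Optional: replay only the journal first, so that an error
        # recorded only in the journal reaches the superblock. Off by
        # default because older e2fsck lacks "-E journal_only".
        if [ "${E2FSCK_JOURNAL_ONLY:-no}" = yes ]; then
                rc=0
                $E2FSCK -E journal_only "$DEVICE" || rc=$?
                # Assumption: exit codes below 4 mean the replay completed;
                # 4 and above indicate uncorrected errors or a failure.
                if [ "$rc" -ge 4 ]; then
                        ocf_log err "$DEVICE: journal replay failed (e2fsck exit $rc)"
                        return $OCF_ERR_GENERIC
                fi
        fi

        # If dumpe2fs fails, $state is empty and the check passes.
        state=$($DUMPE2FS -h "$DEVICE" 2>/dev/null | grep '^Filesystem state:')
        case "$state" in
        *error*)
                ocf_log err "$DEVICE: $state (run e2fsck)"
                return $OCF_ERR_GENERIC
                ;;
        esac
        return $OCF_SUCCESS
}
```

In the agent, such a function would presumably be called from the start action, before mounting, and only for ext3/ext4. A failure would leave the resource stopped so that an administrator can run a full e2fsck.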

> > BTW, it is very unusual (and suspicious) that the filesystem
> > starts having errors just like that, while the system's running.
> > You should find what caused the corruption.
> 
> Well, extX records an error in the journal, and subsequently in the
> superblock, even if only an I/O error came up. Unfortunately, there does
> not seem to be a single expensive RAID unit out there that never produces
> errors, although I have to admit that FC and IB HBAs and fabrics also
> play their part in that issue.
> And of course, no filesystem is free of bugs, which is why extX still
> suggests frequent fscks.

Hmpf. OK, I must say that I expected it to be more robust.

Cheers,

Dejan

> Cheers,
> Bernd
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems