Re: [Jfs-discussion] Fsck on every unclean shutdown

Peter Grandi Sat, 09 Jan 2010 11:27:39 -0800

> I noticed recently on my system that anytime I have an unclean
> shutdown on my machine an fsck has to be ran on the volume.


That in itself is correct -- 'jfs_fsck' runs to complete the log
in JFS, while other file systems have log completion in the kernel.

> I don't see this on my server or the OS partition which is
> also JFS.

This may be because some application is hitting that filesystem
and still has some files open, updating them through the
shutdown, while the other filesystems are less active.

> r...@sabayonx86-64: 11:48 PM :~# df -H /data
> Filesystem             Size   Used  Avail Use% Mounted on
> /dev/sdc3               18T    13T   5.4T  71% /data

> r...@sabayonx86-64: 11:48 PM :~# df -hi /data
> Filesystem            Inodes   IUsed   IFree IUse% Mounted on
> /dev/sdc3               4.0G    6.0M    4.0G    1% /data

That's fairly large for a single filesystem, and implies RAID; no
idea of what it is used for, and in particular how actively and
fast it is written to.

> fsck.jfs version 1.1.11, 05-Jun-2006
> processing started: 3/8/2009 3.30.9
> The current device is:  /dev/sdc3
> Block size in bytes:  4096
> Filesystem size in blocks:  4360350561
> **Phase 0 - Replay Journal Log
> logredo failed (rc=-231).  fsck continuing.

That is worrying: it is a failure to complete the log.

> [ ... ]
> **Phase 9 - Reformat File System Log

And that recreates the log.

> I get an fsck on this partition basically everytime my machine
> is not shut down properly

That is indeed expected -- as mentioned above 'jfs_fsck' is run to
complete the log in those cases. The worrying bit above is that it
fails to do so because the log has been corrupted.

From your report it appears that if you shut down the system
properly everything is fine, and the log is not corrupted; please
double check by doing a proper shutdown and then running 'umount
/dev/sdc3; jfs_fsck -f /dev/sdc3' to make sure.
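
The double check suggested above could be scripted roughly as follows. This is only a sketch: the device name and the two commands come from the text above, while the helper names 'check_commands' and 'run_check' are made up for illustration. It defaults to printing the commands rather than running them, since they need root and a filesystem that can be unmounted.

```python
import subprocess

def check_commands(device):
    """Return the two commands from the advice above, in order."""
    return [
        ["umount", device],
        ["jfs_fsck", "-f", device],
    ]

def run_check(device, dry_run=True):
    """Print the commands, or run them when dry_run is False.

    Returns the exit status of the last command run (0 in dry-run mode).
    """
    status = 0
    for cmd in check_commands(device):
        if dry_run:
            print(" ".join(cmd))
            continue
        status = subprocess.run(cmd).returncode
        if cmd[0] == "umount" and status != 0:
            break  # do not check a filesystem that is still mounted
    return status

if __name__ == "__main__":
    run_check("/dev/sdc3")  # dry run: only prints the two commands
```

Calling 'run_check("/dev/sdc3", dry_run=False)' as root would actually run them; how to read a nonzero status from the check is left to the 'jfs_fsck' documentation.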

What I suspect is that either you are using the known-buggy version of
JFS (unlikely) or, rather more likely, your filesystem is quite
active and you haven't considered the issue of stable storage and
barriers, so there is in-flight log data when your "machine is
not shut down properly", and it is lost.

Of course if there is some in-flight metadata, like log entries, at
that time, very bad things will happen; and given that there is
likely to be in-flight data too, your data is highly suspect
too. As per a recent discussion about 'ext4', "user space sucks"
and many application programmers have never heard of 'fsync' or
barriers and stable storage.
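
For illustration, this is roughly what an application that does care about stable storage might do when saving a file. It is a minimal sketch for a POSIX system, not anything taken from the report: the function name 'durable_write' and the '.tmp' suffix are arbitrary choices. Note also that 'fsync' only asks the kernel to flush; if the host adapter or the disks acknowledge writes that are still sitting in volatile cache, the data can still be lost on a power cut.

```python
import os

def durable_write(path, data):
    """Write bytes to path and ask the kernel to push them to stable storage."""
    tmp = path + ".tmp"  # hypothetical temporary name next to the target
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()             # drain Python's userspace buffer
        os.fsync(f.fileno())  # ask the kernel to flush file data to the device
    os.replace(tmp, path)     # atomically swap the new file into place
    # fsync the containing directory so the rename itself is recorded
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

An application that only calls 'write' and 'close' does none of this, and its data may be in flight for a long time after it believes the file is saved.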

Since you have an 18TB capacity partition it is likely that it is
a big RAID, and that you have a RAID host adapter that has large
RAM buffers, and the disks themselves have them too of course.

> (I have to shut it off) and means I have to wait through a 10
> minute fsck on my next boot =(

That's amazingly quick for a filesystem with 13TB of space used by
4M files (average 3MB, reasonable) in 400K directories (average 1K
files, a bit high), even if there is no repair to be done other
than the log.
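
The size and average-file-size figures can be sanity-checked with a few lines of arithmetic. The inputs below are the block size and block count from the 'fsck.jfs' output, the "13T" used from 'df -H' (decimal units), and the "4M files" estimate used above; the variable names are just for illustration.

```python
BLOCK_SIZE = 4096      # "Block size in bytes" from the fsck output
BLOCKS = 4360350561    # "Filesystem size in blocks" from the fsck output
USED_BYTES = 13e12     # "13T" used, from df -H (decimal units)
FILES = 4e6            # the "4M files" estimate used above

size_tb = BLOCK_SIZE * BLOCKS / 1e12
avg_file_mb = USED_BYTES / FILES / 1e6

print(f"filesystem size: {size_tb:.1f} TB")        # agrees with the 18T from df -H
print(f"average file size: {avg_file_mb:.2f} MB")  # about 3MB, as stated above
```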

But the 10-minute wait may be the least of your problems -- if bits of
metadata and data have not been committed to stable storage when
the "machine is not shut down properly", they will have been lost.
