On 12/28/2009 01:47 AM, Peter Grandi wrote:
>> I noticed recently on my system that anytime I have an unclean
>> shutdown on my machine an fsck has to be run on the volume.
>>     
> That in itself is correct -- 'jfs_fsck' runs to complete the log
> in JFS, while other file systems have log completion in the kernel.
>
>   
>> I don't see this on my server or the OS partition which is
>> also JFS.
>>     
> This may be because some application is hitting that filesystem
> and still has some files open and is updating them throughout the
> shutdown, while the other filesystems are less active.
>
>   
>> r...@sabayonx86-64: 11:48 PM :~# df -H /data
>> Filesystem             Size   Used  Avail Use% Mounted on
>> /dev/sdc3               18T    13T   5.4T  71% /data
>>     
>   
>> r...@sabayonx86-64: 11:48 PM :~# df -hi /data
>> Filesystem            Inodes   IUsed   IFree IUse% Mounted on
>> /dev/sdc3               4.0G    6.0M    4.0G    1% /data
>>     
> That's fairly large for a single filesystem, and implies RAID; no
> idea of what it is used for, and in particular how actively and
> fast it is written to.
>
>   
>> fsck.jfs version 1.1.11, 05-Jun-2006
>> processing started: 3/8/2009 3.30.9
>> The current device is:  /dev/sdc3
>> Block size in bytes:  4096
>> Filesystem size in blocks:  4360350561
>> **Phase 0 - Replay Journal Log
>> logredo failed (rc=-231).  fsck continuing.
>>     
> That is worrying, it is a failure to complete the log.
>
>   
>> [ ... ]
>> **Phase 9 - Reformat File System Log
>>     
> And that recreates the log.
>
>   
>> I get an fsck on this partition basically every time my machine
>> is not shut down properly
>>     
> That is indeed expected -- as mentioned above 'jfs_fsck' is run to
> complete the log in those cases. The worrying bit above is that it
> fails to do so because the log has been corrupted.
>
> From your report it appears that if you shut down the system
> properly everything is fine, and the log is not corrupted; please
> double check by doing a proper shutdown and then running 'umount
> /dev/sdc3; jfs_fsck -f /dev/sdc3' to make sure.
>
> What I suspect is that either you are using the known-buggy version
> of JFS (unlikely) or, rather more likely, your filesystem is quite
> active and you haven't considered the issue of stable storage and
> barriers, so there is in-flight log data when your "machine is
> not shut down properly", and it is lost.
>
> Of course if there is some in-flight metadata like log entries at
> that time very bad news will happen; and given that there is
> likely to be in-flight data too, your data is highly suspect
> too. As per a recent discussion about 'ext4', "user space sucks"
> and many application programmers have never heard of 'fsync' or
> barriers and stable storage.
>
> Since you have an 18TB capacity partition it is likely that it is
> a big RAID, and that you have a RAID host adapter that has large
> RAM buffers, and the disks themselves have them too of course.
>
>   
>> (I have to shut it off) and means I have to wait through a 10
>> minute fsck on my next boot =(
>>     
> That's amazingly quick for a filesystem with 13TB of space used by
> 4M files (average 3MB, reasonable) in 400K directories (average 1K
> files, a bit high), even if there is no repair to be done other
> than the log.
>
> But the 10m wait may be the least of your problems -- if bits of
> metadata and data have not been committed to stable storage when
> the "machine is not properly shutdown", they will have been lost.
>
>   

What I meant was that logredo fails, and thus a full fsck has to be run
(not a quick replay of the journal). I think this is a bug somewhere,
going by the other thread that Tim Nufire started. This is my home
system, so it actually isn't under heavy writes that often. Most of
the time I run into this because of my dumb init scripts, which stop
networking before unmounting network file systems, so the machine hangs
at shutdown (and I have to power off). So there really should be no
write activity going on.

This was on a RAID controller with 2 GB of cache, but it does have a BBU.
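
For reference, the 'fsync' point raised above can be sketched roughly as
follows. This is only an illustration of what an application would do to
push its own data toward stable storage, not anything JFS-specific; the
helper name durable_write is made up for the example, and it assumes a
POSIX system where a directory can be opened and fsync'ed. Note that
fsync only asks the kernel to flush; whether the data then survives a
power cut still depends on barriers and on the controller and disk
caches (a BBU-backed cache is normally treated as stable).

```python
import os
import tempfile


def durable_write(path, data):
    """Hypothetical helper: replace `path` with `data`, flushing at each step."""
    dir_name = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so the final
    # rename stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()              # flush Python's userspace buffer
            os.fsync(f.fileno())   # ask the kernel to flush file data
        os.replace(tmp_path, path)  # atomically swap in the new file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Flush the directory entry too, so the rename itself is on disk
    # (assumption: POSIX; this does not work on Windows).
    dir_fd = os.open(dir_name, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

An application that skips the two fsync calls can still have its data
sitting in RAM when the machine is powered off, even if the write
itself returned long before.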

_______________________________________________
Jfs-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/jfs-discussion