Hello,
I recently reported a problem on this list with a JFS filesystem going
read-only. It happened again, and this time I have more details.
It all started with this message in the kernel log:
Sep 15 17:39:40 elmer kernel: ERROR: (device dm-0): XT_GETPAGE: xtree page corrupt
At that point, the filesystem in question apparently turned read-only.
This morning, when I found it in that state, I ran fsck -f on it. This
resulted in:
elmer:/dev/MASS1# fsck -f /dev/MASS1/vsmain
fsck 1.37 (21-Mar-2005)
fsck.jfs version 1.1.8, 03-May-2005
processing started: 9/16/2005 7.9.22
The current device is: /dev/MASS1/vsmain
Block size in bytes: 4096
Filesystem size in blocks: 28049408
**Phase 0 - Replay Journal Log
**Phase 1 - Check Blocks, Files/Directories, and Directory Entries
**Phase 2 - Count links
**Phase 3 - Duplicate Block Rescan and Directory Connectedness
**Phase 4 - Report Problems
File system object FF1011719 is linked as: /modpython_internal/var/log/syslog
cannot repair the data format error(s) in this file.
cannot repair FF1011719. Will release.
File system object FF1511721 is linked as: /modpython_internal/var/log/mail.log
cannot repair the data format error(s) in this file.
cannot repair FF1511721. Will release.
File system object FF1511734 is linked as: /modpython_internal/var/log/mail.info
cannot repair the data format error(s) in this file.
cannot repair FF1511734. Will release.
File system object FF1568824 is linked as:
/database_internal/var/lib/postgres/data/base/17142/1223159869
cannot repair the data format error(s) in this file.
cannot repair FF1568824. Will release.
File system object FF1782520 is linked as: /modpython_internal/var/log/mail.warn
cannot repair the data format error(s) in this file.
cannot repair FF1782520. Will release.
**Phase 5 - Check Connectivity
**Phase 6 - Perform Approved Corrections
**Phase 7 - Rebuild File/Directory Allocation Maps
**Phase 8 - Rebuild Disk Allocation Maps
112197632 kilobytes total disk space.
283070 kilobytes in 49477 directories.
29218274 kilobytes in 624332 user files.
0 kilobytes in extended attributes
738692 kilobytes reserved for system use.
82523736 kilobytes are available for use.
Filesystem is clean.
elmer:/dev/MASS1#
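To build a list of files to restore from backup, the affected paths can be
pulled out of fsck output like the above. This is a minimal Python sketch,
assuming the output has been saved as text. linked_objects is a hypothetical
helper name, not part of jfsutils.

```python
import re

# Matches "File system object FF1011719 is linked as: /path". The path may be
# missing from the line when the mail client wrapped it onto the next line.
OBJ_RE = re.compile(r"^File system object (\w+) is linked as:\s*(\S.*)?$")


def linked_objects(fsck_output):
    """Return (object_id, path) pairs found in fsck.jfs phase 4 output."""
    results = []
    lines = [ln.strip() for ln in fsck_output.splitlines()]
    for i, line in enumerate(lines):
        m = OBJ_RE.match(line)
        if not m:
            continue
        obj_id, path = m.group(1), m.group(2)
        if path is None and i + 1 < len(lines):
            # Assumption: a wrapped path sits alone on the following line.
            path = lines[i + 1]
        results.append((obj_id, path))
    return results
```

Called on the saved output, it returns pairs such as FF1011719 with
/modpython_internal/var/log/syslog.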
I then rebooted the machine.
When it came back up, the kernel oopsed. I thought I had saved a log of
the oops, but alas I had not. However, this was the first message in the
oops log:
BUG at fs/jfs/jfs_dmap.c:2722 assert(bsz < le32_to_cpu(tp->dmt_nleafs))
The damage was pretty serious (very little on the system was working), so I
rebooted again.
This time, the kernel spewed out these messages:
ERROR: (device dm-0): diUpdatePMap: inode 81995 not marked as allocated in wmap!
ERROR: (device dm-0): diUpdatePMap: inode 81995 not marked as allocated in pmap!
ERROR: (device dm-0): diFree: wmap shows inode already free
ERROR: (device dm-0): diUpdatePMap: inode 81996 not marked as allocated in wmap!
ERROR: (device dm-0): diUpdatePMap: inode 81996 not marked as allocated in pmap!
ERROR: (device dm-0): diFree: wmap shows inode already free
ERROR: (device dm-0): diUpdatePMap: inode 81998 not marked as allocated in wmap!
ERROR: (device dm-0): diUpdatePMap: inode 81998 not marked as allocated in pmap!
ERROR: (device dm-0): diFree: wmap shows inode already free
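When there are many such messages, it helps to reduce them to the distinct
inode numbers involved. This is a minimal Python sketch, assuming the kernel
log has been saved as text and the messages have exactly the form shown above.
inodes_by_map is a hypothetical helper name.

```python
import re

# Matches the diUpdatePMap complaints shown above and captures the inode
# number and which allocation map (wmap or pmap) was reported.
INODE_RE = re.compile(
    r"diUpdatePMap: inode (\d+) not marked as allocated in (wmap|pmap)!"
)


def inodes_by_map(log_text):
    """Map each reported inode number to the set of maps named for it."""
    result = {}
    for m in INODE_RE.finditer(log_text):
        inode, which = int(m.group(1)), m.group(2)
        result.setdefault(inode, set()).add(which)
    return result
```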
The filesystem again turned read-only. I ran fsck again (without -f this
time, but it seemed to do a full scan anyway). This time, fsck reported:
elmer:~# fsck.jfs /dev/MASS1/vsmain
fsck.jfs version 1.1.8, 03-May-2005
processing started: 9/16/2005 7.43.39
Using default parameter: -p
The current device is: /dev/MASS1/vsmain
Block size in bytes: 4096
Filesystem size in blocks: 28049408
**Phase 0 - Replay Journal Log
**Phase 1 - Check Blocks, Files/Directories, and Directory Entries
**Phase 2 - Count links
Incorrect link counts have been detected. Will correct.
**Phase 3 - Duplicate Block Rescan and Directory Connectedness
Directory entries for unallocated files have been detected. Will remove.
**Phase 4 - Report Problems
File system object FF81988 is linked as:
/chatterbox_internal/var/run/atd.pid
The path(s) refer to an unallocated file. Will remove.
File system object DF230944 is linked as: /chatterbox_internal/var/run
File system object DF251840 is linked as: /gatekeeper_external/var/run
Errors detected in Directory Index Table. Will Fix.
File system object FF251843 is linked as:
/gatekeeper_external/var/run/utmp
The path(s) refer to an unallocated file. Will remove.
File system object FF442839 is linked as:
/chatterbox_internal/var/run/syslogd.pid
The path(s) refer to an unallocated file. Will remove.
File system object FF443166 is linked as:
/chatterbox_internal/var/run/utmp
The path(s) refer to an unallocated file. Will remove.
File system object FF443167 is linked as:
/chatterbox_internal/var/run/klogd.pid
The path(s) refer to an unallocated file. Will remove.
File system object FF675840 is linked as:
/chatterbox_internal/var/run/crond.reboot
The path(s) refer to an unallocated file. Will remove.
File system object DF708736 is linked as: /directory_internal/var/run
Errors detected in Directory Index Table. Will Fix.
File system object FF708739 is linked as:
/directory_internal/var/run/utmp
The path(s) refer to an unallocated file. Will remove.
File system object FF721832 is linked as: /gatekeeper_external/dev/log
The path(s) refer to an unallocated file. Will remove.
File system object FF929792 is linked as:
/chatterbox_internal/var/run/crond.pid
The path(s) refer to an unallocated file. Will remove.
File system object FF1060879 is linked as:
/chatterbox_internal/var/run/sessiondb.dir
The path(s) refer to an unallocated file. Will remove.
File system object FF1331209 is linked as:
/chatterbox_internal/var/run/sshd.pid
The path(s) refer to an unallocated file. Will remove.
File system object FF1497743 is linked as:
/chatterbox_internal/var/run/sessiondb.pag
The path(s) refer to an unallocated file. Will remove.
File system object FF1527820 is linked as:
/chatterbox_internal/var/run/spamd.pid
The path(s) refer to an unallocated file. Will remove.
File system object FF1769516 is linked as:
/gatekeeper_external/var/run/klogd.pid
The path(s) refer to an unallocated file. Will remove.
File system object FF1769597 is linked as: /directory_internal/dev/log
The path(s) refer to an unallocated file. Will remove.
File system object FF1769598 is linked as:
/gatekeeper_external/var/run/inetd.pid
The path(s) refer to an unallocated file. Will remove.
File system object FF1769599 is linked as:
/directory_internal/var/run/klogd.pid
The path(s) refer to an unallocated file. Will remove.
File system object FF1769615 is linked as:
/gatekeeper_external/var/spool/postfix/etc/services
The path(s) refer to an unallocated file. Will remove.
File system object FF1769818 is linked as:
/gatekeeper_external/var/spool/postfix/lib/libnss_dns-2.3.2.so
The path(s) refer to an unallocated file. Will remove.
File system object FF1769819 is linked as:
/gatekeeper_external/var/spool/postfix/lib/libnss_dns.so.2
The path(s) refer to an unallocated file. Will remove.
File system object FF1769820 is linked as:
/gatekeeper_external/var/spool/postfix/lib/libnss_files-2.3.2.so
The path(s) refer to an unallocated file. Will remove.
File system object FF1769822 is linked as:
/gatekeeper_external/var/spool/postfix/lib/libnss_files.so.2
The path(s) refer to an unallocated file. Will remove.
File system object FF1769828 is linked as:
/gatekeeper_external/var/spool/postfix/lib/libnss_hesiod.so.2
The path(s) refer to an unallocated file. Will remove.
File system object FF1769829 is linked as:
/gatekeeper_external/var/spool/postfix/lib/libnss_nis-2.3.2.so
The path(s) refer to an unallocated file. Will remove.
**Phase 5 - Check Connectivity
**Phase 6 - Perform Approved Corrections
8 files reconnected to /lost+found/.
**Phase 7 - Rebuild File/Directory Allocation Maps
**Phase 8 - Rebuild Disk Allocation Maps
112197632 kilobytes total disk space.
283063 kilobytes in 49478 directories.
29498035 kilobytes in 624286 user files.
0 kilobytes in extended attributes
737680 kilobytes reserved for system use.
82244980 kilobytes are available for use.
Filesystem is clean.
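The summary figures from the two fsck runs can be compared mechanically. This
is a minimal Python sketch, assuming each summary block has been saved as text.
parse_summary and summary_delta are hypothetical helper names.

```python
import re

# Matches summary lines such as "283070 kilobytes in 49477 directories."
# The trailing period is optional because one line in the output lacks it.
SUMMARY_RE = re.compile(r"^\s*(\d+) kilobytes (.+?)\.?\s*$")


def parse_summary(text):
    """Return {label: kilobytes} for the summary lines of one fsck run."""
    out = {}
    for line in text.splitlines():
        m = SUMMARY_RE.match(line)
        if m:
            # Drop embedded counts so labels are comparable between runs,
            # e.g. "in 49477 directories" becomes "in directories".
            label = re.sub(r"\b\d+\s+", "", m.group(2))
            out[label] = int(m.group(1))
    return out


def summary_delta(before, after):
    """Return the change in kilobytes for labels present in both runs."""
    return {k: after[k] - before[k] for k in before if k in after}
```

Applied to the two summaries in this report, the delta for the
"are available for use" label is the second run's figure minus the first.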
This is with jfsutils 1.1.8 and a kernel.org 2.6.12.4 kernel.
I'm seriously considering switching this machine to ext3 now. This
sort of thing is just not good for a production server.
Any ideas?
Thanks,
-- John
_______________________________________________
Jfs-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/jfs-discussion