Re: [Jfs-discussion] Bad Superblock on raid0 jfs disks after power failure

Dave Kleikamp Wed, 17 Aug 2005 06:23:19 -0700

On Tue, 2005-08-16 at 09:37 +0200, Simon Hoerder wrote:
> Hi,
> 
> after a power failure, the superblocks of all my raid0 disks (all of them
> formatted with jfs) seem to be corrupted. The partitions that were not in
> the raid array are just fine.
> 
> I have put two equal partitions (same size, same fs, same position on the
> hard disc) into each of my 4 raid0 arrays. All those disks are encrypted
> with cryptoloop. Cryptoloop works just fine and raid doesn't complain
> either but as soon as I want to mount the raid arrays, I get a:
> 'mount: wrong fs type, bad option, bad superblock on /dev/loop0,
>         or too many mounted file systems'
> 
> My next step was to mount the single partitions without raid. It failed
> with the same error message. (What a surprise for raid0.) After that I
> only worked with the single partitions to avoid raid problems. (And since
> I'm anything but an expert, it feels safe to have a backup handy, even if
> it's a broken one.)
> 
> I ran fsck (Standard and 'jfs_fsck -n') on the disks. It always reported
> that no valid jfs superblock could be found. After that I started
> jfs_debugfs to have a look at the superblocks. Apparently, field [1] had
> been scrambled and I reset it to 'JFS1'. (I copied that value from the
> superblock of one of the working jfs partitions.)


Very odd.  I don't know what would scramble a single 4-byte value and
leave everything else alone.

> After that, 'fsck.jfs -v' delivered the following for all (important)
> partitions:
> fsck.jfs version 1.1.7, 22-Jul-2004
> processing started: 8/15/2005 21.46.22
> Using default parameter: -p
> The current device is:  /dev/loop5
> Open(...READ/WRITE EXCLUSIVE...) returned rc = 0
> Primary superblock is valid.
> The type of file system for the device is JFS.
> Block size in bytes:  4096
> Filesystem size in blocks:  10486420
> **Phase 0 - Replay Journal Log
> LOGREDO:  Log superblock contains invalid magic number.
> logredo failed (rc=-268).  fsck continuing.
> **Phase 1 - Check Blocks, Files/Directories, and  Directory Entries
> Invalid stamp detected in file system object MA2.
> Primary metadata inode A2 is corrupt.
> Invalid stamp detected in file system object MA2.
> Secondary metadata inode A2 is corrupt.
> Errors detected in the Primary File/Directory Allocation Table.
> Errors detected in the Secondary File/Directory Allocation Table.
> CANNOT CONTINUE.
> processing terminated:  8/15/2005 21:46:22  with return code: -10049  exit code: 4.

Doesn't look good.  :-( I think more than just the s_magic field got
corrupted.

> (On one partition fsck.jfs reported a corrupted superblock but since this
> partition has almost no important data on it, I don't care too much about
> it.)
> 
> jfs_debugfs now shows the superblocks of the other three partitions (loop4,
> loop6 and loop7 attached; I got them by redirecting stdout: first 'su p',
> then 'su s'). This is the output for loop4:
> jfs_debugfs version 1.1.7, 22-Jul-2004
> 
> Aggregate Block Size: 4096
> 
> primary superblock:
> [1] s_magic:          'JFS1'          [15] s_ait2.addr1:      0x00
> [2] s_version:        1               [16] s_ait2.addr2:      0x00000518
> [3] s_size:   0x0000000004ff0908           s_ait2.address:    1304
> [4] s_bsize:          4096            [17] s_logdev:          0x00000700
> [5] s_l2bsize:        12              [18] s_logserial:       0x00000016
> [6] s_l2bfactor:      3               [19] s_logpxd.len:      8192
> [7] s_pbsize:         512             [20] s_logpxd.addr1:    0x00
> [8] s_l2pbsize:       9               [21] s_logpxd.addr2:    0x009fe294
> [9] pad:              Not Displayed        s_logpxd.address:  10478228
> [10] s_agsize:        0x00020000      [22] s_fsckpxd.len:     371
> [11] s_flag:          0x10200900      [23] s_fsckpxd.addr1:   0x00
>                       JFS_LINUX       [24] s_fsckpxd.addr2:   0x009fe121
>       JFS_COMMIT      JFS_GROUPCOMMIT      s_fsckpxd.address: 10477857
>                       JFS_INLINELOG   [25] s_time.tv_sec:     0x41c555ad
>                                       [26] s_time.tv_nsec:    0x00000000
>                                       [27] s_fpack:           ''
> [12] s_state:         0x00000001
>            FM_MOUNT
> [13] s_compress:      0
> [14] s_ait2.len:      4

Looks sane.  s_state is FM_MOUNT, so jfs will not allow a read-write
mount until the journal has been replayed by fsck.
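
One rough way to see why the dump looks sane is to check the fields
against each other. The sketch below is only an illustration: it assumes
that s_size counts s_pbsize-byte (512-byte) blocks, and the idea that the
fsck workspace and the inline log sit back to back right after the
aggregate is read off these numbers, not taken from a specification. All
values are copied from the loop4 dump above; the function name is made up.

```python
# Values copied from the primary superblock dump of loop4.
s_size = 0x0000000004ff0908   # ASSUMPTION: counted in s_pbsize-byte blocks
s_bsize = 4096                # file system block size in bytes
s_pbsize = 512                # physical block size in bytes
s_fsckpxd_addr, s_fsckpxd_len = 0x009fe121, 371
s_logpxd_addr, s_logpxd_len = 0x009fe294, 8192


def layout_report():
    """Return (name, value) pairs for the tail of the volume, in fs blocks."""
    aggregate_blocks = s_size * s_pbsize // s_bsize
    fsck_end = s_fsckpxd_addr + s_fsckpxd_len
    log_end = s_logpxd_addr + s_logpxd_len
    return [
        ("aggregate size in fs blocks", aggregate_blocks),
        ("fsck workspace start", s_fsckpxd_addr),
        ("fsck workspace end", fsck_end),
        ("inline log start", s_logpxd_addr),
        ("inline log end", log_end),
    ]


if __name__ == "__main__":
    for name, value in layout_report():
        print(f"{name:30} {value}")
```

Under those assumptions the aggregate ends at block 10477857, which is
where the fsck workspace starts; the workspace ends at 10478228, where
the log starts; and the log ends at 10486420, the file system size that
fsck.jfs printed for loop5 (presumably the same size as loop4).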

> secondary superblock
> [1] s_magic:          'JFS1'          [15] s_ait2.addr1:      0x00
> [2] s_version:        1               [16] s_ait2.addr2:      0x00000518
> [3] s_size:   0x0000000004ff0908           s_ait2.address:    1304
> [4] s_bsize:          4096            [17] s_logdev:          0x00000700
> [5] s_l2bsize:        12              [18] s_logserial:       0x00000014
> [6] s_l2bfactor:      3               [19] s_logpxd.len:      8192
> [7] s_pbsize:         512             [20] s_logpxd.addr1:    0x00
> [8] s_l2pbsize:       9               [21] s_logpxd.addr2:    0x009fe294
> [9] pad:              Not Displayed        s_logpxd.address:  10478228
> [10] s_agsize:        0x00020000      [22] s_fsckpxd.len:     371
> [11] s_flag:          0x10200900      [23] s_fsckpxd.addr1:   0x00
>                       JFS_LINUX       [24] s_fsckpxd.addr2:   0x009fe121
>       JFS_COMMIT      JFS_GROUPCOMMIT      s_fsckpxd.address: 10477857
>                       JFS_INLINELOG   [25] s_time.tv_sec:     0x41c555ad
>                                       [26] s_time.tv_nsec:    0x00000000
>                                       [27] s_fpack:           ''
> [12] s_state:         0x00000000
>            FM_CLEAN
> [13] s_compress:      0
> [14] s_ait2.len:      4
> 
> I have attached the output of a well working partition (hda6) for comparison.
> 
> I did google for similar problems but found nothing really helpful.
> 'od -x -v -N 64 /dev/loop4 +0x1000' just gets me a lot of zeros:
> 0000000 0000 0000 0000 0000 0000 0000 0000 0000
> 0000020 0000 0000 0000 0000 0000 0000 0000 0000
> 0000040 0000 0000 0000 0000 0000 0000 0000 0000
> 0000060 0000 0000 0000 0000 0000 0000 0000 0000
> 0000100
> I copied it from a mailing list post and guess that the offset is wrong
> (same output for the good partition), but I don't know what the right
> offset would be.

I found the post you were referring to.  It dealt with AIX's jfs, which
is completely different.
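
For a Linux-format JFS volume, a closer equivalent of that od check might
look like the sketch below. It assumes the primary superblock starts at
byte offset 0x8000 (SUPER1_OFF in the Linux jfs headers, to the best of
my knowledge) and that its first four bytes are the s_magic field, which
matches field [1] in the jfs_debugfs output. The function names are made
up for this example.

```python
import io

SUPER1_OFF = 0x8000      # ASSUMPTION: byte offset of the primary superblock
JFS_MAGIC = b"JFS1"      # s_magic value as shown by jfs_debugfs


def read_magic(dev, offset=SUPER1_OFF):
    """Return the 4 bytes found at `offset` in a binary file object."""
    dev.seek(offset)
    return dev.read(4)


def has_jfs_magic(dev, offset=SUPER1_OFF):
    """True if the bytes at `offset` equal the Linux JFS magic string."""
    return read_magic(dev, offset) == JFS_MAGIC


if __name__ == "__main__":
    # Demo on an in-memory image. For a real check, pass an opened device
    # instead, e.g. open("/dev/loop4", "rb") (needs read permission).
    image = io.BytesIO(bytes(SUPER1_OFF) + JFS_MAGIC + bytes(60))
    print(has_jfs_magic(image))
```

Reading at 0x1000, as in the quoted od command, looks at a different
part of the volume, which would explain why the good partition shows the
same zeros.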

> My box runs SuSE 9.2 and I'm working with their standard JFS filesystem. I
> don't know if they are using any special flavor. It's an AMD Duron
> processor (x86). The raid0 is Linux software raid. And I'm anything but
> an expert, as I've mentioned before.
> 
> Is there a real chance of repairing those superblocks and where can I find
> hints on how to do it? I've googled and found the information that carried me
> through the examinations that I've described above but I haven't found
> anything to go on from here.

One thing you might try is to mount the volumes read-only (mount -oro).
Sometimes the file system is intact enough to read, but broken enough
that fsck isn't happy.
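
A minimal sketch of that read-only attempt, wrapped so that the command
can be inspected before anything is run. The device and mount point
below are placeholders, not names from this thread, and the helper names
are made up; only the mount options (-t jfs, -o ro) are real.

```python
import subprocess


def ro_mount_command(device, mountpoint, fstype="jfs"):
    """Build the argv for a read-only mount; nothing is executed here."""
    return ["mount", "-t", fstype, "-o", "ro", device, mountpoint]


def mount_read_only(device, mountpoint, dry_run=True):
    """Return the command; run it only when dry_run is False (needs root)."""
    cmd = ro_mount_command(device, mountpoint)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Placeholder device and mount point; substitute your own.
    print(" ".join(mount_read_only("/dev/md0", "/mnt/rescue")))
```

If the mount succeeds, copy the data off to another disk before trying
any further repair.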

> Thanks for your consideration,
> Simon Hoerder
> 
> P.S. I know I'm paying the price for being too lazy to make additional
> backup copies.
-- 
David Kleikamp
IBM Linux Technology Center


