On Thu, 2005-09-08 at 04:47 +0200, Andreas Engelbert wrote:
> Dave Kleikamp wrote:
> 
> > This is plausible.  Inodes are allocated in 16K extents of 32 inodes, so
> > if you find 32 consecutive ino's in the right position, that would be at
> > least part of a decent sanity check on whether you found an inode
> > extent.
> 
> Excellent.
> 
> >>So we end up with a lot of cut-off subtrees. The next task is to find
> >>the root of each subtree and to join them into a recreated global
> >>fileset root, with everything else rebuilt as jfs_mkfs would have done.
> > 
> > 
> > jfs_fsck does something like this to verify that the directory tree is
> > sane.  jfs_mkfs does not deal with it since it only needs to create an
> > empty root directory.
> > 
> 
> (I let jfs_mkfs rebuild the basic structures for the empty filesystem.
> Before that, jfs_debugfs was not usable.)

Okay, makes sense.

> Are you saying that, if the appropriate tables point to all the
> discovered ino's, then recovering a sane and complete tree is just a
> matter of an fsck run? Cool! Well, I almost forgot the mapping of the
> allocated and the free blocks.

Hmm.  You would have to build the iag's correctly so that they point to
the discovered inode extents, and the imap inode would have to be
created such that the data of the file (aggregate inode 16) is the iags
(and they have to be at the right logical offset).  If this is done
right, then fsck may be able to fix everything else.  It normally
rebuilds the block map completely, so you shouldn't have to worry
about that.
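
To make that concrete, here's roughly how the pieces relate.  This is a
simplified sketch along the lines of jfs_imap.h (names ending in _sketch
are mine, the bookkeeping fields are omitted, and on-disk values are
little-endian), so double-check it against the real header:

#include <stdint.h>

#define INOSPEREXT 32               /* inodes per 16k inode extent */
#define EXTSPERIAG 128              /* inode extents per iag */

typedef struct {
        uint32_t len24_addr8;       /* 24-bit length + high 8 address bits */
        uint32_t addr32;            /* low 32 address bits */
} pxd_sketch_t;                     /* packed extent descriptor */

struct iag_sketch {
        int64_t agstart;            /* first block of the allocation group */
        int32_t iagnum;             /* index of this iag within the imap */
        /* ... free-list anchors, summary maps, padding omitted ... */
        uint32_t wmap[EXTSPERIAG];  /* working inode allocation bitmap */
        uint32_t pmap[EXTSPERIAG];  /* persistent inode allocation bitmap */
        pxd_sketch_t inoext[EXTSPERIAG]; /* where each inode extent lives */
};

/* iag n is logical page n+1 of the imap file (aggregate inode 16);
 * page 0 is the dinomap control page.  inoext[i] of iag n covers inode
 * numbers (n * EXTSPERIAG + i) * INOSPEREXT through that + 31. */

So for each recovered extent, its first inode number tells you exactly
which iag and which inoext slot it has to be entered into.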

> >>Is there a way to automate jfs_debugfs and do the necessary steps with it?
> > 
> > 
> > What you propose sounds reasonable, but as far as I know, such a tool
> > doesn't exist.  There is probably code in jfs_debugfs that you could
> > use.  I'm not sure about using jfs_debugfs in an automated way.  Another
> > example of some simple code to read the jfs data structures is in the
> > grub source.
> 
> With pipelining and Perl, one can automate jfs_debugfs into simple batch
> execution like this:
> 
> #jfs_debug_batch.pl | jfs_debugfs /dev/loop2 | jfs_debug_parse.pl \
>       >/tmp/debugfs.out
> 
> Where the first .pl just prints the commands and the second one parses
> the output.

Okay, you're more comfortable with Perl.  I'd just write it in C
myself.  :-)

> Now the plausibility check of ino's could be performed, if I knew the
> fields better. The display inode command on an arbitrary block
> outputs something like the following:
> 
> (Sorry for the large printout. I don't expect an explanation of
> everything, but maybe you have some hints on field bounds at hand.)
> 
> [0x3e8000]    //Real Address
>               //the blocksize is 4k, so I probe every block with
>               //[0-3]*512 offset. I'm pretty sure the inode extents
>               //are aligned to those 4k blocks
> 
> di_inostamp:0x00002000        //stamp to show inode belongs to fileset
>                       //? perhaps a magic number

If I remember correctly, it's a copy of the inostamp of aggregate inode
1.  It's kind of a sanity check so that if you reformat the partition,
you don't recognize old inodes as valid.  Of course, you want the
opposite behavior.
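
That also means the old stamp is a useful key for your scan: every inode
of the old fileset should carry the same pre-mkfs stamp, so you could bin
candidates by stamp value and let the dominant value identify the old
fileset.  A minimal sketch, with hypothetical names:

#include <stdint.h>

/* Tally di_inostamp values across all candidate inodes; the old
 * fileset's stamp should win by a wide margin.  The bins array must be
 * zero-initialized by the caller. */
struct stamp_bin {
        uint32_t stamp;
        unsigned long hits;
};

static void tally_stamp(struct stamp_bin *bins, int nbins, uint32_t stamp)
{
        for (int i = 0; i < nbins; i++) {
                if (bins[i].hits == 0 || bins[i].stamp == stamp) {
                        bins[i].stamp = stamp;
                        bins[i].hits++;
                        return;
                }
        }
        /* table full -- dropping strays is fine for a rough census */
}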

> di_fileset:8192       //fileset number
>                       //? only one fileset, so it should be 0

Should be 16.
> 
> di_number:8028160     //inode number, aka file serial number
>                       //? less than blockcount, not a strong test

The strong test is that the inode numbers of 32 consecutive inodes
should be consecutive, with the first one a multiple of 32.
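
In C, that scan might look something like the sketch below.  The struct
and helper names are made up for illustration (the real 512-byte struct
dinode is in jfs_dinode.h), and I'm ignoring endianness:

#include <stdint.h>

#define INOSPEREXT 32               /* inodes per 16k inode extent */
#define DISIZE     512              /* bytes per on-disk inode */

struct dinode_sketch {
        uint32_t di_inostamp;       /* same value across the extent */
        uint32_t di_fileset;        /* 16 for the fileset's inodes */
        uint32_t di_number;         /* inode number */
        /* ... the rest of the 512 bytes isn't needed here ... */
};

/* buf holds 16k read from a candidate 4k-aligned position.  Returns
 * the extent's first inode number, or -1 if it doesn't look sane. */
static long check_extent(const unsigned char *buf)
{
        const struct dinode_sketch *first =
                (const struct dinode_sketch *)buf;
        long base = first->di_number;

        if (base % INOSPEREXT != 0) /* extents start on a 32-boundary */
                return -1;
        for (int i = 0; i < INOSPEREXT; i++) {
                const struct dinode_sketch *ino =
                        (const struct dinode_sketch *)(buf + i * DISIZE);
                if (ino->di_fileset != 16 ||
                    ino->di_inostamp != first->di_inostamp ||
                    ino->di_number != (uint32_t)(base + i))
                        return -1;
        }
        return base;
}

As far as I remember, a freshly initialized extent gets di_number filled
in for all 32 slots, so this should hold even when most of the extent is
free.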

> di_gen:0              //inode generation number
>                       //? what could that be

This is incremented when an inode is deleted & recreated.  It's used by
nfs to revalidate an inode to see if it is the same entity it expects.

> di_ixpxd.len:256      //inode extent descriptor
> di_ixpxd.addr1:0x00   
> di_ixpxd.addr2:0x00000008     
> di_ixpxd.address:8    

len should always be 4 (a 16K extent is four 4K blocks)
address should be the starting block number of the extent
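
That gives you one more cheap test: the extent descriptor stored in each
inode should point back at the extent it lives in.  A tiny sketch with
hypothetical names (the real pxd_t packs the length into 24 bits and the
address into 40):

#include <stdint.h>

/* extent_block is the 4k block number where the candidate extent
 * begins, i.e. the block its first inode was read from. */
static int ixpxd_plausible(uint64_t ixpxd_address, uint32_t ixpxd_len,
                           uint64_t extent_block)
{
        return ixpxd_len == 4 &&              /* 16k = four 4k blocks */
               ixpxd_address == extent_block; /* points back at itself */
}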

> di_size:0x0000000400000055    //size
>                               //?seems to be the filesize
yes
> 
> di_nblocks:0xff0dffffff0d0d05 //number of blocks allocated
>                               //?should roughly compare to size/4k
Yes, but not always.  The file could be sparse, and it also accounts
for xattrs and extra blocks used by the xtree.

> di_nlink:-1           //number of links to the object
>                       //?is -1 valid, maybe not too large either
Probably should be treated as unsigned.  0 means it's free.
> 
> di_uid:-1             //user id of owner
> di_gid:-1             //group id of owner
>                       //!1000..1100
> 
> di_mode:0xff0dffff    //attribute, format and permission
> 
> di_atime.tv_sec:0xffffffff    //time last data accessed
> di_ctime.tv_sec:0xffffffff    //time last status changed
> di_mtime.tv_sec:0xffffffff    //time last data modified
> di_otime.tv_sec:0xffffffff    //time created
>                               //? sec since 01.01.1970
> 
> di_acl.flag:0xff              //16: acl descriptor
> di_acl.size:0xffffffff
> di_acl.len:16777215
> di_acl.addr1:0xff
> di_acl.addr2:0xffffffff
> di_acl.address:1099511627775

not used unless the filesystem was populated in OS/2

> di_ea.flag:0xff                       //16: ea descriptor
> di_ea.size:0xffffffff         
> di_ea.len:16777215
> di_ea.addr1:0xff
> di_ea.addr2:0xff0dffff
> di_ea.address:1099495768063
> di_next_index:-1              //Next available dir_table index
> di_acltype:0xffffffff         //Type of ACL
>                               //? I have no real idea
Meaningless unless the filesystem came from OS/2.
> 
> 
> 
> > It would be nice to have a much more aggressive fsck that tried to
> > rebuild anything that has been destroyed and tried to salvage anything
> > recoverable, but that's not how the current jfs_fsck was designed, and I
> > don't see it happening.  Backups are still the best way to plan for
> > catastrophic file system damage.
> 
> Absolutely. I was just preparing the backup when I formatted the wrong
> raid. Next time I'll back up the jfs metadata as well.
> 
> For the aggressive fsck: reiserfsck seems to have problems with images
> of its own filesystem type stored on the fs. It finds more metadata than
> is healthy for one fs and corrupts the tree. Better to hide such
> operations behind a --really-force flag.

Exactly.  If it ever happens, it shouldn't be the default behavior.

> > I don't have the time to really provide any real help with this, but I
> > don't mind answering questions.
> 
> Thank you very much for your helping answers.

No problem.  I'm glad you're willing to work on this.  It may prove to
be something that others find helpful.

-- 
David Kleikamp
IBM Linux Technology Center


