Asheesh,
ext2/ext3 file data should be aligned to block boundaries. The block size is 1K, 2K, or 4K (4K is the usual default on larger filesystems), and tune2fs -l will report it.
http://en.wikipedia.org/wiki/Ext2
Finding the end of a file that is longer than one block is tricky since
the blocks storing that file are not a linked list. They form a sort of
tree rooted at the inode (see the wiki page). I am not sure how long your
emails generally are, but if they are shorter than a block, your approach
for finding the beginning of the message should work.
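
To make the block-aligned search concrete, here is a rough Python sketch. It only looks at the first bytes of each block and reports the offsets where something resembling a mail header begins. The function names, the list of header fields, and the default block size are my own assumptions, so adjust them to your mail and to whatever tune2fs reports. Treating trailing zero bytes as padding is also an assumption that you should check against a few known messages.

```python
import re

# Assumption: messages begin with one of these header fields. Extend the
# list to match what your Maildir files actually start with.
HEADER_START = re.compile(
    rb"(Return-Path|Received|Delivered-To|From|Date|X-[A-Za-z-]+): "
)


def find_candidate_starts(image_path, block_size=4096):
    """Yield byte offsets of blocks that begin with a mail-header-like line."""
    offset = 0
    with open(image_path, "rb") as img:
        while True:
            block = img.read(block_size)
            if not block:
                break
            # match() anchors at the start of the block, which is where a
            # block-aligned file would begin.
            if HEADER_START.match(block):
                yield offset
            offset += block_size


def strip_zero_padding(block):
    """Return the block without trailing NUL bytes (assumed to be padding)."""
    return block.rstrip(b"\x00")
```

If you are unsure of the block size, scanning with block_size=1024 is the safe choice, since every 2K or 4K boundary is also a 1K boundary. For messages that fit in one block, strip_zero_padding applied to the block at a reported offset gives a first guess at the whole file.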
Another approach, more cumbersome though, is to find the beginning of each
message, which tells you the first block of the file you want to
recover. Then you can go to the blocks that are supposed to hold the
inodes (you'll need to figure out how ext2/3 is laid out at the beginning
of the disk) and find the inode corresponding to that file. There can be
more than one candidate, because a deleted inode from a file previously
stored at the same block may still point to it. (A hard link would not
add an inode: hard links are extra directory entries that share one inode.)
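
As a starting point for working out the layout, here is a sketch that reads the primary superblock, which sits 1024 bytes into the filesystem, and pulls out the fields you need to locate the group descriptors and, from there, the inode tables. The field offsets are the standard ext2 ones; the function name and the returned dictionary are just my own illustration. It assumes your image is of the filesystem itself (a partition), not of a whole disk with a partition table in front.

```python
import struct

SUPERBLOCK_OFFSET = 1024  # primary superblock starts 1024 bytes in
EXT2_MAGIC = 0xEF53       # s_magic, the same for ext2 and ext3


def read_layout(image_path):
    """Return basic layout numbers from the ext2/ext3 superblock."""
    with open(image_path, "rb") as img:
        img.seek(SUPERBLOCK_OFFSET)
        sb = img.read(1024)
    if len(sb) < 1024:
        raise ValueError("image too short to contain a superblock")

    # All superblock fields are little-endian.
    inodes_count, blocks_count = struct.unpack_from("<II", sb, 0)
    first_data_block, log_block_size = struct.unpack_from("<II", sb, 20)
    (blocks_per_group,) = struct.unpack_from("<I", sb, 32)
    (inodes_per_group,) = struct.unpack_from("<I", sb, 40)
    (magic,) = struct.unpack_from("<H", sb, 56)
    if magic != EXT2_MAGIC:
        raise ValueError("no ext2/ext3 magic number found")

    block_size = 1024 << log_block_size
    return {
        "block_size": block_size,
        "inodes_count": inodes_count,
        "blocks_count": blocks_count,
        "blocks_per_group": blocks_per_group,
        "inodes_per_group": inodes_per_group,
        # The group descriptor table starts in the block right after the
        # one holding the superblock.
        "group_descriptor_offset": (first_data_block + 1) * block_size,
    }
```

Each group descriptor is 32 bytes, and the 32-bit field at offset 8 inside it is the block number where that group's inode table starts. debugfs and dumpe2fs can show the same numbers, which is a good way to check the sketch against your image.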
Keep in mind that I am not an ext2/3 expert or a storage system expert.
Good luck,
Antonello
--- Asheesh Laroia <acm at jhu.asheesh.org> wrote:
A few months back, I suffered some major data loss on some hard drives.
(Lesson learned: RAID is not backup.) I had a partial backup of my
emails that were stored on those drives, but a couple of days before the
main drives failed I rm -rf'd the backup. The partial backup was stored
on ext3.
Then the main drives failed, so I saved a disk image of the drive where
the partial backup was rm'd.
So today I'm looking at that saved disk image in a hex editor. I don't
need filenames, and I can identify the sorts of files I want: I want
email files (messages, one per file, in Maildirs), and they're really
easy to detect: They start with a mail header, which looks something like
"Date: Tue, 16 Sep"....
But what I do need is a reliable way to detect file boundaries in ext3,
preferably a way that works for deleted files also.
For file starts - Do they always start at offsets that fit a pattern,
like (offset % 2048) == 0? Then I can only start looking for email
headers at those positions.
For file ends - Is there file-end zero padding until some block width,
like "after the file the rest of the 4096-size block is padding with
zeroes"? Then I use that to detect that I have the whole message file.
The filesystem where the deletes happened can be inspected with things
like debugfs or tune2fs. Assume I don't know anything about filesystems
but in general am a reasonable fellow who will try to understand what
you teach him.
I'd dearly appreciate help, for example from people who took Storage
Systems. If you only know about ext2, tell me anyway - ext3 is quite
similar!
-- Asheesh.
--
I finally went to the eye doctor. I got contacts. I only need them to
read, so I got flip-ups.
-- Steven Wright
_______________________________________________
ACM mailing list
ACM at acm.jhu.edu
http://lists.acm.jhu.edu/mailman/listinfo/acm