http://lists.acm.jhu.edu/pipermail/acm/2007-November/006831.html

[ACM] **SPAM** Re: ext2/ext3 on-disk format
Peter Froehlich phf at cs.jhu.edu
Wed Nov 28 11:50:38 EST 2007
Hi all,

Just on the general topic of recovering emails from "trashed" disks,
has it occurred to you that there might be serious investigative
applications for a product that does exactly what Asheesh wants? With
all the emails getting lost in the White House and all... :-)

Seriously, maybe there's even a quick conference paper there; I'd
recommend checking in with Randal and his group.

Cheers,
Peter

On Nov 28, 2007, at 6:44 AM, Antonello Cruz wrote:

> Asheesh,
>
> ext2/ext3 data should be aligned by block (usually it is 2K but can
> be 4K)
> http://en.wikipedia.org/wiki/Ext2
>
> Finding the end of a file that is longer than one block is tricky,
> since the blocks storing that file are not a linked list. They form
> a sort of tree rooted at the inode (see the wiki page). I am not sure
> how long your emails generally are, but if they are shorter than a
> block, your approach for finding the beginning of the message should
> work.
>
> Another approach, more cumbersome though, is finding the beginning
> of each message, which will tell you the first block of the file you
> want to recover. Then you can go to the blocks that are supposed to
> have the inodes (you'll need to figure out how ext2/3 is laid out at
> the beginning of the disk) and find the inode corresponding to that
> file. There can be more than one, for two reasons: it may be a
> deleted inode from a file previously stored at the same block, or it
> may be a hard link to the same file.
>
> Keep in mind that I am not an ext2/3 expert or a storage system
> expert.
>
> Good luck,
>
> Antonello
>
> --- Asheesh Laroia <acm at jhu.asheesh.org> wrote:
>
>> A few months back, I suffered some major data loss on some hard
>> drives. (Lesson learned: RAID is not backup.) I had a partial
>> backup of my emails that were stored on those drives, but a couple
>> of days before the main drives failed I rm -rf'd the backup. The
>> partial backup was stored on ext3.
>>
>> Then the main drives failed, so I saved a disk image of the drive
>> where the partial backup was rm'd.
>>
>> So today I'm looking at that saved disk image in a hex editor. I
>> don't need filenames, and I can identify the sorts of files I want:
>> I want email files (messages, one per file, in Maildirs), and
>> they're really easy to detect: They start with a mail header, which
>> looks something like "Date: Tue, 16 Sep"....
>>
>> But what I do need is a reliable way to detect file boundaries in
>> ext3, preferably a way that works for deleted files also.
>>
>> For file starts - Do they always start at offsets that fit a
>> pattern, like (offset % 2048) == 0? Then I only need to look for
>> email headers at those positions.
>>
>> For file ends - Is there file-end zero padding until some block
>> width, like "after the file the rest of the 4096-size block is
>> padded with zeroes"? Then I can use that to detect that I have the
>> whole message file.
>>
>> The filesystem where the deletes happened can be inspected with
>> things like debugfs or tune2fs. Assume I don't know anything about
>> filesystems but in general am a reasonable fellow who will try to
>> understand what you teach him.
>>
>> I'd dearly appreciate help, for example from people who took
>> Storage Systems. If you only know about ext2, tell me anyway - ext3
>> is quite similar!
>>
>> -- Asheesh.
>>
>> --
>> I finally went to the eye doctor. I got contacts. I only need
>> them to read, so I got flip-ups.
>>                 -- Steven Wright
>> _______________________________________________
>> ACM mailing list
>> ACM at acm.jhu.edu
>> http://lists.acm.jhu.edu/mailman/listinfo/acm

-- 
Peter H. Froehlich <><><><><><> http://www.cs.jhu.edu/~phf/
OpenPGP: ABC2 9BCC 1445 86E9 4D59 F532 A8B2 BFAE 342B E9D9
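
The block-aligned scan discussed in this thread could be sketched roughly as below. This is only an illustration, not anyone's tested recovery tool. It reads the block size from the ext2/ext3 superblock (located 1024 bytes into the filesystem, with block size equal to 1024 << s_log_block_size) rather than guessing 2K or 4K. The function names are made up for this example. The sketch assumes the image starts at the beginning of the filesystem (not a whole disk with a partition table), that messages begin with the "Date: " header Asheesh describes, and that the unused tail of a file's last block is zero-filled, which the thread asks about but does not confirm. It only handles messages that fit within one block, the easy case Antonello points out.

```python
import struct

SUPERBLOCK_OFFSET = 1024   # the ext2/ext3 superblock starts 1024 bytes in
EXT_MAGIC = 0xEF53         # s_magic value for ext2/ext3


def read_block_size(image: bytes) -> int:
    """Return the block size recorded in the superblock of a filesystem image."""
    if len(image) < SUPERBLOCK_OFFSET + 58:
        raise ValueError("image too small to contain a superblock")
    # s_magic is a little-endian u16 at offset 56 within the superblock.
    (magic,) = struct.unpack_from("<H", image, SUPERBLOCK_OFFSET + 56)
    if magic != EXT_MAGIC:
        raise ValueError("no ext2/ext3 superblock found at offset 1024")
    # s_log_block_size is a little-endian u32 at offset 24.
    (log_block_size,) = struct.unpack_from("<I", image, SUPERBLOCK_OFFSET + 24)
    return 1024 << log_block_size


def find_candidate_starts(image: bytes, block_size: int,
                          marker: bytes = b"Date: ") -> list:
    """Return block-aligned byte offsets whose data begins with `marker`.

    The default marker is an assumption taken from the thread; real mail
    files may start with other headers.
    """
    return [off for off in range(0, len(image), block_size)
            if image[off:off + len(marker)] == marker]


def strip_tail_padding(block: bytes) -> bytes:
    """Drop trailing zero bytes, assuming the block tail was zero-filled."""
    return block.rstrip(b"\x00")
```

With an image loaded into memory as `image`, something like `bs = read_block_size(image)` followed by `[strip_tail_padding(image[o:o + bs]) for o in find_candidate_starts(image, bs)]` would yield candidate single-block messages. Longer messages would still need the inode's block tree, as Antonello notes.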
