Marijn Stollenga posted on Sat, 04 Aug 2018 12:14:44 +0200 as excerpted:

> Hello btrfs experts, I need your help trying to recover an external
> HDD. I accidentally created a zfs partition on my external HD, which
> of course screwed up the whole partition. I quickly unplugged it, and
> it being a 1TB drive I assume there is still data on it.
Just a user and list regular here, not a dev, so my help will be
somewhat limited, but as I've seen no other replies yet, perhaps it's
better than nothing...

> With some great help on IRC I searched for tags using grep and found
> many positions: https://paste.ee/p/xzL5x
>
> Now I would like to scan all these positions for their information
> and somehow piece it together. I know there is supposed to be a
> superblock around 256GB, but I'm not sure where the partition started
> (the search was run from a manually created partition starting at
> 1MB).

There's a mention of the three superblock copies and their addresses
in the problem FAQ:

https://btrfs.wiki.kernel.org/index.php/Problem_FAQ#What_if_I_don.27t_have_wipefs_at_hand.3F

> In general I would be happy if someone can point me to a library that
> can do low level reading and piecing together of these pieces of meta
> information and see what is left.

There are multiple libraries in various states available, but being
more a sysadmin than a dev I'd consume them as dependencies of
whatever app I was installing that required them, so I've not followed
the details. However, here's a bit of what I found just now with a
quick look:

The project ideas page on the wiki has a (somewhat dated) library
entry:

https://btrfs.wiki.kernel.org/index.php/Project_ideas#Provide_a_library_covering_.27btrfs.27_functionality

That provides further links to a couple of python projects as well as
a haskell lib. But I added the "somewhat dated" parenthetical due to
libbtrfsutil by Omar Sandoval, which appeared in btrfs-progs 4.16, so
there's now an official library. =:^) Tho not being a dev I've not the
foggiest whether it'll provide the functionality you're after.

I also see a rust lib mentioned on-list (Oct 2016):
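Meanwhile, before reaching for any of those libs, a quick DIY sanity
check may tell you what you're working with. This is just a sketch I'd
hedge heavily (I'm no dev): per the on-disk format docs, the superblock
mirrors sit at 64KiB, 64MiB and 256GiB from the start of the
filesystem, with the magic "_BHRfS_M" at byte 0x40 inside each copy;
the path and the base offset are placeholders you'd fill in for your
own drive:

```python
# Hedged sketch: probe the three btrfs superblock mirror offsets for
# the magic string. Run against a raw image or device node; "base"
# lets you compensate for an unknown partition start.

BTRFS_MAGIC = b"_BHRfS_M"
MAGIC_OFFSET = 0x40                 # magic's offset inside a superblock
MIRROR_OFFSETS = (0x10000,          # 64KiB (primary)
                  0x4000000,        # 64MiB
                  0x4000000000)     # 256GiB

def find_superblocks(path, base=0):
    """Return the mirror offsets (relative to base) where the magic
    string was found."""
    found = []
    with open(path, "rb") as dev:
        for off in MIRROR_OFFSETS:
            dev.seek(base + off + MAGIC_OFFSET)
            if dev.read(len(BTRFS_MAGIC)) == BTRFS_MAGIC:
                found.append(off)
    return found
```

Since your zfs mishap hit the start of the drive, the 256GiB copy is
the one most likely to have survived; trying a few different base
values would let you hunt for where the old filesystem actually began.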
https://gitlab.wellbehavedsoftware.com/well-behaved-software/rust-btrfs

> I know there is btrfs-check etc. but these need the superblock to be
> known. Also on another messed up drive (I screwed up two btrfs drives
> in the same way at the same time) I was able to find the third
> superblock, but it seems they in the end pointed to other parts in
> the file system in the beginning of the drive which were broken.

OK, this may seem like rubbing salt in the wound ATM, but there's a
reason they did that back in the day before modern disinfectants: it
helped stop infection before it started. Likewise, the following
policy should help avoid the problem in the first place.

A sysadmin's first rule of data value and backups is that the real
value placed on data isn't defined by arbitrary claims, but rather by
the number and quality of backups those who control that data find it
worthwhile to make of it. If it's worth a lot, there will be multiple
backups, likely stored in multiple locations, some offsite in order to
avoid loss in the event of fire/flood/bombing/etc. Only data that's of
trivial value, less than that of the time/trouble/resources necessary
to do that backup, will have no backup at all.

(Of course, age of backups is simply a sub-case of the above, since in
that case the data in question is simply the delta between the last
backup and the current working state. By definition, as soon as that
delta is considered worth more than the time/trouble/resources
necessary to update the backup, an updated or full new backup will be
made.)

(The second rule of backups is that it's not a backup until it has
been tested to actually be usable under conditions similar to those in
which the backup would actually be needed. In many cases that'll mean
booting to rescue media and ensuring the backup can be accessed and
restored from there using only the resources available on that rescue
media.
In other cases it'll mean booting directly to the backup and ensuring
that normal operations can resume from there, etc. And if it hasn't
been tested yet, it's not a backup, only a potential backup still in
progress.)

So the above really shouldn't be a problem at all, because you either:

1) Defined the data as worth having a backup, in which case you can
just restore from it, OR

2) Defined the data as of such limited value that it wasn't worth the
hassle/time/resources necessary for that backup, in which case you
saved what was of *real* value, that time/hassle/resources, before you
ever lost the data, and the data loss isn't a big deal because, by
definition of not having a backup, it can be of only trivial value not
worth the hassle.

There's no #3. The data was either defined as worth a backup by virtue
of having one, and can be restored from there, or it wasn't, but no
big deal, because the time/trouble/resources that would otherwise have
gone into that backup was defined as more important, and was saved
before the data was ever lost in the first place.

Thus, while the loss of the data due to fat-fingering the placement of
that ZFS (all sysadmins come to appreciate the real risk of
fat-fingering after a few events of their own) might be a bit of a
bother, it's not worth spending huge amounts of time trying to
recover. The data was either worth having a backup, in which case you
simply recover from it, or it wasn't, in which case it's not worth
spending huge amounts of time trying to recover, either.

Of course there's still the pre-disaster weighed risk that something
will go wrong vs. the post-disaster "it DID go wrong, now how do I
best get back to normal operation" question. But in the context of the
backups rule above, resolving that question is more a matter of
whether it's most efficient to spend a little time trying to recover
the existing data with no guarantee of full success, or to simply jump
directly into the wipe and restore from known-good (because tested!)
backups, which might take more time, but has a (near) 100% chance of
recovery to the point of the backup.

(The slight chance of failure to recover from tested backups is what
multiple levels of backups cover for, with the value of the data and
the weighed risk balanced against the value of the
time/hassle/resources necessary to do that one more level of backup.)

So while it might be worth a bit of time to quick-test recovery of the
damaged data, it very quickly becomes not worth the further hassle,
because either the data was already defined as not worth it due to not
having a backup, or restoring from that backup will be faster and less
hassle, with a far greater chance of success, than diving further into
the data-recovery morass, with ever more limited chances of success.

Live by that sort of policy from now on, and the results of the next
failure, whether it be hardware, software, or wetware (another
fat-fingering; again, this is coming from someone, me, who has had
enough of their own!), won't be anything to write the list about,
unless of course it's a btrfs bug and, quite apart from worrying about
your data, you're just trying to get it fixed so it won't continue to
happen.

-- 
Duncan - List replies preferred.  No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html