Eric Schrock wrote:
> On Fri, Oct 10, 2008 at 06:15:16AM -0700, Marcelo Leal wrote:
>>  - "ZFS does not need fsck".
>>  Ok, that's a great statement, but I think ZFS needs one. Really does.
>>  And in my opinion an enhanced zdb would be the solution. Flexibility.
>>  Options.
> 
> About 99% of the problems reported as "I need ZFS fsck" can be summed up
> by two ZFS bugs:
> 
> 1. If a toplevel vdev fails to open, we should be able to pull
>    information from necessary ditto blocks to open the pool and make
>    what progress we can.  Right now, the root vdev code assumes "can't
>    open = faulted pool," which results in failure scenarios that are
>    perfectly recoverable most of the time.  This needs to be fixed
>    so that pool failure is only determined by the ability to read
>    critical metadata (such as the root of the DSL).
> 
> 2. If an uberblock ends up with an inconsistent view of the world (due
>    to failure of DKIOCFLUSHWRITECACHE, for example), we should be able
>    to go back to previous uberblocks to find a good view of our pool.
>    This is the failure mode described by Jeff.

I've mostly seen (2), because despite all the best practices out there,
single-vdev pools are quite common. In every such case I had my hands
on, it was possible to recover the pool by going back one or two txgs.
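Going back one or two txgs amounts to trying uberblocks newest first until one describes a consistent pool. Here is a minimal sketch. It assumes the candidate uberblocks have already been read out of the labels. `Uberblock` keeps only the two fields used for ordering. `pool_opens` is a hypothetical callback standing in for whatever consistency check is applied, not a ZFS or zdb interface. Ordering by txg and then by timestamp mirrors how ZFS chooses the active uberblock.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Uberblock:
    # Simplified stand-in for an on-disk uberblock: only the fields
    # used to order candidates are modelled.
    txg: int
    timestamp: int


def pick_uberblock(
    candidates: Iterable[Uberblock],
    pool_opens: Callable[[Uberblock], bool],  # hypothetical consistency check
    max_rollback: int = 3,
) -> Optional[Uberblock]:
    """Return the newest uberblock whose view of the pool checks out.

    Candidates are tried newest first (highest txg, then latest timestamp).
    At most ``max_rollback`` older uberblocks are tried after the newest one;
    None means no acceptable uberblock was found within that window.
    """
    ordered = sorted(candidates, key=lambda ub: (ub.txg, ub.timestamp), reverse=True)
    for ub in ordered[: max_rollback + 1]:
        if pool_opens(ub):
            return ub
    return None


# Example: txg 102 is inconsistent (e.g. a lost cache flush), txg 101 is fine.
ring = [Uberblock(100, 10), Uberblock(102, 12), Uberblock(101, 11)]
print(pick_uberblock(ring, lambda ub: ub.txg <= 101))  # Uberblock(txg=101, timestamp=11)
```

The rollback cap is a choice made in this sketch. Each step back discards the changes from the most recent txgs, so how far to go is better left as an explicit parameter than unbounded.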

> These are both bugs in ZFS and will be fixed.  The other 1% of the
> complaints are usually of the form "I created my pool on top of my old
> one" or "I imported a LUN on two different systems at the same time".

Of these two, the former is hard because it requires searching through
the entire disk for root block candidates and trying each of them.
The latter is not catastrophic if there was little to no activity from
one of the systems. In that case one of the first things to suffer is
the pool config object, and its corruption prevents the pool from opening.
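The former case (a pool created on top of an old one) is expensive because of what "searching through the entire disk for root block candidates" means in practice. The following is a minimal brute-force sketch. It assumes 512-byte alignment and a hypothetical predicate `looks_like_root`. Deciding what a real predicate should check is itself the hard part, and `scan_for_candidates` is an illustrative helper, not a zdb feature.

```python
import io
from typing import BinaryIO, Callable, Iterator

SECTOR = 512  # assumed minimum on-disk alignment of a block


def scan_for_candidates(
    dev: BinaryIO,
    window: int,
    looks_like_root: Callable[[bytes], bool],  # hypothetical predicate
    align: int = SECTOR,
) -> Iterator[int]:
    """Yield every aligned offset whose next ``window`` bytes pass the predicate.

    Pure brute force: every ``align``-byte boundary on the device is treated
    as a possible block start, so the cost grows linearly with device size.
    """
    offset = 0
    while True:
        dev.seek(offset)
        chunk = dev.read(window)
        if len(chunk) < window:
            return  # reached the end of the device
        if looks_like_root(chunk):
            yield offset
        offset += align


# Example on a 4 KiB in-memory "disk" with a fake marker at offset 1024.
disk = bytearray(4096)
disk[1024:1028] = b"ROOT"
hits = list(scan_for_candidates(io.BytesIO(bytes(disk)), 512, lambda b: b[:4] == b"ROOT"))
print(hits)  # [1024]
```

Even this trivial loop visits about 1.95 x 10^9 offsets on a 10^12-byte disk at 512-byte alignment. Each hit must then still be tried as a pool root.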

Fortunately, after the putback of

6733970 assertion failure in dbuf_dirty() via spa_sync_nvlist()

in build 99, a corrupted pool config object is written during open in a
way that prevents the old corrupted copy from being read in. In most
cases this allows the pool to be imported and most of the data to be
saved. zdb is useful for understanding how much is corrupted and how
much has been recovered. If nothing else is corrupted, the pool may
remain usable without being recreated. Again, in every case I had my
hands on it was possible either to recover the pool completely or at
least to save most of the data.

> It's unclear what a 'fsck' tool could do in this scenario, if anything.
> Due to a variety of reasons (hierarchical nature of ZFS, variable block
> sizes, RAID-Z, compression, etc), it's difficult to even *identify* a
> ZFS block, let alone determine its validity and associate it in some
> larger construct.

Indeed. In the "more ZFS recovery" case, involving a 42TB pool with about
8TB used, zdb -bv alone took several hours to walk the block tree and
verify the consistency of block pointers, and zdb -bcv took a couple of
days to verify all user data blocks as well. Different checksums and gang
blocks, on top of all the other dynamic features mentioned, complicate the
task of identifying ZFS blocks and linking them into a tree, and make it
very time (and space) consuming.
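The gap between the zdb -bv and zdb -bcv timings follows from what each walk has to read. What follows is a toy model, not zdb's actual code. `BlockPtr` and `walk` are hypothetical, SHA-256 stands in for whichever checksum a block uses, and `verify_data=True` corresponds loosely to adding -c. Following pointers only requires reading indirect blocks. Verifying data means reading every leaf as well, and leaves are typically the bulk of the space in use.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class BlockPtr:
    # Hypothetical, heavily simplified block pointer: as in ZFS, the parent
    # holds the checksum of the child's data; everything else is omitted.
    address: int
    checksum: bytes
    children: List["BlockPtr"] = field(default_factory=list)


def walk(root: BlockPtr, read_block: Callable[[int], bytes],
         verify_data: bool) -> Tuple[int, int, int]:
    """Walk the tree; return (blocks_visited, blocks_read, checksum_errors).

    Indirect blocks (those with children) are always read and verified,
    since in real ZFS their contents are the child pointers. Leaf (data)
    blocks are read only when verify_data is True.
    """
    visited = reads = errors = 0
    stack = [root]
    while stack:
        bp = stack.pop()
        visited += 1
        if bp.children or verify_data:
            reads += 1
            if hashlib.sha256(read_block(bp.address)).digest() != bp.checksum:
                errors += 1
        stack.extend(bp.children)
    return visited, reads, errors


def bp_for(addr: int, data: bytes, children=()) -> BlockPtr:
    # Build a pointer whose checksum matches the data at creation time.
    return BlockPtr(addr, hashlib.sha256(data).digest(), list(children))


disk: Dict[int, bytes] = {0: b"indirect", 1: b"user data A", 2: b"user data B"}
root = bp_for(0, disk[0], [bp_for(1, disk[1]), bp_for(2, disk[2])])
disk[2] = b"bit rot"  # corrupt one data block after its checksum was recorded
print(walk(root, disk.__getitem__, verify_data=False))  # (3, 1, 0)
print(walk(root, disk.__getitem__, verify_data=True))   # (3, 3, 1)
```

The pointer-only walk misses the corrupted data block entirely, which is exactly what the much slower full verification pays for.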

> There are some interesting possibilities for limited forensic tools - in
> particular, I like the idea of a mdb backend for reading and writing ZFS
> pools[1].  But I haven't actually heard a reasonable proposal for what a
> fsck-like tool (i.e. one that could "repair" things automatically) would
> actually *do*, let alone how it would work in the variety of situations
> it needs to (compressed RAID-Z?) where the standard ZFS infrastructure
> fails.
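Even a limited forensic tool runs into the identification problem. ZFS blocks carry no self-describing header: size, compression and checksum all live in the parent's block pointer. The sketch below assumes a tool that holds a candidate location and an expected fletcher4 checksum but no trustworthy size, and it brute-forces the length. It further assumes an uncompressed block stored contiguously (no RAID-Z), little-endian byte order and a 128 KiB maximum block size. `guess_sizes` is a hypothetical helper, not a zdb feature. The fletcher4 loop follows the usual definition of four 64-bit running sums over 32-bit words.

```python
import struct
from typing import List, Tuple

MASK64 = (1 << 64) - 1
Checksum = Tuple[int, int, int, int]


def fletcher4(data: bytes) -> Checksum:
    """Fletcher-4 over 32-bit words with 64-bit accumulators (little-endian assumed)."""
    a = b = c = d = 0
    for (w,) in struct.iter_unpack("<I", data):
        a = (a + w) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return a, b, c, d


def guess_sizes(region: bytes, expected: Checksum,
                max_size: int = 128 * 1024) -> List[int]:
    """Return every 512-byte-multiple length at which the region's prefix matches."""
    limit = min(max_size, len(region))
    return [size for size in range(512, limit + 1, 512)
            if fletcher4(region[:size]) == expected]


# Example: a 4 KiB region whose first 1 KiB is the block being looked for.
region = bytes(range(256)) * 16
print(guess_sizes(region, fletcher4(region[:1024])))  # [1024]
```

ZFS checksums the physical, possibly compressed bytes. With compression a tool would also have to guess the compressed length and algorithm. On RAID-Z the bytes are not contiguous on one disk at all, so each such feature multiplies the number of guesses.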

There are a number of bugs and RFEs aimed at making zdb more useful in
the field, e.g.

6720637 want zdb -l option to dump uberblock arrays as well
6709782 issues running zdb with -p and -e options
6736356 zdb -R needs to work with exported pools
6720907 zdb should handle errors while dumping datasets and objects
6746101 zdb command to search for ZFS labels in a device
6757444 want zdb -R to support decompression, checksumming and raid-z
6757430 want an option for zdb to disable space map loading and leak tracking
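Two of these, 6720637 and 6746101, come down to knowing where labels and uberblocks sit on a device. The following is a minimal sketch of that arithmetic. It assumes the layout from the ZFS on-disk format: four 256 KiB labels, two at the front and two at the end of the device, whose size is first aligned down to 256 KiB. The uberblock ring occupies the second half of each label, with 1 KiB slots (ashift=9). These constants are assumptions here, not something stated in this thread. `label_offsets` and `uberblock_slot_offsets` are hypothetical helpers, not zdb options.

```python
from typing import List

LABEL_SIZE = 256 * 1024             # assumed size of one vdev label
UBERBLOCK_RING_OFFSET = 128 * 1024  # assumed start of the uberblock ring in a label
UBERBLOCK_SLOT = 1024               # assumed slot size (ashift=9)


def label_offsets(device_size: int) -> List[int]:
    """Byte offsets of the four labels: two at the front, two at the end."""
    usable = device_size - device_size % LABEL_SIZE  # align down to a label boundary
    if usable < 4 * LABEL_SIZE:
        raise ValueError("device too small to hold four labels")
    return [0, LABEL_SIZE, usable - 2 * LABEL_SIZE, usable - LABEL_SIZE]


def uberblock_slot_offsets(device_size: int) -> List[int]:
    """Byte offsets of every uberblock slot in every label."""
    slots = (LABEL_SIZE - UBERBLOCK_RING_OFFSET) // UBERBLOCK_SLOT
    return [label + UBERBLOCK_RING_OFFSET + i * UBERBLOCK_SLOT
            for label in label_offsets(device_size)
            for i in range(slots)]


size = 10 * 1024 * 1024 + 1000  # a 10 MiB device plus a ragged tail
print(label_offsets(size))               # [0, 262144, 9961472, 10223616]
print(len(uberblock_slot_offsets(size)))  # 512
```

Dumping the uberblock arrays then means reading each of those 512 slots and decoding whichever ones hold a valid uberblock. A label search on a damaged device would check the same four candidate offsets.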

Hth,
Victor

> - Eric
> 
> [1] 
> http://mbruning.blogspot.com/2008/08/recovering-removed-file-on-zfs-disk.html
> 
> --
> Eric Schrock, Fishworks                        http://blogs.sun.com/eschrock
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
