Hello,

I have been asked to take a look at a pool on an old OSOL 2009.06 host. It
has been left unattended for a long time and was found in a FAULTED state.
Two of the disks in the raidz2 pool seem to have failed; one has been
replaced by a spare, the other one is UNAVAIL. The machine was restarted
and the damaged disks were removed to make it possible to access the pool
without it hanging on I/O errors.

Now, I have no indication that more than two disks have failed, and one of
them seems to have been replaced by the spare. I would therefore have
expected the pool to be in a working state even with two failed disks and
some bad data on the remaining disks, since metadata has additional
replication.

This is the current state of the pool, which cannot be imported (at least
with 2009.06):

  pool: tank
 state: FAULTED
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
 scrub: none requested
config:

        NAME           STATE     READ WRITE CKSUM
        tank           FAULTED      0     0     1  corrupted data
          raidz2       DEGRADED     0     0     6
            c12t0d0    ONLINE       0     0     0
            c12t1d0    ONLINE       0     0     0
            spare      ONLINE       0     0     0
              c12t2d0  ONLINE       0     0     0
              c12t7d0  ONLINE       0     0     0
            c12t3d0    ONLINE       0     0     0
            c12t4d0    ONLINE       0     0     0
            c12t5d0    ONLINE       0     0     0
            c12t6d0    UNAVAIL      0     0     0  cannot open

Looking at the output, there is a mismatch between the status message,
which claims that there are insufficient replicas, and the state of the
individual disks, where only one is unavailable. More troublesome is the
"corrupted data" status for the whole pool. I also get "bad config type 16
for stats" from zdb.

What could possibly cause something like this, a faulty controller? Is
there any way to recover (an uberblock rollback with OpenIndiana perhaps)?
The server has ECC memory and another pool that is still working fine. The
controller is an ARECA 1280.
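
To make the OpenIndiana question concrete, this is roughly what I had in
mind. A sketch only: I have not run any of it, the "oi" prompt is just a
placeholder for a live OpenIndiana environment, and the -F/-n rewind and
readonly=on import options only exist in builds newer than 2009.06.

root@oi:~# zpool import                           (see how the newer code views the pool)
root@oi:~# zpool import -o readonly=on -f tank    (try a read-only import first, no writes)
root@oi:~# zpool import -fFn tank                 (dry run: report what a txg rewind would discard)
root@oi:~# zpool import -fF tank                  (the actual rewind to the last consistent txg)

Does that sound like a sane order of escalation, or is a rewind likely to
make things worse given the "bad config type" errors?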

And some output from zdb:

# zdb tank | more                                               
zdb: can't open tank: I/O error
    version=14
    name='tank'
    state=0
    txg=0
    pool_guid=17315487329998392945
    hostid=8783846
    hostname='storage'
    vdev_tree
        type='root'
        id=0
        guid=17315487329998392945
bad config type 16 for stats
        children[0]
                type='raidz'
                id=0
                guid=14250359679717261360
                nparity=2
                metaslab_array=24
                metaslab_shift=37
                ashift=9
                asize=14002698321920
                is_log=0
root@storage:~# zdb tank                                                      
    version=14
    name='tank'
    state=0
    txg=0
    pool_guid=17315487329998392945
    hostid=8783846
    hostname='storage'
    vdev_tree
        type='root'
        id=0
        guid=17315487329998392945
bad config type 16 for stats
        children[0]
                type='raidz'
                id=0
                guid=14250359679717261360
                nparity=2
                metaslab_array=24
                metaslab_shift=37
                ashift=9
                asize=14002698321920
                is_log=0
bad config type 16 for stats
                children[0]
                        type='disk'
                        id=0
                        guid=5644370057710608379
                        path='/dev/dsk/c12t0d0s0'
                        devid='id1,sd@x001b4d23002bb800/a'
                        phys_path='/pci@0,0/pci8086,25f8@4/pci8086,370@0/pci17d3,1260@e/disk@0,0:a'
                        whole_disk=1
                        DTL=154
bad config type 16 for stats
                children[1]
                        type='disk'
                        id=1
                        guid=7134885674951774601
                        path='/dev/dsk/c12t1d0s0'
                        devid='id1,sd@x001b4d23002bb810/a'
                        phys_path='/pci@0,0/pci8086,25f8@4/pci8086,370@0/pci17d3,1260@e/disk@1,0:a'
                        whole_disk=1
                        DTL=153
bad config type 16 for stats
                children[2]
                        type='spare'
                        id=2
                        guid=7434068041432431375
                        whole_disk=0
bad config type 16 for stats
                        children[0]
                                type='disk'
                                id=0
                                guid=5913529661608977121
                                path='/dev/dsk/c12t2d0s0'
                                devid='id1,sd@x001b4d23002bb820/a'
                                phys_path='/pci@0,0/pci8086,25f8@4/pci8086,370@0/pci17d3,1260@e/disk@2,0:a'
                                whole_disk=1
                                DTL=152
bad config type 16 for stats
                        children[1]
                                type='disk'
                                id=1
                                guid=14421562280953532739
                                path='/dev/dsk/c12t7d0s0'
                                devid='id1,sd@x001b4d23002bb870/a'
                                phys_path='/pci@0,0/pci8086,25f8@4/pci8086,370@0/pci17d3,1260@e/disk@7,0:a'
                                whole_disk=1
                                DTL=147
bad config type 16 for stats
                children[3]
                        type='disk'
                        id=3
                        guid=15407883879385505475
                        path='/dev/dsk/c12t3d0s0'
                        devid='id1,sd@x001b4d23002bb830/a'
                        phys_path='/pci@0,0/pci8086,25f8@4/pci8086,370@0/pci17d3,1260@e/disk@3,0:a'
                        whole_disk=1
                        DTL=151
bad config type 16 for stats
                children[4]
                        type='disk'
                        id=4
                        guid=17790008086770830519
                        path='/dev/dsk/c12t4d0s0'
                        devid='id1,sd@x001b4d23002bb840/a'
                        phys_path='/pci@0,0/pci8086,25f8@4/pci8086,370@0/pci17d3,1260@e/disk@4,0:a'
                        whole_disk=1
                        DTL=150
bad config type 16 for stats
                children[5]
                        type='disk'
                        id=5
                        guid=13234006657214996829
                        path='/dev/dsk/c12t5d0s0'
                        devid='id1,sd@x001b4d23002bb850/a'
                        phys_path='/pci@0,0/pci8086,25f8@4/pci8086,370@0/pci17d3,1260@e/disk@5,0:a'
                        whole_disk=1
                        DTL=149
bad config type 16 for stats
                children[6]
                        type='disk'
                        id=6
                        guid=5555708758125512539
                        path='/dev/dsk/c12t6d0s0'
                        devid='id1,sd@x001b4d23002bb860/a'
                        phys_path='/pci@0,0/pci8086,25f8@4/pci8086,370@0/pci17d3,1260@e/disk@6,0:a'
                        whole_disk=1
                        DTL=148
bad config type 16 for stats
zdb: can't open tank: I/O error

Regards
Henrik

http://sparcv9.blogspot.com