On Tue, Jul 7, 2009 at 12:56 PM, Mahlon E. Smith <mah...@martini.nu> wrote:

> I've got a 9 sata drive raidz1 array, started at version 6, upgraded to
> version 13.  I had an apparent drive failure, and then at some point, a
> kernel panic (unrelated to ZFS.)  The reboot caused the device numbers
> to shuffle, so I did an 'export/import' to re-read the metadata and get
> the array back up.

This is why we've started using glabel(8) to label our drives, and then add
the labels to the pool:
  # zpool create store raidz1 label/disk01 label/disk02 label/disk03

That way, it doesn't matter where the kernel detects the drives or what the
physical device node is called: GEOM picks up the label, and ZFS uses the
label.
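
For what it's worth, here is a rough dry-run sketch of that labelling step.
It only prints the commands rather than running them. The helper name
gen_label_cmds and the da0..da2 device names are made up for illustration;
the disk01-style label names follow the zpool create line above, and
"glabel label" takes the label name followed by the device.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the glabel and zpool commands
# for a list of device nodes. gen_label_cmds is a hypothetical helper.

gen_label_cmds() {
    pool=$1; shift
    n=0
    vdevs=""
    for dev in "$@"; do
        n=$((n + 1))
        # Labels are numbered in the order the devices are given
        label=$(printf 'disk%02d' "$n")
        echo "glabel label $label /dev/$dev"
        vdevs="$vdevs label/$label"
    done
    # The pool is built from the label/ nodes, not the raw daN nodes
    echo "zpool create $pool raidz1$vdevs"
}

gen_label_cmds store da0 da1 da2
```

Review the printed commands before running any of them as root; labelling
writes metadata to the last sector of each disk.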

> Once I swapped drives, I issued a 'zpool replace'.

See comment at the end:  what's the replace command that you used?

> That was 4 days ago now.  The progress in a 'zpool status' looks like
> this, as of right now:
>  scrub: resilver in progress for 0h0m, 0.00% done, 2251h0m to go
> ... which is a little concerning, since a) it appears to have not moved
> since I started it, and b) I'm in a DEGRADED state until it finishes...
> if it finishes.

There's something wrong here.  It definitely should be incrementing.  Even
when we did the foolish thing of creating a 24-drive raidz2 vdev and had to
replace a drive, the progress bar did change.  Never got above 39% as it
kept restarting, but it did increment.

> So, I reach out to the list!
>  - Is the resilver progress notification in a known weird state under
>   FreeBSD?
>  - Anything I can do to kick this in the pants?  Tuning params?

I'd redo the replace command, then check the output of "zpool status" to make
sure it's showing the proper device node and not a random string of numbers
like it is now.
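
One way to spot such entries, as a sketch: filter the "zpool status" output
for vdev names made up only of digits. The function name find_guid_vdevs is
hypothetical, and the sample lines are copied from your quoted status output.

```shell
#!/bin/sh
# Print the name and state of any vdev whose name is purely numeric.
# find_guid_vdevs is a hypothetical helper; it reads status text on stdin.

find_guid_vdevs() {
    # $1 is the vdev name, $2 the state column (ONLINE, UNAVAIL, ...)
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[A-Z]+$/ { print $1, $2 }'
}

# Sample input taken from the quoted "zpool status" output
printf '%s\n' \
  '            replacing              DEGRADED     0     0     0' \
  '              2025342973333799752  UNAVAIL      3 4.11K     0  was /dev/da2' \
  '              da8                  ONLINE       0     0     0  418K resilvered' \
  | find_guid_vdevs
# prints: 2025342973333799752 UNAVAIL
```

On a live system this would be "zpool status store | find_guid_vdevs"; no
output means every vdev is listed by a device or label name.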

>  - This was my first drive failure under ZFS -- anything I should have
>   done differently?  Such as NOT doing the export/import? (Not sure
>   what else I could have done there.)

If you knew which drive it was, I'd have shut down the server and replaced
it, so that the drives came back up renumbered correctly.

This happened to us once when I was playing around with simulating dead
drives (pulling drives) and rebooting.  That's when I moved over to using
glabel(8) labels.

> % zpool status store
>  pool: store
>  state: DEGRADED
> status: One or more devices is currently being resilvered.  The pool will
>        continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
>  scrub: resilver in progress for 0h0m, 0.00% done, 2251h0m to go
> config:
>        NAME                       STATE     READ WRITE CKSUM
>        store                      DEGRADED     0     0     0
>          raidz1                   DEGRADED     0     0     0
>            da0                    ONLINE       0     0     0  274K resilvered
>            da1                    ONLINE       0     0     0  282K resilvered
>            replacing              DEGRADED     0     0     0
>              2025342973333799752  UNAVAIL      3 4.11K     0  was /dev/da2
>              da8                  ONLINE       0     0     0  418K resilvered
>            da2                    ONLINE       0     0     0  280K resilvered
>            da3                    ONLINE       0     0     0  269K resilvered
>            da4                    ONLINE       0     0     0  266K resilvered
>            da5                    ONLINE       0     0     0  270K resilvered
>            da6                    ONLINE       0     0     0  270K resilvered
>            da7                    ONLINE       0     0     0  267K resilvered
> errors: No known data errors
> -----------------------------------------------------------------------
> % zpool iostat -v
>                              capacity     operations    bandwidth
> pool                        used  avail   read  write   read  write
> -------------------------  -----  -----  -----  -----  -----  -----
> store                      1.37T  2.72T     49    106   138K   543K
>  raidz1                   1.37T  2.72T     49    106   138K   543K
>    da0                        -      -     15     62  1017K  79.9K
>    da1                        -      -     15     62  1020K  80.3K
>    replacing                  -      -      0    103      0  88.3K
>      2025342973333799752      -      -      0      0  1.45K    261
>      da8                      -      -      0     79  1.45K  98.2K
>    da2                        -      -     14     62   948K  80.3K
>    da3                        -      -     13     62   894K  80.0K
>    da4                        -      -     14     63   942K  80.3K
>    da5                        -      -     15     62   992K  80.4K
>    da6                        -      -     15     62  1000K  80.1K
>    da7                        -      -     15     62  1022K  80.1K
> -------------------------  -----  -----  -----  -----  -----  -----

That definitely doesn't look right.   It should be showing the device name
there in the "replacing" section.

What's the exact "zpool replace" command that you used?

Freddie Cash
freebsd-stable@freebsd.org mailing list
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
