[zfs-discuss] Proposal: ZFS hotplug support and autoconfiguration

2007-03-21 Thread Eric Schrock
Folks -

I'm preparing to submit the attached PSARC case to provide better
support for device removal and insertion within ZFS.  Since this is a
rather complex issue, with a fair share of corner cases, I thought I'd
send the proposal out to the ZFS community at large for further comment
before submitting it.

The prototype is functional except for the offline device insertion and
hot spares functionality.  I hope to have this integrated within the
next month, along with the next phase of FMA integration.  Please
respond with any comments, concerns, or suggestions.

Thanks,

- Eric

--
Eric Schrock, Solaris Kernel Development   http://blogs.sun.com/eschrock

1. INTRODUCTION

Currently, ZFS supports what is affectionately known as poor man's
hotplug.  If a device is removed from the system, then it is assumed
that upon I/O failure, an attempt to reopen the same device will fail.
This will trigger an FMA fault, substituting a hot spare if available.
This is undesirable for two reasons:

- There is no distinction between device removal and arbitrary failure.
  If a device is removed from the system, it should be treated as a
  deliberate action different from normal failure.

- There is no support for automatic response to device insertion.  For a
  server configured with a ZFS pool, the administrator should be able to
  walk up, remove any drive (preferably a faulted one), insert a new
  drive, and not have to issue any ZFS commands to reconfigure the pool.
  This is particularly true for the appliance space, where hardware
  reconfiguration should just work.

This case enhances ZFS to respond to device removal and provides a
mechanism to automatically deal with device insertion.  While the
framework is generic, the primary target is devices supported by
the SATA framework.  The only device-specific portion of this proposal
concerns determining whether a device is in the same physical location as a
previously known device, which involves correlating a transport's enumeration
of the device with the device's physical location within the chassis.


2. DEVICE REMOVAL

There are two types of device removal within Solaris.  Coordinated
device removal involves stopping all consumers of the device, using the
appropriate cfgadm(1M) command (PSARC 1996/285), and then physically
removing the device.  Uncoordinated removal (also known as surprise
removal) is when a device is physically removed while still in active
use by the system.  The latter is increasingly common as more I/O protocols
support hotplug and higher level software (ZFS) becomes more capable.
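
For illustration, a coordinated removal of a SATA disk looks roughly like
the following (the attachment point name is hypothetical and varies by
system):

# cfgadm -c unconfigure sata1/3
  (physically remove the disk and insert the replacement)
# cfgadm -c configure sata1/3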

There are several ways to detect device removal within Solaris.  Fibre
channel drivers generate the NDI events FCAL_INSERT_EVENT and
FCAL_REMOVE_EVENT.  USB and 1394 drivers generate the NDI events
DDI_DEVI_INSERT_EVENT and DDI_DEVI_REMOVE_EVENT.  In addition to these
event channels, there is also the DKIOCSTATE ioctl() which returns (on
capable drivers) DKIO_DEV_GONE if the device has been removed.

Of these, the ioctl() is the most widely supported, and is the mechanism
used as part of this case.  Since this is an implementation detail of
the current architecture, it does not preclude using alternate
mechanisms in the future.  When an I/O to a disk fails, ZFS will query
the media state via the DKIOCSTATE ioctl.  If the device is in any state
other than DKIO_INSERTED, ZFS will transition the device to a new
REMOVED state.  No FMA fault will be triggered, and a hot spare, if
available, will be substituted.  Note that DKIO_DEV_GONE can be
returned for a variety of reasons (pulling cables, external chassis
being powered off, etc.).  In the absence of additional FMA information,
it is assumed that this is intentional administrative action.

As part of this work, lofiadm(1M) will be expanded to include a new
force (-f) flag when removing devices.  Combined with the upcoming lofi
devfs events (PSARC 2006/709), this will provide a much simpler testing
framework without the need for physical hardware interaction.  When this
flag is used, the underlying file will be closed, any further I/O or
attempts to open the device will fail, and DKIOCSTATE will return
DKIO_DEV_GONE.  This flag will remain private for testing only, and will
not be documented.

An example of this in action:

# lofiadm -a /disk/a
/dev/lofi/1
# lofiadm -a /disk/b
/dev/lofi/2
# lofiadm -a /disk/c
/dev/lofi/3
# zpool create -f test mirror /dev/lofi/1 /dev/lofi/2 spare /dev/lofi/3
# while :; do touch /test/foo; sync; sleep 1; done &
[1] 100662
# zpool status
  pool: test
 state: ONLINE
 scrub: none requested
config:

        NAME             STATE     READ WRITE CKSUM
        test             ONLINE       0     0     0
          mirror         ONLINE       0     0     0
            /dev/lofi/1  ONLINE       0     0     0
            /dev/lofi/2  ONLINE       0     0     0
        spares
          /dev/lofi/3    AVAIL

errors: No known data errors
# lofiadm -d /disk/a -f
# zpool status
  pool: test
 state: DEGRADED
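
(For illustration only, not captured from an actual run: with the proposed
behavior, the remainder of the status output would show the pulled device in
the new REMOVED state and the hot spare substituted, roughly as follows.)

config:

        NAME               STATE     READ WRITE CKSUM
        test               DEGRADED     0     0     0
          mirror           DEGRADED     0     0     0
            spare          DEGRADED     0     0     0
              /dev/lofi/1  REMOVED      0     0     0
              /dev/lofi/3  ONLINE       0     0     0
            /dev/lofi/2    ONLINE       0     0     0
        spares
          /dev/lofi/3      INUSE     currently in use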

Re: [zfs-discuss] Proposal: ZFS hotplug support and autoconfiguration

2007-03-21 Thread Eric Schrock
On Thu, Mar 22, 2007 at 01:03:48AM +0100, Robert Milkowski wrote:
>
>   What if I have a failing drive (still works but I want it to be
>   replaced) and I have a replacement drive on a shelf. All I want is
>   to remove failing drive, insert new one and resilver. I do not want
>   a hot spare to automatically kick in.
>

Kicking in a hot spare is a harmless activity (the end result will be
the same), why would you want to avoid this?  Do you have an idea of how
you would want to control this behavior?

- Eric

--
Eric Schrock, Solaris Kernel Development   http://blogs.sun.com/eschrock


Re[2]: [zfs-discuss] Proposal: ZFS hotplug support and autoconfiguration

2007-03-21 Thread Robert Milkowski
Hello Eric,

Thursday, March 22, 2007, 1:13:19 AM, you wrote:

ES On Thu, Mar 22, 2007 at 01:03:48AM +0100, Robert Milkowski wrote:
ES >
ES >   What if I have a failing drive (still works but I want it to be
ES >   replaced) and I have a replacement drive on a shelf. All I want is
ES >   to remove failing drive, insert new one and resilver. I do not want
ES >   a hot spare to automatically kick in.
ES >

ES Kicking in a hot spare is a harmless activity (the end result will be
ES the same), why would you want to avoid this?

With a lot of storage I like to keep the config as consistent as I can
across identical boxes.  So if I have a replacement drive I would
rather use it instead of a hot spare, so I do not have to resilver again.
I know this is mostly aesthetics, but it helps in managing storage.

ES Do you have an idea of how
ES you would want to control this behavior?

Maybe a simple method of freezing hot spares (without removing them),
or maybe the automated method should have some reasonable delay:
when it sees a disk is gone it waits N seconds before the hot spare
kicks in, or if a new drive is present at the same physical location
it uses that rather than a hot spare (or perhaps the admin can issue
zpool replace manually before the hot spare kicks in).
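
For example (just to illustrate; device names follow Eric's lofi example,
and the new disk /dev/lofi/4 is hypothetical):

# zpool replace test /dev/lofi/1 /dev/lofi/4
# zpool detach test /dev/lofi/3

where the detach would return the spare to the AVAIL list if it had
already kicked in.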

I'm not sure whether this would complicate things too much, but I still
like to keep similar configs.

Or maybe an ability to stop resilvering the hot spare and start
resilvering the new drive would be sufficient, or it could even be
automatic (stop resilvering the hot spare, but keep all the data
already resilvered; start resilvering the new disk with the data which
has not yet been resilvered to the hot spare; then resilver the data
which was resilvered to the hot spare; then release the hot spare).
All of this would work only if some kind of hotspare-back property
were set.

It's a matter of what people prefer - a moving hot spare, or having the
hot spare go back to the hot spare list and be released once the
replaced disk has resilvered.  It probably doesn't matter that much
on an x4500, but it can matter more on other arrays.


-- 
Best regards,
 Robert                          mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com
