We have a number of shared spares configured in our ZFS pools, and
we're seeing weird issues where spares don't get used under some
circumstances. We're running Solaris 10 U6 using pools made up of
mirrored vdevs, and what I've seen is:
* if ZFS detects enough checksum errors on an active disk,
We have a situation where all of the spares in a set of pools have
gone into a faulted state and now, apparently, we can't remove them
or otherwise de-fault them. I'm confident that the underlying disks
are fine, but ZFS seems quite unwilling to do anything with the spares
situation.
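For concreteness, these are the sorts of recovery attempts that get
refused in this state (a sketch; the pool and device names are
illustrative):

  zpool clear tank c5t9d0     # try to clear the fault on the spare
  zpool remove tank c5t9d0    # try to drop the spare from the pool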
(The
Suppose that you have a SAN environment with a lot of LUNs. In the
normal course of events this means that 'zpool import' is very slow,
because it has to probe all of the LUNs all of the time.
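One brute-force way to narrow the probing is 'zpool import -d' pointed
at a directory that holds links to only the devices you care about (a
sketch; the device name is made up and you maintain the directory
yourself):

  mkdir /dev/sanluns
  ln -s /dev/dsk/c4t60A98000433469F0d0s0 /dev/sanluns/
  # probe only the devices in that directory:
  zpool import -d /dev/sanluns tank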
In S10U6, the theoretical 'obvious' way to get around this for your
SAN filesystems seems to be to use
| The errant command which accidentally adds a vdev could just as easily
| be a command which scrambles up or erases all of the data.
The difference between a mistaken command that accidentally adds a vdev
and the other ways to lose your data with ZFS is that the 'add a vdev'
accident is only
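The classic version of that mistaken command, for illustration (device
names made up):

  # intending to turn c1t0d0 into a mirror:
  zpool attach tank c1t0d0 c1t1d0
  # one word away: this instead grafts c1t1d0 onto the pool as a new
  # unmirrored top-level vdev, permanently (ZFS objects to the
  # mismatched replication level, but -f waves the objection off):
  zpool add -f tank c1t1d0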
| As others have noted, the COW nature of ZFS means that there is a good
| chance that on a mostly-empty pool, previous data is still intact long
| after you might think it is gone.
In the cases I am thinking of I am sure that the data was there.
Kernel panics just didn't let me get at it.
I'm not Anton Rang, but:
| How would you describe the difference between the data recovery
| utility and ZFS's normal data recovery process?
The data recovery utility should not panic my entire system if it runs
into some situation that it utterly cannot handle. Solaris 10 U5 kernel
ZFS code
| Syslog is funny in that it does a lot of open/write/close cycles so
| that rotate can work trivially.
I don't know of any version of syslog that does this (certainly Solaris
10 U5 syslog does not). The traditional syslog(d) performance issue
is that it fsync()'s after writing each log message,
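That fsync() behavior is why some other syslogds grew a knob for it.
Linux sysklogd, for one, takes a leading '-' on a file name in
syslog.conf to suppress the sync after every message (Solaris syslogd
has no equivalent):

  mail.*    -/var/log/maillog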
For various sorts of manageability reasons[*], I need to be able to
extract information about the vdev and device structure of our ZFS pools
(partly because we're using iSCSI and MPxIO, which create basically
opaque device names). Unfortunately Solaris 10 U5 doesn't seem to
currently provide any
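The fallback I know of is scraping 'zpool status' output, whose
indentation encodes the vdev tree (a crude sketch):

  # print just the config section, where the nesting depth shows
  # which devices belong to which vdev:
  zpool status -v tank | sed -n '/config:/,/^$/p'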
| I guess I find it ridiculous you're complaining about ram when I can
| purchase 4gb for under 50 dollars on a desktop.
|
| It's not like we're talking about a 500 dollar purchase.
'On a desktop' is an important qualification here. Server RAM is
more expensive, and then you get to multiply it by
| Every time I've come across a usage scenario where the submitter asks
| for per-user quotas, it's usually a university-type scenario where
| universities are notorious for providing lots of CPU horsepower (many,
| many servers) attached to a simply dismal amount of back-end storage.
Speaking as
| The ZFS filesystem approach is actually better than quotas for User
| and Shared directories, since the purpose is to limit the amount of
| space taken up *under that directory tree*.
Speaking only for myself, I would find ZFS filesystems somewhat more
useful if they were more like directory
I have a test Solaris machine with 8 GB of memory. When freshly booted,
the ARC consumes 5 GB (and I would be happy to make it consume more)
and file-level prefetching works great even when I hit the machine with
a lot of simultaneous sequential reads. But overnight, the ARC has
shrunk to 2 GB
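For reference, you can watch the ARC's current size and its target
shrink via kstats (a sketch):

  # 'size' is the current ARC size in bytes, 'c' is its target:
  kstat -p zfs:0:arcstats:size zfs:0:arcstats:c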
As part of testing for our planned iSCSI + ZFS NFS server environment,
I wanted to see what would happen if I imported a ZFS pool on two
machines at once (as might happen someday in, for example, a failover
scenario gone horribly wrong).
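Mechanically the experiment is just this (pool name illustrative):

  machine1# zpool import tank
  # machine2 sees the pool as potentially active and refuses a plain
  # import, so reproducing the disaster takes the big hammer:
  machine2# zpool import -f tank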
What I expected was something between a pool with damage
Is there any way to set ZFS on a system so that it will not
automatically import all of the ZFS pools it had active when it was last
running?
The problem with automatic importation is preventing disasters in a
failover situation. Assume that you have a SAN environment with the same
disks
| On Nevada, use the 'cachefile' property. On S10 releases, use '-R /'
| when creating/importing the pool.
The drawback of '-R /' appears to be that it requires forcing the
import after a system reboot *all* the time (unless you explicitly
export the pool during reboot).
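For illustration, the '-R /' approach looks like this (pool and device
names made up):

  # an altroot of '/' keeps the pool out of /etc/zfs/zpool.cache, so
  # it is not imported automatically at boot:
  zpool create -R / tank mirror c1t0d0 c2t0d0
  # after a reboot the pool looks potentially active, hence the
  # forced import mentioned above:
  zpool import -f -R / tank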
- cks
| My impression is that the only real problem with incrementals from
| ufsdump or star is that you would like to have a database that tells
| you in which incremental a specific file with a specific time stamp
| may be found.
In our situation here, this is done by the overall backup system
| I very strongly disagree. The closest ZFS equivalent to ufsdump is
| 'zfs send'. 'zfs send', like ufsdump, has intimate awareness of
| the actual on-disk layout and is an integrated part of the filesystem
| implementation.
I must strongly disagree in turn, at least for Solaris 10. 'zfs
| Primarily cost, reliability (less complex hw = less hw that can
| fail), and serviceability (no need to rebuy the exact same raid card
| model when it fails, any SATA controller will do).
|
| As long as the RAID is self-contained on the card, and the disks are
| exported as JBOD, then you
[Eric Schrock:]
| Look at alternate cachefiles ('zpool set cachefile', 'zpool import -c
| cachefile', etc). This avoids scanning all devices in the system
| and instead takes the config from the cachefile.
This sounds great.
Is there any information on when this change will make it to Solaris?
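For the record, the commands Eric describes look like this (pool name
and cache file path illustrative):

  # keep this pool's configuration in an alternate cache file rather
  # than the default /etc/zfs/zpool.cache:
  zpool set cachefile=/etc/zfs/failover.cache tank
  # import by reading that file instead of scanning every device:
  zpool import -c /etc/zfs/failover.cache tank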
| So, from a feature perspective it looks like S10U6 is going to be in
| pretty good shape ZFS-wise. If only someone could speak to (perhaps
| under the cloak of anonymity ;) ) the timing side :).
For what it's worth, back in January or so we were told that S10U6 was
scheduled for August. Given
We're planning to build a ZFS-based Solaris NFS fileserver environment
with the backend storage being iSCSI-based, in part because of the
possibilities for failover. In exploring things in our test environment,
I have noticed that 'zpool import' takes a fairly long time: about
35 to 45 seconds
| Have you tried to disable vdev caching and leave file level
| prefetching?
If you mean setting zfs_vdev_cache_bshift to 13 (per the ZFS Evil
Tuning Guide) to turn off device-level prefetching then yes, I have
tried turning off just that; it made no difference.
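For anyone following along, that tuning is done like so (per the Evil
Tuning Guide):

  # /etc/system, takes effect on reboot:
  set zfs:zfs_vdev_cache_bshift = 13

  # or live, via mdb (the 0t prefix marks the value as decimal):
  echo 'zfs_vdev_cache_bshift/W 0t13' | mdb -kw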
If there's another tunable then
I wrote:
| I have a ZFS-based NFS server (Solaris 10 U4 on x86) where I am
| seeing a weird performance degradation as the number of simultaneous
| sequential reads increases.
To update zfs-discuss on this: after more investigation, this seems
to be due to file-level prefetching. Turning
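Assuming the standard tunable is what is involved here, file-level
prefetching is turned off like this:

  # disable ZFS file-level prefetching on the live system:
  echo 'zfs_prefetch_disable/W 0t1' | mdb -kw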
I have a ZFS-based NFS server (Solaris 10 U4 on x86) where I am seeing
a weird performance degradation as the number of simultaneous sequential
reads increases.
Setup:
NFS client - Solaris NFS server - iSCSI target machine
There are 12 physical disks on the iSCSI target machine. Each
[Jeff Bonwick:]
| That said, I suspect I know the reason for the particular problem
| you're seeing: we currently do a bit too much vdev-level caching.
| Each vdev can have up to 10MB of cache. With 132 pools, even if
| each pool is just a single iSCSI device, that's 1.32GB of cache.
|
| We need
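In the meantime, the per-vdev cache can be shrunk or disabled through
the zfs_vdev_cache_size tunable (a sketch; this is the Evil Tuning
Guide knob, not an official fix):

  # disable the vdev-level cache entirely on the live system:
  echo 'zfs_vdev_cache_size/W 0' | mdb -kw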
| I think the root cause of the issue is that multiple groups are buying
| physical rather than virtual storage yet it is all being attached to a
| single system.
They're actually buying constant-sized chunks of virtual storage, which
is provided through a pool of SAN-based disk space. This
| There are two issues here. One is the number of pools, but the other
| is the small amount of RAM in the server. To be honest, most laptops
| today come with 2 GBytes, and most servers are in the 8-16 GByte range
| (hmmm... I suppose I could look up the average size we sell...)
Speaking as a
I have a test system with 132 (small) ZFS pools[*], as part of our
work to validate a new ZFS-based fileserver environment. In testing,
it appears that we can produce situations that will run the kernel out
of memory, or at least out of some resource such that things start
complaining 'bash:
| Still, I'm curious -- why lots of pools? Administration would be
| simpler with a single pool containing many filesystems.
The short answer is that it is politically and administratively easier
to use (at least) one pool per storage-buying group in our environment.
This got discussed in more
| Hi Chris, I would have thought that managing multiple pools (you
| mentioned 200) would be an absolute administrative nightmare. If you
| give more details about your storage needs like number of users, space
| required etc it might become clearer what you're thinking of setting
| up.
Every
| I don't think that's the case. What's wrong with setting both a quota
| and a reservation on your user filesystems?
In a shared ZFS pool situation I don't think we'd get anything from
using both. We have to use something to limit people to the storage that
they bought, and in at least S10 U4
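For concreteness, 'both' here means something like the following
(filesystem name and size illustrative):

  # cap the group at what they bought, and also guarantee them that
  # much out of the shared pool:
  zfs set quota=500g tank/csgroup
  zfs set reservation=500g tank/csgroup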
| Is it really true that as the guy on the above link states (Please
| read the link, sorry) when one iSCSI mirror goes off line, the
| initiator system will panic? Or even worse, not boot itself cleanly
| after such a panic? How could this be? Anyone else with experience
| with iSCSI based
In our environment, the politically and administratively simplest
approach to managing our storage is to give each separate group at
least one ZFS pool of their own (into which they will put their various
filesystems). This could lead to a proliferation of ZFS pools on our
fileservers (my current
| You DO mean IPMP then. That's what I was trying to sort out, to make
| sure that you were talking about the IP part of things, the iSCSI
| layer.
My apologies for my lack of clarity. We are not looking at IPMP
multipathing; we are using MPxIO multipathing (mpathadm et al), which
operates at
We're currently designing a ZFS fileserver environment with iSCSI based
storage (for failover, cost, ease of expansion, and so on). As part of
this we would like to use multipathing for extra reliability, and I am
not sure how we want to configure it.
Our iSCSI backend only supports multiple
| I assume you mean IPMP here, which refers to ethernet multipath.
|
| There is also the other meaning of multipath referring to multiple
| paths to the storage array typically enabled by stmsboot command.
We are currently looking at (and testing) the non-ethernet sort of
multipathing, partly as
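The mechanics of that on Solaris, for reference (both commands are
standard):

  # enable MPxIO for the supported controllers (needs a reboot):
  stmsboot -e
  # afterwards, inspect the multipathed logical units and their paths:
  mpathadm list lu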