Re: [zfs-discuss] I/O freeze after a disk failure

2007-09-14 Thread Paul Armstrong
Paul Kraus wrote: In the ZFS case I could replace the disk and the zpool would resilver automatically. I could also take the removed disk and put it into the second system and have it recognize the zpool (and that it was missing half of a mirror) and the data was all there.
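A minimal sketch of the workflow Paul describes, assuming a mirrored pool named "tank" and hypothetical device names; the real controller/target IDs will differ:

    # Replace the failed half of the mirror; resilvering starts automatically.
    zpool replace tank c1t0d0 c1t2d0

    # On the second system, scan attached devices for importable pools,
    # then import the (degraded) pool from the moved disk.
    zpool import
    zpool import tank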

Re: [zfs-discuss] I/O freeze after a disk failure

2007-09-12 Thread Ralf Ramge
Gino wrote: [...] Just a few examples: -We lost several zpools with S10U3 because of the spacemap bug, and -nothing- was recoverable. No fsck here :( Yes, I criticized the lack of zpool recovery mechanisms, too, during my AVS testing. But I don't have the know-how to judge if it has

Re: [zfs-discuss] I/O freeze after a disk failure

2007-09-12 Thread Gino
On Tue, 2007-09-11 at 13:43 -0700, Gino wrote: -ZFS+FC JBOD: a failed hard disk needs a reboot :( (frankly unbelievable in 2007!) So, I've been using ZFS with some creaky old FC JBODs (A5200's) and old disks which have been failing regularly and haven't seen that; the worst I've

Re: [zfs-discuss] I/O freeze after a disk failure

2007-09-12 Thread Gino
Yes, this is a case where the disk has not completely failed. ZFS seems to handle the completely failed disk case properly, and has for a long time. Cutting the power (which you can also do with luxadm) makes the disk appear completely failed. Richard, I think you're right. The failed
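A sketch of the luxadm approach Richard mentions, assuming an A5x00-style enclosure named "box1" and a hypothetical front slot f3; syntax varies by enclosure type:

    # Identify enclosures and drive slots.
    luxadm probe
    luxadm display box1

    # Cut power to the suspect slot so the disk appears completely failed.
    luxadm power_off box1,f3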

Re: [zfs-discuss] I/O freeze after a disk failure

2007-09-12 Thread Gino
We have seen just the opposite... we have a server with about 40 million files and only 4 TB of data. We have been benchmarking FSes for creation and manipulation of large populations of small files and ZFS is the only one we have found that continues to scale linearly above one million

Re: [zfs-discuss] I/O freeze after a disk failure

2007-09-12 Thread Gino
-We had tons of kernel panics because of ZFS. Here a reboot must be planned a couple of weeks in advance and done only on Saturday night .. Well, I'm sorry, but if your datacenter runs into problems when a single server isn't available, you probably have much worse problems.

Re: [zfs-discuss] I/O freeze after a disk failure

2007-09-12 Thread Ralf Ramge
Gino wrote: The real problem is that ZFS should stop forcing kernel panics. I found these panics very annoying, too. And even more so that the zpool was faulted afterwards. But my problem is that when someone asks me what ZFS should do instead, I have no idea. I have large Sybase database

Re: [zfs-discuss] I/O freeze after a disk failure

2007-09-12 Thread Gino
Gino wrote: The real problem is that ZFS should stop forcing kernel panics. I found these panics very annoying, too. And even more so that the zpool was faulted afterwards. But my problem is that when someone asks me what ZFS should do instead, I have no idea. well, what about just
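Worth noting: later ZFS builds added a per-pool "failmode" property (wait, continue, or panic) that makes exactly this choice configurable. A hedged sketch for a hypothetical pool named "tank"; not available on the S10U3/U4 systems discussed in this thread:

    zpool set failmode=wait tank   # block I/O and wait instead of panicking
    zpool get failmode tank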

Re: [zfs-discuss] I/O freeze after a disk failure

2007-09-12 Thread Wade . Stuart
[EMAIL PROTECTED] wrote on 09/12/2007 08:04:33 AM: Gino wrote: The real problem is that ZFS should stop forcing kernel panics. I found these panics very annoying, too. And even more so that the zpool was faulted afterwards. But my problem is that when someone asks me what ZFS

Re: [zfs-discuss] I/O freeze after a disk failure

2007-09-12 Thread Gino
It seems that maybe there is too large a code path leading to panics -- maybe a side effect of ZFS being new (compared to other filesystems). I would hope that as these panic issues come up, the code path leading to the panic is evaluated for a specific fix or behavior code

Re: [zfs-discuss] I/O freeze after a disk failure

2007-09-12 Thread Marion Hakanson
. . . Use JBODs. Or tell the cache controllers to ignore the flushing requests. [EMAIL PROTECTED] said: Unfortunately the HP EVA can't do that. About the 9900V, it is really fast (64GB cache helps a lot) and reliable. 100% uptime in years. We'll never touch it to solve a ZFS problem. On our
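Where the array cannot be told to ignore SYNCHRONIZE CACHE (as with the EVA above), the flushes can be disabled on the host side instead. A sketch, only safe when every pool sits behind battery-backed cache, and only on builds that support the tunable:

    # /etc/system -- stop ZFS from issuing cache-flush requests (reboot required)
    set zfs:zfs_nocacheflush = 1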

Re: [zfs-discuss] I/O freeze after a disk failure

2007-09-11 Thread Gino
To put this in perspective, no system on the planet today handles all faults. I would even argue that building such a system is theoretically impossible. no doubt about that ;) So the subset of faults which ZFS covers is different from the subset that UFS covers and different from

Re: [zfs-discuss] I/O freeze after a disk failure

2007-09-11 Thread Bill Sommerfeld
On Tue, 2007-09-11 at 13:43 -0700, Gino wrote: -ZFS+FC JBOD: a failed hard disk needs a reboot :( (frankly unbelievable in 2007!) So, I've been using ZFS with some creaky old FC JBODs (A5200's) and old disks which have been failing regularly and haven't seen that; the worst I've seen

Re: [zfs-discuss] I/O freeze after a disk failure

2007-09-11 Thread Richard Elling
Bill Sommerfeld wrote: On Tue, 2007-09-11 at 13:43 -0700, Gino wrote: -ZFS+FC JBOD: a failed hard disk needs a reboot :( (frankly unbelievable in 2007!) So, I've been using ZFS with some creaky old FC JBODs (A5200's) and old disks which have been failing regularly and haven't seen

Re: [zfs-discuss] I/O freeze after a disk failure

2007-09-11 Thread Paul Kraus
On 9/11/07, Gino [EMAIL PROTECTED] wrote: -ZFS performs badly with a lot of small files. (about 20 times slower than UFS with our millions-of-files rsync procedures) We have seen just the opposite... we have a server with about 40 million files and only 4 TB of data. We have been
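A crude sketch of the kind of small-file creation benchmark being discussed, using hypothetical paths; real runs should spread files across directories and repeat at larger populations:

    # Time creation of 100,000 4 KB files on the filesystem under test.
    mkdir /tank/bench && cd /tank/bench
    time sh -c 'i=0
    while [ $i -lt 100000 ]; do
        mkfile 4k f$i
        i=`expr $i + 1`
    done'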

Re: [zfs-discuss] I/O freeze after a disk failure

2007-09-10 Thread Richard Elling
Gino wrote: Richard, thank you for your detailed reply. Unfortunately, another reason to stay with UFS in production .. IMHO, maturity is the primary reason to stick with UFS. To look at this through the maturity lens, UFS is the great grandfather living on life support (prune juice

Re: [zfs-discuss] I/O freeze after a disk failure

2007-09-09 Thread Gino
Richard, thank you for your detailed reply. Unfortunately, another reason to stay with UFS in production .. IMHO, maturity is the primary reason to stick with UFS. To look at this through the maturity lens, UFS is the great grandfather living on life support (prune juice and

Re: [zfs-discuss] I/O freeze after a disk failure

2007-09-08 Thread Richard Elling
Gino wrote: cfgadm -al or devfsadm -C didn't solve the problem. After a reboot ZFS recognized the drive as failed and all worked well. Do we need to restart Solaris after a drive failure?? It depends ... on which version of Solaris you are
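For reference, the sequence that normally avoids a reboot, with a hypothetical attachment point, device, and pool name; on some older FC stacks these are exactly the steps that hang:

    cfgadm -al                            # locate the failed target
    cfgadm -c unconfigure c2::dsk/c2t3d0  # release the dead drive
    # ... physically swap the drive ...
    cfgadm -c configure c2::dsk/c2t3d0
    devfsadm -C                           # prune stale /dev links
    zpool replace tank c2t3d0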

Re: [zfs-discuss] I/O freeze after a disk failure

2007-09-05 Thread Paul Kraus
On 9/4/07, Gino [EMAIL PROTECTED] wrote: yesterday we had a drive failure on an FC-AL JBOD with 14 drives. Suddenly the zpool using that JBOD stopped responding to I/O requests and we got tons of the following messages on /var/adm/messages: snip cfgadm -al or devfsadm -C didn't solve the

Re: [zfs-discuss] I/O freeze after a disk failure

2007-09-05 Thread Richard Elling
Paul Kraus wrote: On 9/4/07, Gino [EMAIL PROTECTED] wrote: yesterday we had a drive failure on an FC-AL JBOD with 14 drives. Suddenly the zpool using that JBOD stopped responding to I/O requests and we got tons of the following messages on /var/adm/messages: snip cfgadm -al or devfsadm

[zfs-discuss] I/O freeze after a disk failure

2007-09-04 Thread Gino
Hi all, yesterday we had a drive failure on an FC-AL JBOD with 14 drives. Suddenly the zpool using that JBOD stopped responding to I/O requests and we got tons of the following messages on /var/adm/messages: Sep 3 15:20:10 fb2 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/[EMAIL
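The usual first-line checks when a pool freezes like this (pool name hypothetical):

    zpool status -x         # which pool/vdev ZFS thinks is unhealthy
    iostat -En              # per-device hard/soft/transport error counters
    tail /var/adm/messages  # the retrying scsi_vhci warnings quoted above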

Re: [zfs-discuss] I/O freeze after a disk failure

2007-09-04 Thread Mark Ashley
I'm going to go out on a limb here and say you have an A5000 with the 1.6 disks in it. Because of their design (all drives see each other on both the A and B loops), it's possible for one disk that is behaving badly to take over the FC-AL loop and require human intervention. You can
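A hedged sketch of isolating a loop-hogging drive without rebooting the host, assuming an A5x00-style enclosure named "box1"; slot and controller names are hypothetical:

    luxadm display box1             # find the misbehaving drive's slot
    luxadm power_off box1,r2        # power the slot down, freeing the loop
    luxadm -e forcelip /dev/cfg/c2  # force loop re-initialization afterwards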

Re: [zfs-discuss] I/O freeze after a disk failure

2007-09-04 Thread Gino
Hi Mark, the drive (147GB, FC 2Gb) failed on a Xyratex JBOD. Also in the past we had the same problem with a drive that failed on an EMC CX JBOD. Anyway, I can't understand why rebooting Solaris resolved the situation .. Thank you, Gino