Re: [zfs-discuss] rename(2) (mv(1)) between ZFS filesystems in the same zpool

2008-01-03 Thread Carsten Bormann
On Dec 29 2007, at 08:33, Jonathan Loran wrote:

 We snapshot the file as it exists at the time of
 the mv in the old file system until all referring file handles are
 closed, then destroy the single file snap.  I know, not easy to
 implement, but that is the correct behavior, I believe.

Exactly.

Note that apart from open descriptors, there may be other links to the  
file on the old FS; it has to be clear whether writes to the file in  
the new FS change the file in the old FS or not.  I'd rather say they  
shouldn't.
Yes, this would be different from the normal rename(2) semantics with  
respect to multiply linked files.  And yes, the semantics of link(2)  
should also be consistent with this.
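As a concrete illustration (not from the original mail; dataset and file names are made up):

    zfs create tank/old ; zfs create tank/new
    echo data > /tank/old/f
    ln /tank/old/f /tank/old/g        # a second hard link inside the old FS
    mv /tank/old/f /tank/new/f        # the proposed fast cross-FS move
    # under the semantics above, a later write to /tank/new/f would NOT
    # be visible through /tank/old/g, unlike a plain same-FS rename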

Gruesse, Carsten



Re: [zfs-discuss] rename(2) (mv(1)) between ZFS filesystems in the same zpool

2008-01-03 Thread Joerg Schilling
Carsten Bormann [EMAIL PROTECTED] wrote:

 On Dec 29 2007, at 08:33, Jonathan Loran wrote:

  We snapshot the file as it exists at the time of
  the mv in the old file system until all referring file handles are
  closed, then destroy the single file snap.  I know, not easy to
  implement, but that is the correct behavior, I believe.

 Exactly.

 Note that apart from open descriptors, there may be other links to the  
 file on the old FS; it has to be clear whether writes to the file in  
 the new FS change the file in the old FS or not.  I'd rather say they  
 shouldn't.
 Yes, this would be different from the normal rename(2) semantics with  
 respect to multiply linked files.  And yes, the semantics of link(2)  
 should also be consistent with this.

This is an interesting problem. Your proposal would imply that a file
may have different identities in different filesystems:

-   different st_dev

-   different st_ino

-   different link count

This cannot be implemented with a single set of inode data anymore.

Well, it is not impossible, as my WOFS (mentioned before) implements
hard links via inode-relative symlinks. To allow this, a file would need
a pool-global serial number that allows the different inode sets for the
file to be matched up.
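To make the split identity concrete (hypothetical paths and inode numbers, continuing the example above):

    $ ls -li /tank/old/g /tank/new/f
    128034 -rw-r--r--  1 ...  /tank/old/g    # old FS: original inode number
    901216 -rw-r--r--  1 ...  /tank/new/f    # new FS: new st_ino, new st_dev
    # link counts diverge as well once either side gains or loses links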

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily


Re: [zfs-discuss] hot spare and resilvering problem

2008-01-03 Thread Maciej Olchowik
Hi,

 Do you have snapshots taking place (like in a cron job) during the
 resilver process?  If so, you may be hitting a bug that the resilver
 will restart from the beginning whenever a new snapshot occurs.  If
 you disable the snapshots during the resilver then it should complete
 to 100%.

No, I don't have snapshots taking place. I found that when I query the
pool with 'zpool status' it restarts the resilvering process, which is strange...

Anyway, after ~10 days the resilver has finally completed to 100%:
  resilver completed with 0 errors on Wed Jan  2 12:46:10 2008

The filesystem is still slow, however. When I try to run 'zpool iostat' it
takes a few hours to produce output; the same goes for 'zfs create'.

I can't even post the output of 'zpool status -v', as it takes that long to complete.

We have 11 disks (+1 hot spare) in a raidz config. Why is the filesystem
still so slow, even now that the hot spare has replaced the faulty disk?
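A common next step (not something suggested in this thread) is to look for a single slow or still-erroring disk, e.g.:

    iostat -xn 5    # watch for one device with much higher asvc_t / %b than its peers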

thanks,

Maciej


Re: [zfs-discuss] [zones-discuss] ZFS shared /home between zones

2008-01-03 Thread Steve McKinty
In general you should not allow a Solaris system to be both an NFS server and 
NFS client for the same filesystem, irrespective of whether zones are involved. 
Among other problems, you can run into kernel deadlocks in some (rare) 
circumstances. This is documented in the NFS administration docs. A loopback 
mount is definitely the recommended approach.
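For reference, a loopback mount of a global-zone ZFS filesystem into a zone looks roughly like this (zone and path names here are hypothetical):

    global# zonecfg -z myzone
    zonecfg:myzone> add fs
    zonecfg:myzone:fs> set dir=/export/home
    zonecfg:myzone:fs> set special=/export/home
    zonecfg:myzone:fs> set type=lofs
    zonecfg:myzone:fs> end
    zonecfg:myzone> commit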
 
 


Re: [zfs-discuss] zfs panic on boot

2008-01-03 Thread Gordon Ross
I'm seeing this too.  Nothing unusual happened before the panic.
Just a shutdown (init 5) and later startup.  I have the crashdump
and a copy of the problem zpool (on swan).  Here's the stack trace:

 $C
ff0004463680 vpanic()
ff00044636b0 vcmn_err+0x28(3, f792ecf0, ff0004463778)
ff00044637a0 zfs_panic_recover+0xb6()
ff0004463830 space_map_add+0xdb(ff014c1a21b8, 472785000, 1000)
ff00044638e0 space_map_load+0x1fc(ff014c1a21b8, fbd52568, 1, 
ff014c1a1e88, ff0149c88c30)
ff0004463920 metaslab_activate+0x66(ff014c1a1e80, 4000)
ff00044639e0 metaslab_group_alloc+0x24e(ff014bdeb000, 4000, 3a6734, 
1435b, ff014baa9840, 2)
ff0004463ab0 metaslab_alloc_dva+0x1da(ff01477880c0, ff014beefa70, 
4000, ff014baa9840, 2, 0, 3a6734, 0)
ff0004463b50 metaslab_alloc+0x82(ff01477880c0, ff014beefa70, 4000, 
ff014baa9840, 3, 3a6734, 0, 0)
ff0004463ba0 zio_dva_allocate+0x62(ff014934c458)
ff0004463bd0 zio_execute+0x7f(ff014934c458)
ff0004463c60 taskq_thread+0x1a7(ff014bfb77a0)
ff0004463c70 thread_start+8()

This is on a Ferrari laptop (AMD X64) running snv79.
I'd love to rescue my zpool.  Any suggestions?
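One workaround sometimes mentioned for zfs_panic_recover() panics (not a suggestion from this thread, and risky; only for trying to copy data off) is to turn the panic into a warning via /etc/system:

    * convert zfs_panic_recover() panics into warnings (recovery use only)
    set zfs:zfs_recover = 1
    * also ride over assertion failures
    set aok = 1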

Thanks,
Gordon
 
 


Re: [zfs-discuss] Bugid 6535160

2008-01-03 Thread Vincent Fox
We loaded Nevada_78 on a peer T2000 unit.  Imported the same ZFS pool.  I 
didn't even upgrade the pool since we wanted to be able to move it back to 
10u4.  Cut 'n paste of my colleague's email with the results:

Here's the latest Pepsi Challenge results.

Sol10u4 vs Nevada78. Same tuning options, same zpool, same storage, same SAN
switch - you get the idea. The only difference is the OS.

Sol10u4:
 4984: 82.878: Per-Operation Breakdown
closefile4          404 ops/s   0.0 mb/s   0.0 ms/op     19 us/op-cpu
readfile4           404 ops/s   6.3 mb/s   0.1 ms/op    109 us/op-cpu
openfile4           404 ops/s   0.0 mb/s   0.1 ms/op    112 us/op-cpu
closefile3          404 ops/s   0.0 mb/s   0.0 ms/op     25 us/op-cpu
fsyncfile3          404 ops/s   0.0 mb/s  18.7 ms/op   1168 us/op-cpu
appendfilerand3     404 ops/s   6.3 mb/s   0.2 ms/op    192 us/op-cpu
readfile3           404 ops/s   6.3 mb/s   0.1 ms/op    111 us/op-cpu
openfile3           404 ops/s   0.0 mb/s   0.1 ms/op    111 us/op-cpu
closefile2          404 ops/s   0.0 mb/s   0.0 ms/op     24 us/op-cpu
fsyncfile2          404 ops/s   0.0 mb/s  19.0 ms/op   1162 us/op-cpu
appendfilerand2     404 ops/s   6.3 mb/s   0.2 ms/op    173 us/op-cpu
createfile2         404 ops/s   0.0 mb/s   0.3 ms/op    334 us/op-cpu
deletefile1         404 ops/s   0.0 mb/s   0.2 ms/op    173 us/op-cpu

 4984: 82.879: IO Summary: 318239 ops, 5251.8 ops/s, (808/808 r/w), 25.2 mb/s, 1228 us cpu/op, 9.7 ms latency


Nevada78:
 1107: 82.554: Per-Operation Breakdown
closefile4         1223 ops/s   0.0 mb/s   0.0 ms/op     22 us/op-cpu
readfile4          1223 ops/s  19.4 mb/s   0.1 ms/op    112 us/op-cpu
openfile4          1223 ops/s   0.0 mb/s   0.1 ms/op    128 us/op-cpu
closefile3         1223 ops/s   0.0 mb/s   0.0 ms/op     29 us/op-cpu
fsyncfile3         1223 ops/s   0.0 mb/s   4.6 ms/op    256 us/op-cpu
appendfilerand3    1223 ops/s  19.1 mb/s   0.2 ms/op    191 us/op-cpu
readfile3          1223 ops/s  19.9 mb/s   0.1 ms/op    116 us/op-cpu
openfile3          1223 ops/s   0.0 mb/s   0.1 ms/op    127 us/op-cpu
closefile2         1223 ops/s   0.0 mb/s   0.0 ms/op     28 us/op-cpu
fsyncfile2         1223 ops/s   0.0 mb/s   4.4 ms/op    239 us/op-cpu
appendfilerand2    1223 ops/s  19.1 mb/s   0.1 ms/op    159 us/op-cpu
createfile2        1223 ops/s   0.0 mb/s   0.5 ms/op    389 us/op-cpu
deletefile1        1223 ops/s   0.0 mb/s   0.2 ms/op    198 us/op-cpu

 1107: 82.581: IO Summary: 954637 ops, 15903.4 ops/s, (2447/2447 r/w), 77.5 mb/s, 590 us cpu/op, 2.6 ms latency


That's a 3-4x improvement in ops/sec and average fsync time.


Here are the results from our UFS software mirror for comparison:
 4984: 211.056: Per-Operation Breakdown
closefile4          465 ops/s   0.0 mb/s   0.0 ms/op     23 us/op-cpu
readfile4           465 ops/s  12.6 mb/s   0.1 ms/op    142 us/op-cpu
openfile4           465 ops/s   0.0 mb/s   0.1 ms/op     83 us/op-cpu
closefile3          465 ops/s   0.0 mb/s   0.0 ms/op     24 us/op-cpu
fsyncfile3          465 ops/s   0.0 mb/s   6.0 ms/op    498 us/op-cpu
appendfilerand3     465 ops/s   7.3 mb/s   1.7 ms/op    282 us/op-cpu
readfile3           465 ops/s  11.1 mb/s   0.1 ms/op    132 us/op-cpu
openfile3           465 ops/s   0.0 mb/s   0.1 ms/op     84 us/op-cpu
closefile2          465 ops/s   0.0 mb/s   0.0 ms/op     26 us/op-cpu
fsyncfile2          465 ops/s   0.0 mb/s   5.9 ms/op    445 us/op-cpu
appendfilerand2     465 ops/s   7.3 mb/s   1.1 ms/op    231 us/op-cpu
createfile2         465 ops/s   0.0 mb/s   2.2 ms/op    443 us/op-cpu
deletefile1         465 ops/s   0.0 mb/s   2.0 ms/op    269 us/op-cpu

 4984: 211.057: IO Summary: 366557 ops, 6049.2 ops/s, (931/931 r/w), 38.2 mb/s, 912 us cpu/op, 4.8 ms latency
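The operation names above look like filebench's varmail-style workload; assuming that is what was run (the profile isn't named in the mail), the test would look roughly like:

    # filebench
    filebench> load varmail
    filebench> set $dir=/testpool/fs
    filebench> run 60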


So either we're hitting a pretty serious zfs bug, or they're purposely
holding back performance in Solaris 10 so that we all have a good reason to
upgrade to 11.  ;) 
 

-Nick
 
 


[zfs-discuss] ZFS Not Offlining Disk on SCSI Sense Error (X4500)

2008-01-03 Thread Jason J. W. Williams
Hello,

There seems to be a persistent issue we have with ZFS where one of the
SATA disks in a zpool on a Thumper starts throwing sense errors; ZFS
does not offline the disk and instead hangs all zpools across the
system. If it is not caught soon enough, application data ends up in
an inconsistent state. We've had this issue with b54 through b77 (as
of last night).

Reading through the archives, we don't seem to be the only folks with
this issue. Are there any plans to fix this behavior? It really makes
ZFS less than desirable/reliable.

Best Regards,
Jason


Re: [zfs-discuss] ZFS Not Offlining Disk on SCSI Sense Error (X4500)

2008-01-03 Thread Albert Chin
On Thu, Jan 03, 2008 at 02:57:08PM -0700, Jason J. W. Williams wrote:
 There seems to be a persistent issue we have with ZFS where one of the
 SATA disk in a zpool on a Thumper starts throwing sense errors, ZFS
 does not offline the disk and instead hangs all zpools across the
 system. If it is not caught soon enough, application data ends up in
 an inconsistent state. We've had this issue with b54 through b77 (as
 of last night).
 
 We don't seem to be the only folks with this issue reading through the
 archives. Are there any plans to fix this behavior? It really makes
 ZFS less than desirable/reliable.

http://blogs.sun.com/eschrock/entry/zfs_and_fma

FMA For ZFS Phase 2 (PSARC/2007/283) was integrated in b68:
  http://www.opensolaris.org/os/community/arc/caselog/2007/283/
  http://www.opensolaris.org/os/community/on/flag-days/all/
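With that in place, the FMA view of a suspect disk can be checked directly (standard commands, not specific to this report):

    fmadm faulty        # faults FMA has diagnosed (and ZFS has responded to)
    fmdump -eV | tail   # recent error telemetry (disk/ZFS ereports)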

-- 
albert chin ([EMAIL PROTECTED])


Re: [zfs-discuss] ZFS Not Offlining Disk on SCSI Sense Error (X4500)

2008-01-03 Thread Jason J. W. Williams
Hi Albert,

Thank you for the link. ZFS isn't offlining the disk in b77.

-J

On Jan 3, 2008 3:07 PM, Albert Chin
[EMAIL PROTECTED] wrote:

 On Thu, Jan 03, 2008 at 02:57:08PM -0700, Jason J. W. Williams wrote:
  There seems to be a persistent issue we have with ZFS where one of the
  SATA disk in a zpool on a Thumper starts throwing sense errors, ZFS
  does not offline the disk and instead hangs all zpools across the
  system. If it is not caught soon enough, application data ends up in
  an inconsistent state. We've had this issue with b54 through b77 (as
  of last night).
 
  We don't seem to be the only folks with this issue reading through the
  archives. Are there any plans to fix this behavior? It really makes
  ZFS less than desirable/reliable.

 http://blogs.sun.com/eschrock/entry/zfs_and_fma

 FMA For ZFS Phase 2 (PSARC/2007/283) was integrated in b68:
   http://www.opensolaris.org/os/community/arc/caselog/2007/283/
   http://www.opensolaris.org/os/community/on/flag-days/all/

 --
 albert chin ([EMAIL PROTECTED])



Re: [zfs-discuss] ZFS Not Offlining Disk on SCSI Sense Error (X4500)

2008-01-03 Thread Eric Schrock
This should be pretty much fixed on build 77.  It will lock up for the
duration of a single command timeout, but ZFS should recover quickly
without queueing up additional commands.  Since the default timeout is
60 seconds, and we retry 3 times, and we do a probe afterwards, you may
see hangs of up to 6 minutes.  Unfortunately there's not much we can do,
since that's the minimum amount of time to do two I/O operations to a
single drive (one that fails and one to do a basic probe of the disk).
You can tune down 'sd_io_time' to a more reasonable value to get shorter
command timeouts, but this may break slow things (like powered down
CD-ROM drives).
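For example (not from Eric's mail; 0x3c = 60 seconds is the usual default), the tunable can be lowered in /etc/system:

    * shorten the per-command timeout; too low can break legitimately slow devices
    set sd:sd_io_time = 0x14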

Other options at the ZFS level could be imagined, but would require
per-pool tunables:

1. Allowing I/O to complete as soon as it was on enough devices, instead
   of replicating to all devices.

2. Inventing a per-pool tunable that controlled timeouts independent
   of SCSI timeouts.

Neither of these is trivial, and both potentially compromise data
integrity, hence the lack of such features.  There's no easy solution to
the problem, but we're happy to hear ideas.

- Eric

On Thu, Jan 03, 2008 at 02:57:08PM -0700, Jason J. W. Williams wrote:
 Hello,
 
 There seems to be a persistent issue we have with ZFS where one of the
 SATA disk in a zpool on a Thumper starts throwing sense errors, ZFS
 does not offline the disk and instead hangs all zpools across the
 system. If it is not caught soon enough, application data ends up in
 an inconsistent state. We've had this issue with b54 through b77 (as
 of last night).
 
 We don't seem to be the only folks with this issue reading through the
 archives. Are there any plans to fix this behavior? It really makes
 ZFS less than desirable/reliable.
 
 Best Regards,
 Jason

--
Eric Schrock, FishWorks    http://blogs.sun.com/eschrock


Re: [zfs-discuss] ZFS Not Offlining Disk on SCSI Sense Error (X4500)

2008-01-03 Thread Eric Schrock
When you say "starts throwing sense errors", does that mean every I/O to
the drive will fail, or some arbitrary percentage of I/Os will fail?  If
it's the latter, ZFS is trying to do the right thing by recognizing
these as transient errors, but eventually the ZFS diagnosis should kick
in.  What does '::spa -ve' in 'mdb -k' show in one of these situations?
How about '::zio_state'?
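For anyone following along, those dcmds can also be run non-interactively, e.g.:

    echo "::spa -ve" | mdb -k
    echo "::zio_state" | mdb -k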

- Eric

On Thu, Jan 03, 2008 at 03:11:39PM -0700, Jason J. W. Williams wrote:
 Hi Albert,
 
 Thank you for the link. ZFS isn't offlining the disk in b77.
 
 -J
 
 On Jan 3, 2008 3:07 PM, Albert Chin
 [EMAIL PROTECTED] wrote:
 
  On Thu, Jan 03, 2008 at 02:57:08PM -0700, Jason J. W. Williams wrote:
   There seems to be a persistent issue we have with ZFS where one of the
   SATA disk in a zpool on a Thumper starts throwing sense errors, ZFS
   does not offline the disk and instead hangs all zpools across the
   system. If it is not caught soon enough, application data ends up in
   an inconsistent state. We've had this issue with b54 through b77 (as
   of last night).
  
   We don't seem to be the only folks with this issue reading through the
   archives. Are there any plans to fix this behavior? It really makes
   ZFS less than desirable/reliable.
 
  http://blogs.sun.com/eschrock/entry/zfs_and_fma
 
  FMA For ZFS Phase 2 (PSARC/2007/283) was integrated in b68:
http://www.opensolaris.org/os/community/arc/caselog/2007/283/
http://www.opensolaris.org/os/community/on/flag-days/all/
 
  --
  albert chin ([EMAIL PROTECTED])
 

--
Eric Schrock, FishWorks    http://blogs.sun.com/eschrock


Re: [zfs-discuss] rename(2) (mv(1)) between ZFS filesystems in the same zpool

2008-01-03 Thread Jonathan Loran



Joerg Schilling wrote:

Carsten Bormann [EMAIL PROTECTED] wrote:

  

On Dec 29 2007, at 08:33, Jonathan Loran wrote:



We snapshot the file as it exists at the time of
the mv in the old file system until all referring file handles are
closed, then destroy the single file snap.  I know, not easy to
implement, but that is the correct behavior, I believe.
  

Exactly.

Note that apart from open descriptors, there may be other links to the  
file on the old FS; it has to be clear whether writes to the file in  
the new FS change the file in the old FS or not.  I'd rather say they  
shouldn't.
Yes, this would be different from the normal rename(2) semantics with  
respect to multiply linked files.  And yes, the semantics of link(2)  
should also be consistent with this.



This in an interesting problem. Your proposal would imply that a file
may have different identities in different filesystems:

-   different st_dev

-   different st_ino

-   different link count

This cannot be implemented with a single inode data anymore.

Well, it is not impossible as my WOFS (mentioned before) implements
hardlinks via inode relative symlinks. In order to allow this. a file
would need a storage pool global serial number that allows to match different
inode sets for the file.

Jörg

  


At first, as I mentioned in my earlier email, I was thinking we needed 
to emulate the cross-fs rename/link/etc behavior as it is currently 
implemented, where a file appears to actually be copied.  But now I'm 
not so sure. 

In Unixland, the ideal has always been to have the whole file system, 
kit and caboodle, singly rooted at /.  Heck, even devices are in the 
file system.  Of course, reality required that, programmatically, we 
be aware of which file system our cwd is in.  At a minimum, it's 
returned in our various stat structs (st_dev). 

I can see I'm getting long-winded, but I'm thinking: what is the value 
of having different behavior for a cross-ZFS file move within the same 
pool than for a move between directories?  I'm not addressing the previous 
discussion about how to treat file handles, etc., but rather the sharing of 
open file blocks linked across ZFS boundaries before and after such a mv. 

I think the test is this: can we find a scenario where something would 
break if we did share the file blocks across ZFS boundaries after such a 
mv?  For every example I've been able to think of, when I ask whether the 
outcome would have been different had I moved the file from one directory 
to another instead of across ZFS boundaries, the answer has been no.  
Comments please. 
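A minimal version of that test (hypothetical dataset names, same pool) might be:

    zfs create tank/src ; zfs create tank/dst
    mkfile 100m /tank/src/f
    tail -f /tank/src/f &                      # a reader holding the file open in the old FS
    mv /tank/src/f /tank/dst/f                 # today: copy + unlink; proposed: share the blocks
    zfs list -o name,used tank/src tank/dst    # where does the space end up being accounted?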


Jon

--


- _/ _/  /   - Jonathan Loran -   -
-/  /   /IT Manager   -
-  _  /   _  / / Space Sciences Laboratory, UC Berkeley
-/  / /  (510) 643-5146 [EMAIL PROTECTED]
- __/__/__/   AST:7731^29u18e3






Re: [zfs-discuss] zfs panic on boot

2008-01-03 Thread Rob Logan
  space_map_add+0xdb(ff014c1a21b8, 472785000, 1000)
  space_map_load+0x1fc(ff014c1a21b8, fbd52568, 1,  
ff014c1a1e88, ff0149c88c30)
  running snv79.

Hmm... did you spend any time on snv_74 or snv_75, where you might have
been hit by http://bugs.opensolaris.org/view_bug.do?bug_id=6603147 ?

Running 'zdb -e name_of_pool_that_crashes_on_import' would be
interesting, but the damage might already have been done.

Rob


Re: [zfs-discuss] [zones-discuss] ZFS shared /home between zones

2008-01-03 Thread Ian Collins
James C. McPherson wrote:

 The ws command hates it - hmm, the underlying device for
 /scratch is /scratch maybe if I loop around stat()ing
 it it'll turn into a pumpkin

 :-)


   
As does dmake, which is a real PITA for a developer!

Ian



Re: [zfs-discuss] [osol-help] ZFS woes

2008-01-03 Thread Ian Collins
Scott L. Burson wrote:
 Hi,

 This is in build 74, on x64, on a Tyan S2882-D with dual Opteron 275 and 24GB 
 of ECC DRAM.

   
Not an answer, but zfs-discuss is probably the best place to ask, so
I've taken the liberty of CCing that list.

 I seem to have lost the entire contents of a ZFS raidz pool.  The pool is in 
 a state where, if ZFS looks at it, I get a kernel panic.  To make it possible 
 to boot the machine, I had to boot into safe mode and rename 
 `/etc/zfs/zpool.cache' (fortunately, this was my only pool on the machine).

 Okay, from the beginning.  I bought the drives in October: three 500GB 
 Western Digital WD5000ABYS SATA drives, installed them in the box in place of 
 three 250GB Seagates I had been using, and created the raidz pool.  For the 
 first couple of months everything was hunky dory.  Then, a couple of weeks 
 ago, I moved the machine to a different location in the building, which 
 wouldn't even be worth mentioning except that that's when I started to have 
 problems.  The first time I powered it up, one of the SATA drives didn't show 
 up; I reseated the drive connectors and tried again, and it seemed fine.  I 
 thought that was odd, since I hadn't had one of those connectors come loose 
 on me before, but I scrubbed the pool, cleared the errors on the drive, and 
 thought that was the end of it.

 It wasn't.  `zpool status' continued to report errors, only now they were 
 write and read errors, and spread across all three drives.  I started to copy 
 the most critical parts of the filesystem contents onto other machines (very 
 fortunately, as it turned out).  After a while, the drive that had previously 
 not shown up was marked faulted, and the other two were marked degraded.  
 Then, yesterday, there was a much larger number of errors -- over 3000 read 
 errors -- on a different drive, and that drive was marked faulted and the 
 other two (i.e. including the one that had previously been faulted) were 
 marked degraded.  Also, `zpool status' told me I had lost some files; these 
 turned out to be all, or mostly, directories, some containing substantial 
 trees.

 By this point I had already concluded I was going to have to replace a drive, 
 and had picked up a replacement.  I installed it in place of the drive that 
 was now marked faulted, and powered up.  I was met with repeated panics and 
 reboots.  I managed to copy down part of the backtrace:

   unix:die+c8
   unix:trap+1351
   unix:cmntrap+e9
   unix:mutex_enter+b
   zfs:metaslab_free+97
   zfs:zio_dva_free+29
   zfs:zio_next_stage+b3
   zfs:zio_gang_pipeline+??

 (This may contain typos, and I didn't get the offset on that last frame.)

 At this point I tried replacing the drive I had just removed (removing the 
 new, blank drive), but that didn't help.  So, as mentioned above, I tried 
 booting into safe mode and renaming `/etc/zfs/zpool.cache' -- just on a 
 hunch, but I figured there had to be some such way to make ZFS forget about 
 the pool -- and that allowed me to boot.
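For the record, that workaround amounts to something like the following from the failsafe/single-user shell (making ZFS forget the cached pool so it isn't touched at boot):

    mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.bad
    reboot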

 I used good old `format' to run read tests on the drives overnight -- no bad 
 blocks were detected.

 So, there are a couple lines of discussion here.  On the one hand, it seems I 
 have a hardware problem, but I haven't yet diagnosed it.  More on this below. 
  On the other, even in the face of hardware problems, I have to report some 
 disappointment with ZFS.  I had really been enjoying the warm fuzzy feeling 
 ZFS gave me (and I was talking it up to my colleagues; I'm the only one here 
 using it).  Now I'm in a worse state than I would probably be with UFS on 
 RAID, where `fsck' would probably have managed to salvage a lot of the 
 filesystem (I would certainly be able to mount it! -- unless the drives were 
 all failing catastrophically, which doesn't seem to be happening).

 One could say, there are two aspects to filesystem robustness: integrity 
 checking and recovery.  ZFS, with its block checksums, gets an A in integrity 
 checking, but now appears to do very poorly in recovering in the face of 
 substantial but not total hardware degradation, when that degradation is 
 sufficiently severe that the redundancy of the pool can't correct for it.

 Perhaps this is a vanishingly rare case and I am just very unlucky.  
 Nonetheless I would like to make some suggestions.  (1) It would still be 
 nice to have a salvager.  (2) I think it would make sense, at least as an 
 option, to add even more redundancy to ZFS's on-disk layout; for instance, it 
 could keep copies of all directories.

 Okay, back to my hardware problems.  I know you're going to tell me I 
 probably have a bad power supply, and I can't rule that out, but it's an 
 expensive PSU and generously sized for the box; and the box had been rock 
 stable for a good 18 months before this happened.  I'm naturally more 
 inclined to suspect the new components, which are the SATA drives.  (I also 
 have three SCSI drives in the box for /, swap, etc., and they don't seem to 
 be 

Re: [zfs-discuss] ZFS Not Offlining Disk on SCSI Sense Error (X4500)

2008-01-03 Thread Jason J. W. Williams
Hi Eric,

Hard to say. I'll use MDB next time it happens for more info. The
applications using any zpool lock up.

-J

On Jan 3, 2008 3:33 PM, Eric Schrock [EMAIL PROTECTED] wrote:
 When you say starts throwing sense errors, does that mean every I/O to
 the drive will fail, or some arbitrary percentage of I/Os will fail?  If
 it's the latter, ZFS is trying to do the right thing by recognizing
 these as transient errors, but eventually the ZFS diagnosis should kick
 in.  What does '::spa -ve' in 'mdb -k' show in one of these situations?
 How about '::zio_state'?

 - Eric


 On Thu, Jan 03, 2008 at 03:11:39PM -0700, Jason J. W. Williams wrote:
  Hi Albert,
 
  Thank you for the link. ZFS isn't offlining the disk in b77.
 
  -J
 
  On Jan 3, 2008 3:07 PM, Albert Chin
  [EMAIL PROTECTED] wrote:
  
   On Thu, Jan 03, 2008 at 02:57:08PM -0700, Jason J. W. Williams wrote:
There seems to be a persistent issue we have with ZFS where one of the
SATA disk in a zpool on a Thumper starts throwing sense errors, ZFS
does not offline the disk and instead hangs all zpools across the
system. If it is not caught soon enough, application data ends up in
an inconsistent state. We've had this issue with b54 through b77 (as
of last night).
   
We don't seem to be the only folks with this issue reading through the
archives. Are there any plans to fix this behavior? It really makes
ZFS less than desirable/reliable.
  
   http://blogs.sun.com/eschrock/entry/zfs_and_fma
  
   FMA For ZFS Phase 2 (PSARC/2007/283) was integrated in b68:
 http://www.opensolaris.org/os/community/arc/caselog/2007/283/
 http://www.opensolaris.org/os/community/on/flag-days/all/
  
   --
   albert chin ([EMAIL PROTECTED])
  

 --
 Eric Schrock, FishWorks    http://blogs.sun.com/eschrock
