[zfs-discuss] Re: user undo
Anything that attempts to append characters to the end of the filename will run into trouble when the file name is already at NAME_MAX. One simple solution is to restrict the total length of the name to NAME_MAX, truncating the original filename as necessary to allow appending. This does introduce the possibility of conflicts with very long names which happen to end in numeric strings, but that is likely to be rare and could be resolved in an ad hoc fashion (e.g. flipping a bit in the representation of the inode number until a unique name is achieved).
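To make the truncate-and-append idea concrete, here is a minimal sketch in C (a hypothetical helper, not code from any ZFS implementation) that composes "<original>.<n>" and shortens the original part so the whole name stays within NAME_MAX:

#include <stdio.h>
#include <string.h>
#include <limits.h>

#ifndef NAME_MAX
#define NAME_MAX 255    /* fallback if <limits.h> doesn't provide it */
#endif

/*
 * Sketch only: build "<orig>.<n>", truncating <orig> so the result
 * never exceeds NAME_MAX bytes.  Picking the next <n> (or perturbing
 * the name further on a collision) is left to the caller.
 */
static void
make_undo_name(const char *orig, unsigned int n, char *buf, size_t buflen)
{
        char suffix[16];
        size_t maxbase;

        (void) snprintf(suffix, sizeof (suffix), ".%u", n);
        maxbase = NAME_MAX - strlen(suffix);   /* room left for the base name */
        if (strlen(orig) < maxbase)
                maxbase = strlen(orig);
        (void) snprintf(buf, buflen, "%.*s%s", (int)maxbase, orig, suffix);
}

A caller would size buf to at least NAME_MAX + 1 bytes and simply retry with the next n until the generated name is unique in the target directory.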
[zfs-discuss] Re: How's zfs RAIDZ fault-tolerant ???
> raidz is like raid 5, so you can survive the death of one disk, not 2. I
> would recommend you configure the 12 disks into 2 raidz groups; then you
> can survive the death of one drive from each group. This is what I did on
> my system.

Hi James,

Thank you very much. ;-) I'll configure 2 raidz groups in my pool tomorrow. BTW, would multiple raidz groups sacrifice performance?

Thanks.
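To make the layout concrete: with 12 disks, the two-group configuration James describes would be created as a single pool along these lines (the device names are placeholders, not the poster's actual disks):

  zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 \
                    raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0

ZFS stripes writes across the two raidz vdevs, so overall throughput should not suffer; the trade-off is that two disks' worth of capacity (one parity disk per group) go to redundancy in exchange for surviving one failure in each group.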
Re: [zfs-discuss] How's zfs RAIDZ fault-tolerant ???
> RAID-Z is single-fault tolerant. If you take out two disks, then you no
> longer have the required redundancy to maintain your data. Build 42 should
> contain double-parity RAID-Z, which will allow you to sustain two
> simultaneous disk failures without data loss.

I'm not sure if this has been mentioned elsewhere (I didn't see it..) but will this double parity be backported into Solaris 10 in time for the U2 release? This is a sorely needed piece of functionality for my deployment (and I'm sure many others'.)

Thanks,
David
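For reference, and purely as an illustration (the final syntax depends on how the feature actually integrates): double-parity RAID-Z is expected to be just another vdev type at pool creation time, e.g.

  zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0

where any two of the five placeholder disks could then fail without data loss.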
[zfs-discuss] ZFS mirror and read policy; kstat I/O values for zfs
Hi,

after some testing with ZFS I noticed that read requests are not scheduled evenly across the drives; the first one gets predominantly selected. My pool is set up as follows:

        NAME        STATE     READ WRITE CKSUM
        tpc         ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c4t2d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
            c4t3d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c4t4d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t6d0  ONLINE       0     0     0
            c4t6d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t7d0  ONLINE       0     0     0
            c4t7d0  ONLINE       0     0     0

Disk I/O after doing some benchmarking:

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
tpc         7.70G  50.9G     85     21  10.5M  1.08M
  mirror    1.10G  7.28G     11      3  1.47M   159K
    c1t0d0      -      -     10      2  1.34M   159K
    c4t0d0      -      -      1      2   138K   159K
  mirror    1.10G  7.27G     11      3  1.48M   159K
    c1t1d0      -      -     10      2  1.34M   159K
    c4t1d0      -      -      1      2   140K   159K
  mirror    1.09G  7.28G     12      3  1.50M   159K
    c1t2d0      -      -     10      2  1.37M   159K
    c4t2d0      -      -      0      2   128K   159K
  mirror    1.10G  7.28G     12      3  1.53M   158K
    c1t3d0      -      -     11      2  1.42M   158K
    c4t3d0      -      -      0      2   110K   158K
  mirror    1.10G  7.28G     11      3  1.44M   158K
    c1t4d0      -      -     10      2  1.33M   158K
    c4t4d0      -      -      0      2   112K   158K
  mirror    1.10G  7.28G     12      3  1.53M   158K
    c1t6d0      -      -     11      2  1.42M   158K
    c4t6d0      -      -      0      2   106K   158K
  mirror    1.11G  7.26G     12      3  1.55M   158K
    c1t7d0      -      -     11      2  1.42M   158K
    c4t7d0      -      -      1      2   130K   158K
----------  -----  -----  -----  -----  -----  -----

or with iostat:

   r/s   w/s    kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  11.4   4.3  1451.1  157.1  0.0  0.3    0.4   19.6   0  17 c1t7d0
   0.0   0.0     0.0    0.0  0.0  0.0    0.0    0.0   0   0 c4t5d0
  10.7   4.3  1361.4  158.4  0.0  0.3    0.4   22.1   0  18 c1t0d0
  10.9   4.3  1395.7  157.9  0.0  0.3    0.4   18.6   0  16 c1t2d0
   1.0   4.3   129.0  157.1  0.0  0.0    0.8    8.9   0   2 c4t7d0
   0.9   4.3   112.0  156.9  0.0  0.0    0.9    9.4   0   2 c4t4d0
   1.1   4.4   139.5  158.3  0.0  0.0    0.9    8.8   0   3 c4t1d0
  10.6   4.3  1354.8  157.0  0.0  0.3    0.4   18.8   0  16 c1t4d0
   0.9   4.3   109.2  157.3  0.0  0.1    0.9    9.7   0   3 c4t3d0
  10.7   4.4  1363.4  158.3  0.0  0.3    0.4   21.9   0  18 c1t1d0
   0.0   0.0     0.0    0.0  0.0  0.0    0.0    0.0   0   0 c4t8d0
   1.0   4.3   127.0  157.8  0.0  0.0    0.9    9.0   0   2 c4t2d0
   0.0   0.0     0.0    0.0  0.0  0.0    0.0    0.0   0   0 c1t8d0
  11.4   4.3  1449.9  156.9  0.0  0.3    0.4   20.0   0  17 c1t6d0
   0.8   4.3   105.4  156.8  0.0  0.0    0.9    8.5   0   2 c4t6d0
  11.3   4.3  1447.4  157.4  0.0  0.3    0.4   18.9   0  17 c1t3d0
   1.1   4.4   137.7  158.4  0.0  0.0    0.9    8.8   0   2 c4t0d0

So you can see the second disk of each mirror pair (c4tXd0) gets almost no read I/O. How does ZFS decide from which mirror device to read?

And just another note: SVM does offer kstat values of type KSTAT_TYPE_IO. Why not ZFS (at least at the zpool level)?

And BTW (not ZFS related, but SVM): with the introduction of the SVM bunnahabhain project (friendly names), iostat -n output is now completely useless - even if you still use the old naming scheme:

% iostat -n
                    extended device statistics
   r/s   w/s    kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   0.0   0.0     0.0    0.0  0.0  0.0    0.0    2.3   0   0 c0d0
   0.0   0.0     0.0    0.0  0.0  0.0    0.0    2.4   0   0 c0d1
   0.0   5.0     0.7   21.8  0.0  0.0    0.0    1.5   0   1 c3d0
   0.0   4.1     0.6   20.9  0.0  0.0    0.0    2.8   0   1 c4d0
   1.6  37.3    16.6  164.3  0.1  0.1    2.5    1.6   1   5 c2d0
   1.6  37.5    16.5  164.5  0.1  0.1    3.2    1.7   1   5 c1d0
   0.0   0.0     0.0    0.0  0.0  0.0    0.0    0.0   0   0 fd0
   2.9   1.9    19.3    4.8  0.0  0.2    0.3   37.2   0   1 md5
   0.0   0.0     0.0    0.0
Re: [zfs-discuss] How's zfs RAIDZ fault-tolerant ???
On Fri, May 26, 2006 at 10:33:34AM -0700, Eric Schrock wrote:
> RAID-Z is single-fault tolerant. If you take out two disks, then you no
> longer have the required redundancy to maintain your data. Build 42 should
> contain double-parity RAID-Z, which will allow you to sustain two
> simultaneous disk failures without data loss.

Eric, is raidz double parity optional or mandatory?

grant.
Re: [zfs-discuss] hard drive write cache
Gregory Shaw wrote:
> In recent Linux distributions, when the kernel shuts down, the kernel will
> force the scsi drives to flush their write cache. I don't know if solaris
> does the same, but I think not, due to the ongoing focus of solaris on
> disabling the write cache.

The Solaris sd(7D) SCSI disk driver issues a SYNCHRONIZE CACHE command upon the last close of the device.

Rgds,
Ed

--
Edmund Nadolski
Sun Microsystems Inc.
[EMAIL PROTECTED]
Re: [zfs-discuss] How's zfs RAIDZ fault-tolerant ???
On Sat, May 27, 2006 at 08:29:05AM +1000, grant beattie wrote:
> is raidz double parity optional or mandatory?

Backwards compatibility dictates that it will be optional.
[zfs-discuss] ata panic
`mv`ing files from a zfs dir to another zfs filesystem in the same pool will panic an 8-disk SATA raidz system (http://supermicro.com/Aplus/motherboard/Opteron/nForce/H8DCE.cfm) with:

::status
debugging crash dump vmcore.3 (64-bit) from zfs
operating system: 5.11 opensol-20060523 (i86pc)
panic message: assertion failed: !(status & 0x80), file: ../../intel/io/dktp/controller/ata/ata_disk.c, line: 2212
dump content: kernel pages only

::stack
vpanic()
assfail+0x83(f3afb508, f3afb4d8, 8a4)
ata_disk_intr_pio_out+0x1dd(8f51b840, 84ff5440, 911a8d50)
ata_ctlr_fsm+0x237(2, 8f51b840, 0, 0, 0)
ata_process_intr+0x3e(8f51b840, fe8b3be4)
ghd_intr+0x72(8f51b958, fe8b3be4)
ata_intr+0x25(8f51b840)
av_dispatch_autovect+0x97(2d)
intr_thread+0x50()

every time...
Re: [zfs-discuss] hard drive write cache
Gregory Shaw wrote:
> I had a question to the group: In the different ZFS discussions in
> zfs-discuss, I've seen a recurring theme of disabling write cache on
> disks. I would think that the performance increase of using write cache
> would be an advantage, and that write cache should be enabled.
> Realistically, I can see only one situation where write cache would be
> an issue. If there is no way to flush the write cache, it would be
> possible for corruption to occur due to a power loss.

There are two failure modes associated with disk write caches:

1) the disk write cache, for performance reasons, doesn't write data (to
   different blocks) back to the platter in the order it was received, so
   transactional ordering isn't maintained and corruption can occur.

2) writes to different disks can have different caching policies, so
   transactions to files on different filesystems may not complete
   correctly during a power failure.

ZFS enables the write cache and flushes it when committing transaction groups; this ensures that all of a transaction group appears or does not appear on disk.

- Bart

--
Bart Smaalders              Solaris Kernel Performance
[EMAIL PROTECTED]           http://blogs.sun.com/barts
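For anyone curious what that flush looks like at the driver interface, here is a small user-level sketch (illustration only, not ZFS source; the device path is a placeholder). On Solaris the cache flush is the DKIOCFLUSHWRITECACHE ioctl, which is, to my understanding, the same request ZFS sends to its disk vdevs when committing a transaction group:

#include <sys/types.h>
#include <sys/dkio.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int
main(void)
{
        /* placeholder path -- point this at a raw disk device on your system */
        int fd = open("/dev/rdsk/c1t0d0s0", O_RDWR);

        if (fd == -1) {
                perror("open");
                return (1);
        }

        /* ask the drive to commit its volatile write cache to the media */
        if (ioctl(fd, DKIOCFLUSHWRITECACHE, NULL) == -1)
                perror("DKIOCFLUSHWRITECACHE");

        (void) close(fd);
        return (0);
}

A successful return with a NULL argument means the flush completed synchronously, which is essentially the guarantee the transaction-group commit described above is built on.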
Re: [zfs-discuss] ZFS mirror and read policy; kstat I/O values for zfs
On Fri, May 26, 2006 at 09:40:57PM +0200, Daniel Rock wrote:
> So you can see the second disk of each mirror pair (c4tXd0) gets almost
> no I/O. How does ZFS decide from which mirror device to read?

You are almost certainly running into this known bug:

630 reads from mirror are not spread evenly

--matt
Re: [zfs-discuss] hard drive write cache
On 5/26/06, Bart Smaalders [EMAIL PROTECTED] wrote:
> There are two failure modes associated with disk write caches:

Failure modes aside, is there any benefit to a write cache when command queueing is available? It seems that the primary advantage is in allowing old ATA hardware to issue writes in an asynchronous manner. Beyond that, it doesn't really make much sense, if the queue is deep enough.

> ZFS enables the write cache and flushes it when committing transaction
> groups; this ensures that all of a transaction group appears or does not
> appear on disk.

How often is the write cache flushed, and is it synchronous? Unless I am misunderstanding something, wouldn't it be better to use ordered tags and avoid cache flushes altogether?

Also, does ZFS disable the disk read cache? It seems that this would be counterproductive with ZFS.

Chris
Re: [zfs-discuss] hard drive write cache
> ZFS enables the write cache and flushes it when committing transaction
> groups; this ensures that all of a transaction group appears or does not
> appear on disk.

It also flushes the disk write cache before returning from every synchronous request (e.g. fsync, O_DSYNC). This is done after writing out the intent log blocks.

Neil