[zfs-discuss] Re: user undo

2006-05-26 Thread Anton B. Rang
Anything that attempts to append characters to the end of the filename
will run into trouble when the filename is already at NAME_MAX.

One simple solution is to restrict the total length of the name to NAME_MAX, 
truncating the original filename as necessary to allow appending.  This does 
introduce the possibility of conflicts with very long names that happen to end 
in numeric strings, but that is likely to be rare and could be resolved in an 
ad hoc fashion (e.g. flipping a bit in the representation of the inode number 
until a unique name is achieved).
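
Below is a minimal sketch, in C, of the scheme described above. The helper
name make_undo_name() is hypothetical, the suffix here is a simple counter
rather than a perturbed inode number, and the existence check is a plain
stat(); it only illustrates the truncate-then-append idea.

#include <stdio.h>
#include <string.h>
#include <limits.h>     /* NAME_MAX, PATH_MAX */
#include <sys/types.h>
#include <sys/stat.h>   /* stat(), for the uniqueness check */

/*
 * Hypothetical helper: build "<name>.<n>", truncating the original name
 * as needed so the result never exceeds NAME_MAX, and bump the counter
 * until no entry with that name exists in 'dir'.
 */
static void
make_undo_name(const char *dir, const char *name, char *out, size_t outlen)
{
    char path[PATH_MAX];
    struct stat st;
    unsigned n = 1;

    for (;;) {
        char suffix[32];
        size_t suffixlen, keep;

        (void) snprintf(suffix, sizeof (suffix), ".%u", n);
        suffixlen = strlen(suffix);

        /* Leave room for the suffix within NAME_MAX. */
        keep = strlen(name);
        if (keep + suffixlen > NAME_MAX)
            keep = NAME_MAX - suffixlen;

        (void) snprintf(out, outlen, "%.*s%s", (int)keep, name, suffix);
        (void) snprintf(path, sizeof (path), "%s/%s", dir, out);

        if (stat(path, &st) != 0)   /* no such entry: name is unique */
            return;
        n++;                        /* conflict: try the next suffix */
    }
}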
 
 


[zfs-discuss] Re: How's zfs RAIDZ fault-tolerant ???

2006-05-26 Thread axa
raidz is like RAID 5, so you can survive the death of one disk, not two.
I would recommend you configure the 12 disks into 2 raidz groups;
then you can survive the death of one drive from each group. This is
what I did on my system.

Hi James, thank you very much. ;-)

I'll configure 2 raidz groups in my pool tomorrow. BTW, I wonder whether 
multiple raidz groups might sacrifice performance?


Thanks.
 
 


Re: [zfs-discuss] How's zfs RAIDZ fault-tolerant ???

2006-05-26 Thread David J. Orman
 RAID-Z is single-fault tolerant.  If you take out two disks, then you
 no longer have the required redundancy to maintain your data.  Build 42
 should contain double-parity RAID-Z, which will allow you to sustain two
 simultaneous disk failures without data loss.

I'm not sure if this has been mentioned elsewhere (I didn't see it), but will 
this double parity be backported into Solaris 10 in time for the U2 
release? This is a sorely needed piece of functionality for my deployment (and 
I'm sure for many others).

Thanks,
David


[zfs-discuss] ZFS mirror and read policy; kstat I/O values for zfs

2006-05-26 Thread Daniel Rock

Hi,

after some testing with ZFS I noticed that read requests are not scheduled 
evenly across the drives; instead, the first one gets predominantly selected:



My pool is set up as follows:

NAME        STATE     READ WRITE CKSUM
tpc         ONLINE       0     0     0
  mirror    ONLINE       0     0     0
    c1t0d0  ONLINE       0     0     0
    c4t0d0  ONLINE       0     0     0
  mirror    ONLINE       0     0     0
    c1t1d0  ONLINE       0     0     0
    c4t1d0  ONLINE       0     0     0
  mirror    ONLINE       0     0     0
    c1t2d0  ONLINE       0     0     0
    c4t2d0  ONLINE       0     0     0
  mirror    ONLINE       0     0     0
    c1t3d0  ONLINE       0     0     0
    c4t3d0  ONLINE       0     0     0
  mirror    ONLINE       0     0     0
    c1t4d0  ONLINE       0     0     0
    c4t4d0  ONLINE       0     0     0
  mirror    ONLINE       0     0     0
    c1t6d0  ONLINE       0     0     0
    c4t6d0  ONLINE       0     0     0
  mirror    ONLINE       0     0     0
    c1t7d0  ONLINE       0     0     0
    c4t7d0  ONLINE       0     0     0


Disk I/O after doing some benchmarking:

              capacity     operations    bandwidth
pool        used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
tpc         7.70G  50.9G     85     21  10.5M  1.08M
  mirror    1.10G  7.28G     11      3  1.47M   159K
    c1t0d0      -      -     10      2  1.34M   159K
    c4t0d0      -      -      1      2   138K   159K
  mirror    1.10G  7.27G     11      3  1.48M   159K
    c1t1d0      -      -     10      2  1.34M   159K
    c4t1d0      -      -      1      2   140K   159K
  mirror    1.09G  7.28G     12      3  1.50M   159K
    c1t2d0      -      -     10      2  1.37M   159K
    c4t2d0      -      -      0      2   128K   159K
  mirror    1.10G  7.28G     12      3  1.53M   158K
    c1t3d0      -      -     11      2  1.42M   158K
    c4t3d0      -      -      0      2   110K   158K
  mirror    1.10G  7.28G     11      3  1.44M   158K
    c1t4d0      -      -     10      2  1.33M   158K
    c4t4d0      -      -      0      2   112K   158K
  mirror    1.10G  7.28G     12      3  1.53M   158K
    c1t6d0      -      -     11      2  1.42M   158K
    c4t6d0      -      -      0      2   106K   158K
  mirror    1.11G  7.26G     12      3  1.55M   158K
    c1t7d0      -      -     11      2  1.42M   158K
    c4t7d0      -      -      1      2   130K   158K
----------  -----  -----  -----  -----  -----  -----


or with iostat
    r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
   11.4    4.3  1451.1   157.1  0.0  0.3    0.4   19.6   0  17 c1t7d0
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c4t5d0
   10.7    4.3  1361.4   158.4  0.0  0.3    0.4   22.1   0  18 c1t0d0
   10.9    4.3  1395.7   157.9  0.0  0.3    0.4   18.6   0  16 c1t2d0
    1.0    4.3   129.0   157.1  0.0  0.0    0.8    8.9   0   2 c4t7d0
    0.9    4.3   112.0   156.9  0.0  0.0    0.9    9.4   0   2 c4t4d0
    1.1    4.4   139.5   158.3  0.0  0.0    0.9    8.8   0   3 c4t1d0
   10.6    4.3  1354.8   157.0  0.0  0.3    0.4   18.8   0  16 c1t4d0
    0.9    4.3   109.2   157.3  0.0  0.1    0.9    9.7   0   3 c4t3d0
   10.7    4.4  1363.4   158.3  0.0  0.3    0.4   21.9   0  18 c1t1d0
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c4t8d0
    1.0    4.3   127.0   157.8  0.0  0.0    0.9    9.0   0   2 c4t2d0
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c1t8d0
   11.4    4.3  1449.9   156.9  0.0  0.3    0.4   20.0   0  17 c1t6d0
    0.8    4.3   105.4   156.8  0.0  0.0    0.9    8.5   0   2 c4t6d0
   11.3    4.3  1447.4   157.4  0.0  0.3    0.4   18.9   0  17 c1t3d0
    1.1    4.4   137.7   158.4  0.0  0.0    0.9    8.8   0   2 c4t0d0



So you can see the second disk of each mirror pair (c4tXd0) gets almost no 
I/O. How does ZFS decide from which mirror device to read?



And one more note:
SVM offers kstat values of type KSTAT_TYPE_IO. Why not ZFS (at least at the 
zpool level)?
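
For reference, a minimal sketch of a libkstat consumer that walks the kstat
chain and dumps every KSTAT_TYPE_IO kstat (the kind SVM publishes for its md
devices); the program name and output format are made up, it just shows what
a zpool-level io kstat would plug into.

#include <stdio.h>
#include <kstat.h>

/*
 * Walk the kstat chain and print the I/O counters of every
 * KSTAT_TYPE_IO kstat (e.g. sd instances, SVM md devices).
 * Compile with: cc -o iokstat iokstat.c -lkstat
 */
int
main(void)
{
    kstat_ctl_t *kc = kstat_open();
    kstat_t *ksp;

    if (kc == NULL) {
        perror("kstat_open");
        return (1);
    }

    for (ksp = kc->kc_chain; ksp != NULL; ksp = ksp->ks_next) {
        kstat_io_t *kio;

        if (ksp->ks_type != KSTAT_TYPE_IO)
            continue;
        if (kstat_read(kc, ksp, NULL) == -1)
            continue;

        kio = (kstat_io_t *)ksp->ks_data;
        (void) printf("%s:%d:%s reads=%u writes=%u nread=%llu nwritten=%llu\n",
            ksp->ks_module, ksp->ks_instance, ksp->ks_name,
            kio->reads, kio->writes,
            (unsigned long long)kio->nread,
            (unsigned long long)kio->nwritten);
    }

    (void) kstat_close(kc);
    return (0);
}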


And BTW (not ZFS related, but SVM):
With the introduction of the SVM bunnahabhain project (friendly names) 
iostat -n output is now completely useless - even if you still use the old 
naming scheme:


% iostat -n
extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    2.3   0   0 c0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    2.4   0   0 c0d1
    0.0    5.0    0.7   21.8  0.0  0.0    0.0    1.5   0   1 c3d0
    0.0    4.1    0.6   20.9  0.0  0.0    0.0    2.8   0   1 c4d0
    1.6   37.3   16.6  164.3  0.1  0.1    2.5    1.6   1   5 c2d0
    1.6   37.5   16.5  164.5  0.1  0.1    3.2    1.7   1   5 c1d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 fd0
    2.9    1.9   19.3    4.8  0.0  0.2    0.3   37.2   0   1 md5
    0.0    0.0    0.0    0.0  

Re: [zfs-discuss] How's zfs RAIDZ fault-tolerant ???

2006-05-26 Thread grant beattie
On Fri, May 26, 2006 at 10:33:34AM -0700, Eric Schrock wrote:

 RAID-Z is single-fault tolerant.  If you take out two disks, then you
 no longer have the required redundancy to maintain your data.  Build 42
 should contain double-parity RAID-Z, which will allow you to sustain two
 simultaneous disk failures without data loss.

Eric,

is raidz double parity optional or mandatory?

grant.



Re: [zfs-discuss] hard drive write cache

2006-05-26 Thread Ed Nadolski

Gregory Shaw wrote:

In recent Linux distributions, when the kernel shuts down, the kernel will 
force the SCSI drives to flush their write cache.  I don't know if Solaris 
does the same, but I think not, given Solaris's ongoing focus on disabling 
the write cache.


The Solaris sd(7D) SCSI disk driver issues a SYNCHRONIZE CACHE command upon the 
last close of the device.
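
To make that concrete, here is a rough user-land sketch that issues the same
SYNCHRONIZE CACHE(10) command (opcode 0x35) through the uscsi(7I) pass-through
ioctl. This is not the sd driver's internal code path; the device path and the
fixed 60-second timeout are just assumptions for illustration.

#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/scsi/impl/uscsi.h>

/*
 * Issue SYNCHRONIZE CACHE(10) to a disk via the uscsi pass-through.
 * Usage (illustrative): ./synccache /dev/rdsk/c1t0d0s2
 */
int
main(int argc, char **argv)
{
    unsigned char cdb[10] = { 0x35, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
    struct uscsi_cmd ucmd;
    int fd;

    if (argc != 2) {
        (void) fprintf(stderr, "usage: %s <raw-device>\n", argv[0]);
        return (1);
    }
    if ((fd = open(argv[1], O_RDWR | O_NDELAY)) == -1) {
        perror("open");
        return (1);
    }

    (void) memset(&ucmd, 0, sizeof (ucmd));
    ucmd.uscsi_cdb = (caddr_t)cdb;
    ucmd.uscsi_cdblen = sizeof (cdb);
    ucmd.uscsi_flags = USCSI_SILENT;    /* no data phase, just the command */
    ucmd.uscsi_timeout = 60;            /* seconds */

    if (ioctl(fd, USCSICMD, &ucmd) == -1) {
        perror("USCSICMD (SYNCHRONIZE CACHE)");
        (void) close(fd);
        return (1);
    }

    (void) close(fd);
    return (0);
}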


Rgds,
Ed

--
Edmund Nadolski
Sun Microsystems Inc.
[EMAIL PROTECTED]


Re: [zfs-discuss] How's zfs RAIDZ fault-tolerant ???

2006-05-26 Thread Nicolas Williams
On Sat, May 27, 2006 at 08:29:05AM +1000, grant beattie wrote:
 is raidz double parity optional or mandatory?

Backwards compatibility dictates that it will be optional.


[zfs-discuss] ata panic

2006-05-26 Thread Rob Logan


`mv`ing files from a zfs dir to another zfs filesystem
in the same pool will panic an 8-disk SATA raidz
(http://supermicro.com/Aplus/motherboard/Opteron/nForce/H8DCE.cfm)
system with:

::status
debugging crash dump vmcore.3 (64-bit) from zfs
operating system: 5.11 opensol-20060523 (i86pc)
panic message:
assertion failed: !(status & 0x80), file: ../../intel/io/dktp/controller/ata/ata_disk.c, line: 2212
dump content: kernel pages only

::stack
vpanic()
assfail+0x83(f3afb508, f3afb4d8, 8a4)
ata_disk_intr_pio_out+0x1dd(8f51b840, 84ff5440, 911a8d50)
ata_ctlr_fsm+0x237(2, 8f51b840, 0, 0, 0)
ata_process_intr+0x3e(8f51b840, fe8b3be4)
ghd_intr+0x72(8f51b958, fe8b3be4)
ata_intr+0x25(8f51b840)
av_dispatch_autovect+0x97(2d)
intr_thread+0x50()

every time...


Re: [zfs-discuss] hard drive write cache

2006-05-26 Thread Bart Smaalders

Gregory Shaw wrote:

I had a question for the group:
In the various ZFS discussions on zfs-discuss, I've seen a recurring theme 
of disabling the write cache on disks. I would think that the performance 
increase from using the write cache would be an advantage, and that the 
write cache should be enabled.
Realistically, I can see only one situation where the write cache would be 
an issue: if there is no way to flush the write cache, it would be possible 
for corruption to occur due to a power loss.


There are two failure modes associated with disk write caches:

1) for performance reasons, the disk write cache doesn't write data
   (to different blocks) back to the platter in the order it was
   received, so transactional ordering isn't maintained and
   corruption can occur.

2) different disks can have different caching policies, so
   transactions touching files on different filesystems may not
   complete correctly during a power failure.

ZFS enables the write cache and flushes it when committing transaction
groups; this ensures that each transaction group either appears on disk
in its entirety or does not appear at all.
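
As a rough sketch of why that flush matters, the pattern looks something like
the function below. This is not the actual ZFS I/O pipeline; the
data-block/commit-record split, the offsets, and the use of
DKIOCFLUSHWRITECACHE as the barrier are illustrative assumptions.

#include <unistd.h>
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/dkio.h>   /* DKIOCFLUSHWRITECACHE */

/*
 * Illustrative commit against a raw device:
 *   1. write the new data blocks,
 *   2. flush the disk write cache,
 *   3. write the small record that makes the new data "live",
 *   4. flush again.
 * The flush in step 2 keeps the commit record from reaching the platter
 * before the data it points to, so a power loss never exposes half a
 * transaction group.
 */
static int
commit_txg(int fd, const void *data, size_t len, off_t data_off,
    const void *commit, size_t clen, off_t commit_off)
{
    if (pwrite(fd, data, len, data_off) != (ssize_t)len)
        return (-1);
    if (ioctl(fd, DKIOCFLUSHWRITECACHE, NULL) == -1)    /* barrier */
        return (-1);
    if (pwrite(fd, commit, clen, commit_off) != (ssize_t)clen)
        return (-1);
    if (ioctl(fd, DKIOCFLUSHWRITECACHE, NULL) == -1)    /* make it durable */
        return (-1);
    return (0);
}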

- Bart




--
Bart Smaalders  Solaris Kernel Performance
[EMAIL PROTECTED]   http://blogs.sun.com/barts


Re: [zfs-discuss] ZFS mirror and read policy; kstat I/O values for zfs

2006-05-26 Thread Matthew Ahrens
On Fri, May 26, 2006 at 09:40:57PM +0200, Daniel Rock wrote:
 So you can see the second disk of each mirror pair (c4tXd0) gets almost no 
 I/O. How does ZFS decide from which mirror device to read?

You are almost certainly running into this known bug:

630 reads from mirror are not spread evenly

--matt


Re: [zfs-discuss] hard drive write cache

2006-05-26 Thread Chris Csanady

On 5/26/06, Bart Smaalders [EMAIL PROTECTED] wrote:


There are two failure modes associated with disk write caches:


Failure modes aside, is there any benefit to a write cache when command
queueing is available?  It seems that the primary advantage is in allowing
old ATA hardware to issue writes in an asynchronous manner.  Beyond
that, it doesn't really make much sense if the queue is deep enough.


ZFS enables the write cache and flushes it when committing transaction
groups; this ensures that each transaction group either appears on disk
in its entirety or does not appear at all.


How often is the write cache flushed, and is it synchronous?  Unless I am
misunderstanding something, wouldn't it be better to use ordered tags and
avoid cache flushes altogether?

Also, does ZFS disable the disk read cache?  It seems that this would be
counterproductive with ZFS.

Chris


Re: [zfs-discuss] hard drive write cache

2006-05-26 Thread Neil Perrin



ZFS enables the write cache and flushes it when committing transaction
groups; this ensures that each transaction group either appears on disk
in its entirety or does not appear at all.


It also flushes the disk write cache before returning from every
synchronous request (e.g. fsync, O_DSYNC). This is done after
writing out the intent log blocks.

Neil