[zfs-discuss] impressive

2007-02-01 Thread Dennis Clarke

boldly plowing forward, I request a few disks/vdevs to be mirrored
all at the same time:

bash-3.2# zpool status zfs0
  pool: zfs0
 state: ONLINE
 scrub: resilver completed with 0 errors on Thu Feb  1 04:17:58 2007
config:

NAME STATE READ WRITE CKSUM
zfs0 ONLINE   0 0 0
  mirror ONLINE   0 0 0
c1t9d0   ONLINE   0 0 0
c0t9d0   ONLINE   0 0 0
  mirror ONLINE   0 0 0
c1t10d0  ONLINE   0 0 0
c0t10d0  ONLINE   0 0 0
  c1t11d0  ONLINE   0 0 0
  c1t12d0  ONLINE   0 0 0
  c1t13d0  ONLINE   0 0 0
  c1t14d0  ONLINE   0 0 0

errors: No known data errors
bash-3.2# zpool attach -f zfs0 c1t11d0 c0t11d0
bash-3.2# zpool attach -f zfs0 c1t12d0 c0t12d0
bash-3.2# zpool attach -f zfs0 c1t13d0 c0t13d0
bash-3.2# zpool attach -f zfs0 c1t14d0 c0t14d0

  needless to say there is some thrashing going on
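
(For anyone watching along, a couple of quick ways to keep an eye on the
resilver I/O; the 5-second interval is just a convenient choice:)

bash-3.2# zpool iostat -v zfs0 5     # per-vdev read/write bandwidth
bash-3.2# iostat -xnz 5              # per-device service times; -z hides idle devices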

bash-3.2# zpool status zfs0
  pool: zfs0
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 0.00% done, 45h14m to go
config:

NAME STATE READ WRITE CKSUM
zfs0 ONLINE   0 0 0
  mirror ONLINE   0 0 0
c1t9d0   ONLINE   0 0 0
c0t9d0   ONLINE   0 0 0
  mirror ONLINE   0 0 0
c1t10d0  ONLINE   0 0 0
c0t10d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c1t11d0  ONLINE   0 0 0
c0t11d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c1t12d0  ONLINE   0 0 0
c0t12d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c1t13d0  ONLINE   0 0 0
c0t13d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c1t14d0  ONLINE   0 0 0
c0t14d0  ONLINE   0 0 0

errors: No known data errors
bash-3.2#

moments later I see :

bash-3.2# zpool status zfs0
  pool: zfs0
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 1.59% done, 2h19m to go
config:

NAME STATE READ WRITE CKSUM
zfs0 ONLINE   0 0 0
  mirror ONLINE   0 0 0
c1t9d0   ONLINE   0 0 0
c0t9d0   ONLINE   0 0 0
  mirror ONLINE   0 0 0
c1t10d0  ONLINE   0 0 0
c0t10d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c1t11d0  ONLINE   0 0 0
c0t11d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c1t12d0  ONLINE   0 0 0
c0t12d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c1t13d0  ONLINE   0 0 0
c0t13d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c1t14d0  ONLINE   0 0 0
c0t14d0  ONLINE   0 0 0

errors: No known data errors
bash-3.2#

bash-3.2# mdb -k
Loading modules: [ unix krtld genunix specfs dtrace ufs scsi_vhci sd ip hook
neti sctp arp usba nca zfs random audiosup sppp crypto ptm md logindmux cpc
wrsmd fcip fctl fcp nfs ]
> ::memstat
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                      79986               624   71%
Anon                        16131               126   14%
Exec and libs                1830                14    2%
Page cache                    533                 4    0%
Free (cachelist)              934                 7    1%
Free (freelist)             13662               106   12%

Total                      113076               883
Physical                   111514               871

bash-3.2#

so in a few hours I will have decent redundancy

all on snv_55b ... looking very very fine

-- 
Dennis

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS inode equivalent

2007-02-01 Thread Darren J Moffat

Neil Perrin wrote:

No it's not the final version or even the latest!
The current on disk format version is 3. However, it hasn't
diverged much and the znode/acl stuff hasn't changed.


and it will get updated as part of zfs-crypto; I just haven't done so 
yet because I'm not finished with the design.
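
For reference, a quick way to see which on-disk versions a given build knows
about (assuming the zpool bits on that build support the subcommand) is:

# zpool upgrade -v    # lists the pool versions this software supports
# zpool upgrade       # shows whether any imported pools are at an older version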


--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS checksums - block or file level

2007-02-01 Thread Nathan Essex
I am trying to understand if zfs checksums apply at a file or a block level.  
We know that zfs provides end to end checksum integrity, and I assumed that 
when I write a file to a zfs filesystem, the checksum was calculated at a file 
level, as opposed to, say, a block level.  However, I have noticed that when I 
create an emulated volume, that volume has a checksum property, set to the same 
default as a normal zfs filesystem.  I can even change the checksum value as 
normal, see below:

# /usr/sbin/zfs create -V 50GB -b 128KB mypool/myvol

# /usr/sbin/zfs set checksum=sha256 mypool/myvol

Now on this emulated volume, I could place any number of structures that are 
not zfs filesystems, say raw database volumes, or ufs, qfs, etc.  Since these 
do not perform end to end checksums, can someone explain to me what the zfs 
checksum would be doing at this point?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: UFS on zvol: volblocksize and maxcontig

2007-02-01 Thread Richard L. Hamilton
I hope there will be consideration given to providing compatibility with UFS 
quotas
(except that inode limits would be ignored).  At least to the point of having

edquota(1m)
quot(1m)
quota(1m)
quotactl(7i)
repquota(1m)
rquotad(1m)

and possibly quotactl(7i) work with zfs (with the exception previously mentioned).
OTOH, quotaon(1m)/quotaoff(1m)/quotacheck(1m) may not be needed for support of
per-user quotas in zfs (since it will presumably have its own ways of enabling
these, and will simply never mess up?)

None of which need preclude new interfaces with greater functionality (like both
user and group quotas), but where there is similar functionality, IMO it would be
easier for a lot of folks if quota maintenance (esp. edquota and reporting) could
be done the same way for ufs and zfs.
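
In the meantime, the closest thing ZFS offers today is per-filesystem quotas,
typically one filesystem per user; a rough sketch (tank/home/alice is just a
made-up dataset name):

# zfs create tank/home/alice
# zfs set quota=10G tank/home/alice    # cap this user's space at 10GB
# zfs get quota tank/home/alice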
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS checksums - block or file level

2007-02-01 Thread Jeremy Teo

On 2/1/07, Nathan Essex [EMAIL PROTECTED] wrote:

I am trying to understand if zfs checksums apply at a file or a block level.  
We know that zfs provides end to end checksum integrity, and I assumed that 
when I write a file to a zfs filesystem, the checksum was calculated at a file 
level, as opposed to say, a block level.


ZFS checksums are done at the block level. End to end checksum
integrity means that when the actual data reaches the application from
the platter, we can guarantee to a very high certainty that the data
is uncorrupted. Either a block level checksum or a file level checksum
will suffice.
--
Regards,
Jeremy
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS checksums - block or file level

2007-02-01 Thread Neil Perrin

ZFS checksums are at the block level.

Nathan Essex wrote On 02/01/07 08:27,:

I am trying to understand if zfs checksums apply at a file or a block level.  
We know that zfs provides end to end checksum integrity, and I assumed that 
when I write a file to a zfs filesystem, the checksum was calculated at a file 
level, as opposed to say, a block level.  However, I have noticed that when I 
create an emulated volume, that volume has a checksum property, set to the same 
default as a normal zfs filesystem.  I can even change the checksum value as 
normal, see below:

# /usr/sbin/zfs create -V 50GB -b 128KB mypool/myvol

# /usr/sbin/zfs set checksum=sha256 mypool/myvol

Now on this emulated volume, I could place any number of structures that are 
not zfs filesystems, say raw database volumes, or ufs, qfs, etc.  Since these 
do not perform end to end checksums, can someone explain to me what the zfs 
checksum would be doing at this point?
 
 
This message posted from opensolaris.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS checksums - block or file level

2007-02-01 Thread Darren J Moffat

Nathan Essex wrote:

Thank you. So that means that even if I use something that writes raw I/O to a 
zfs emulated volume, I still get the checksum protection, and hence data 
corruption protection.


yes it does.

Also consider how BAD performance could be if it were actually 
calculated on a per file basis.


For example, a 1-bit write onto the end of a 5G file would require 
reading and checksumming 5G of data to calculate the new checksum.


Block level is the only sensible way to do this IMO.
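
One way to see the per-block checksums for yourself, if you're curious (a
sketch only; zdb output varies between builds, and /tank/fs/bigfile is a
placeholder path):

# ls -i /tank/fs/bigfile      # the inode number is the ZFS object number
# zdb -dddddd tank/fs <obj#>  # dumps that object's block pointers, each carrying its own checksum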

--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS checksums - block or file level

2007-02-01 Thread Richard Elling

Neil Perrin wrote:

ZFS checksums are at the block level.


This has been causing some confusion lately, so perhaps we could say:
ZFS checksums are at the file system block level, not to be confused with
the disk block level or transport block level.
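
For file data, that block size is governed by the dataset's recordsize
property (128K by default), which can be inspected or tuned per dataset;
tank/fs below is a placeholder:

# zfs get recordsize,checksum tank/fs
# zfs set recordsize=8K tank/fs    # only affects blocks written after the change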
 -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS or UFS - what to do?

2007-02-01 Thread Marion Hakanson
[EMAIL PROTECTED] said:
 That is the part of your setup that puzzled me.  You took the same 7 disk
 raid5 set and split them into 9 LUNS.  The Hitachi likely splits the virtual
 disk into 9 continuous partitions so each LUN maps back to different parts
 of the 7 disks.  I speculate that ZFS thinks it is talking to 9 different
 disks so spreads out the writes accordingly.  What ZFS thinks are sequential
 writes become well-spaced writes across the entire disk & blow your seek
 time through the roof. 

That's what I thought might happen before I even tried this, although it's
also possible the Hitachi stripes each LUN across all 7 disks.  Either
way, one could be getting too many seeks.  Note that I'm just trying to see
if it was so bad that the self-healing capability wasn't worth the cost.
I do realize these are 7200rpm SATA disks, so seeking isn't what they do best.


 I'm interested how it looks like from the Hitachi end.  If you can,
 repeat the test with the Hitachi presenting all 7 disks directly to
 ZFS as LUNs?

The array doesn't give us that capability.


 Interesting... what you are suggesting is that %b is 100% when w/s and r/s is
 0? 

Correct.  Sometimes all iostat -xn columns are 0 except %b; sometimes
the asvc_t column stays at 4.0 for the duration of the quiet period.
I've also observed times where all columns were 0, including %b.  Sure
is puzzling.


[EMAIL PROTECTED] said:
 IIRC, the calculation for %busy is the amount of time that an I/O is on the
 device.  These symptoms would occur if an I/O is dropped somewhere along the
 way or at the array.  Eventually, we'll timeout and retry, though by default
 that should be after 60 seconds.  I think we need to figure out what is going
 on here before accepting the results. It could be that we're overrunning the
 queue on the Hitachi.  By default, ZFS will send 35 concurrent commands per
 vdev and the ssd driver will send up to 256 to a target.  IIRC, Hitachi has a
 formula for calculating ssd_max_throttle to avoid such overruns, but I'm not
 sure if that applies to this specific array. 

Hmm, it's true that I have made no tuning changes on the T2000 side.  It
would make sense if the array just stopped responding.  I'll have to poke
at the array and see if it has any diagnostics logged somewhere.  I recall
that the Hitachi docs do have some recommendations on max-throttle settings,
so I'll go dig those up and see what I can find out.
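
For what it's worth, the throttle is normally set in /etc/system and picked up
at the next reboot; something along these lines, where 32 is purely an
illustrative value (the Hitachi docs should give the real formula):

* /etc/system fragment (value shown is illustrative only)
set ssd:ssd_max_throttle=32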

Thanks for the comments,

Marion


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] hot spares - in standby?

2007-02-01 Thread Al Hopper
On Wed, 31 Jan 2007 [EMAIL PROTECTED] wrote:


 I understand all the math involved with RAID 5/6 and failure rates,
 but its wise to remember that even if the probabilities are small
 they aren't zero. :)

Agreed.  Another thing I've seen, is that if you have an A/C (Air
Conditioning) event in the data center or lab, you will usually see a
cluster of failures over the next 2 to 3 weeks.  Effectively, all your
disk drives have been thermally stressed and are likely to exhibit a spike
in the failure rates in the near term.

Often, in a larger environment, the facilities personnel don't understand
the correlation between an A/C event and disk drive failure rates.  And
major A/C upgrade work is often scheduled over a (long) weekend when most
of the technical talent won't be present.  After the work is completed,
everyone is told that it went very well because the organization does
not do "bad news", and then you lose two drives in a RAID5 array ...

 And after 3-5 years of continuous operation, you better decommission the
 whole thing or you will have many disk failures.

Agreed.  We took an 11 disk FC hardware RAID box offline recently because
all the drives were 5 years old.  It's tough to hit those power off
switches and scrap working disk drives, but much better than the business
disruption and professional embarrassment caused by data loss.  And much
better to be in control of, and experience, *scheduled* downtime.  BTW:
don't forget that if you plan to continue to use the disk enclosure
hardware you need to replace _all_ the fans first.

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
   Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
 OpenSolaris Governing Board (OGB) Member - Feb 2006
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: What SATA controllers are people using for ZFS?

2007-02-01 Thread Al Hopper
On Thu, 1 Feb 2007, Tom Buskey wrote:

 [i]
 I got an Addonics eSata card. Sata 3.0. PCI *or* PCI-X. Works right off the 
 bat w/ 10u3. No firmware update needed. It was $130. But I don't pull out my 
 hair and I can use it if I upgrade my server for pci-x
 [/i]

 And I'm finding the throughput isn't there.   2MB/s in ZFS RAIDZ and worse 
 with UFS.
 *sigh*

I think that there are big issues with the 3124 driver.  I saw unexplained
pauses that lasted from 30 to 80+ seconds during a tar from a single SATA
disk drive that I was migrating data from (using a Syba SD-SATA2-2E2I
card).  I fully expected the kernel to crash while observing this transfer
(it didn't).  It happened periodically - each time a certain amount of
data had been transferred (just by observation - not measurement).  And
this was a UFS filesystem and the drive is a Sun original drive from an
Ultra 20 box.  I need to do some followup experiments as Mike Riley
(Sun) has kindly offered to take my results to the people working on this
driver.

 So, anyone know an inexpensive 4 port SATA card for PCI that'll work
 with 10u3 and I don't need to reflash the BIOS on?  (I bricked a
 Syba...)

Honestly, you're much better off with the $125 8-port SuperMicro board
that I have been unable to break to date. Details: SuperMicro
AOC-SAT2-MV8 8-port - uses the Rev C0 (Hercules-2) chip:

http://www.supermicro.com/products/accessories/addon/AoC-SAT2-MV8.cfm

Kudos to the Sun developers working the Marvell driver!  :)  In the
meantime I hope to find time to test a SAS2041E-R (initially the PCI
Express version of this card).

Keep posting to zfs-discuss!  :)

Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
   Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
 OpenSolaris Governing Board (OGB) Member - Feb 2006
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] FYI: ZFS on USB sticks (from Germany)

2007-02-01 Thread Chad Leigh -- Shire.Net LLC


On Feb 1, 2007, at 10:51 AM, Richard Elling wrote:


FYI,
here is an interesting blog on using ZFS with a dozen USB drives  
from Constantin.

http://blogs.sun.com/solarium/entry/solaris_zfs_auf_12_usb

My German is somewhat rusty, but I see that Google Translate does a
respectable job.  Thanks Constantin!
 -- richard


This is the best line:

 "Hier ist die offizielle Dokumentation, echte Systemhelden jedoch
kommen mit nur zwei man-Pages aus: zpool und zfs."


Roughly: "Here [link] is the official documentation; real system
heroes need only the two man pages: zpool and zfs."


Chad


---
Chad Leigh -- Shire.Net LLC
Your Web App and Email hosting provider
chad at shire.net





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS limits on zpool snapshots

2007-02-01 Thread Bill Moloney
The ZFS On-Disk specification and other ZFS documentation describe the labeling 
scheme used for the vdevs that comprise a ZFS pool.  A label entry contains, 
among other things, an array of uberblocks, one of which will point to the 
active object set of the pool it is a part of at a given instant (according to 
documentation, the active uberblock for a given pool could be located in the 
uberblock array of any vdev participating in the pool at a given instant, and 
is subject to relocation from vdev to vdev as the uberblock for the pool is 
recreated in an update).  Recreation of the active uberblock would occur, for 
example, if we took a snapshot of the pool and changes were then made anywhere 
in the pool.  Since a new uberblock is required in this snapshot scenario, and 
since it appears that the uberblocks are treated as a kind of circular list 
across vdevs, it seems to me that the number of available snapshots we could 
have of a pool at any given instant would be strictly limited to the number of 
available uberblocks in the vdevs of the pool (128 uberblocks per vdev, if I 
have that straight).  Is this truly the case or am I missing something here ?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS limits on zpool snapshots

2007-02-01 Thread Wade . Stuart






[EMAIL PROTECTED] wrote on 02/01/2007 01:17:15 PM:

 The ZFS On-Disk specification and other ZFS documentation describe
 the labeling scheme used for the vdevs that comprise a ZFS pool.  A
 label entry contains, among other things, an array of uberblocks,
 one of which will point to the active object set of the pool it is a
 part of at a given instant (according to documentation, the active
 uberblock for a given pool could be located in the uberblock array
 of any vdev participating in the pool at a given instant, and is
 subject to relocation from vdev to vdev as the uberblock for the
 pool is recreated in an update).  Recreation of the active uberblock
 would occur, for example, if we took a snapshot of the pool and
 changes were then made anywhere in the pool.  Since a new uberblock
 is required in this snapshot scenario, and since it appears that the
 uberblocks are treated as a kind of circular list across vdevs, it
 seems to me that the number of available snapshots we could have of
 a pool at any given instant would be strictly limited to the number
 of available uberblocks in the vdevs of the pool (128 uberblocks per
 vdev, if I have that straight).  Is this truly the case or am I
 missing something here ?


It is my understanding that when a snapshot causes a new uberblock to be
written, the old uberblock is treated as a separate entity and is not
tied to the new uberblock list.

I am sure I will be corrected if I am reading the flow wrong.

Thanks,
-Wade




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS limits on zpool snapshots

2007-02-01 Thread Darren Dunham
 Recreation of the active uberblock would occur, for example, if we
 took a snapshot of the pool and changes were then made anywhere in the
 pool.

The uberblock is updated quite often, not just on snapshots.

 Since a new uberblock is required in this snapshot scenario,
 and since it appears that the uberblocks are treated as a kind of
 circular list across vdevs, it seems to me that the number of
 available snapshots we could have of a pool at any given instant would
 be strictly limited to the number of available uberblocks in the vdevs
 of the pool (128 uberblocks per vdev, if I have that straight).  Is
 this truly the case or am I missing something here ?

Are you talking about normal ZFS filesystem snapshots or something else?
The new uberblock will point to all filesystem snapshots.  The old
copies would never normally be referenced.
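
A quick way to convince yourself that the snapshot count isn't bounded by the
uberblock array is to create a few hundred on a scratch dataset (tank/fs here
is a placeholder):

# i=1; while [ $i -le 200 ]; do zfs snapshot tank/fs@s$i; i=`expr $i + 1`; done
# zfs list -t snapshot | wc -l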

-- 
Darren Dunham   [EMAIL PROTECTED]
Senior Technical Consultant TAOShttp://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
  This line left intentionally blank to confuse you. 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS and thin provisioning

2007-02-01 Thread Andre Lue
I found this article (http://www.cuddletech.com/blog/pivot/entry.php?id=729) 
but I have 2 questions. I am trying the steps on Opensolaris build 54.

Since you create the filesystem with newfs, isn't that really a ufs filesystem 
running on top of zfs? Also, I haven't been able to do anything in the normal 
fashion (i.e. zfs and zpool commands) with the thin-provisioned filesystem; I 
can't even mount it or online it.

Was this just a demo of things to come or is thin provision ready for testing 
in zfs?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: ZFS limits on zpool snapshots

2007-02-01 Thread Andre Lue
As far as I recall, the on-paper number of snapshots you can have in a 
filesystem is 2^48.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and thin provisioning

2007-02-01 Thread Darren Dunham
 I found this article
 (http://www.cuddletech.com/blog/pivot/entry.php?id=729) but I have 2
 questions. I am trying the steps on Opensolaris build 54.

 Since you create the filesystem with newfs, isn't that really a ufs
 filesystem running on top of zfs?

In this case, yes.  I wonder if you could create a second zfs pool on
the volume.  (Starting such pools at boot time might be problematic
though!).  The idea is that you have sparse raw storage available to
you.  The example placed a UFS filesystem on it, but you could do
otherwise.

 Also I haven't been able to do
 anything in the normal fashion (ie zfs and zpool commands) with the
 thin provisioned created filesytem. Can't even mount it or online it.

No.  It's not a filesystem.  It's a zvol or raw volume of blocks.

 Was this just a demo of things to come or is thin provision ready for
 testing in zfs?

It's in there.  Did you create the volumes as it shows (with the -s)?
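
If memory serves, the article does roughly the following (dataset names are
placeholders; -s creates a sparse volume with no reservation):

# zfs create -s -V 1T tank/thinvol
# zfs get volsize,used,referenced tank/thinvol   # volsize far exceeds space actually used
# newfs /dev/zvol/rdsk/tank/thinvol
# mount /dev/zvol/dsk/tank/thinvol /mnt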


-- 
Darren Dunham   [EMAIL PROTECTED]
Senior Technical Consultant TAOShttp://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
  This line left intentionally blank to confuse you. 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and thin provisioning

2007-02-01 Thread Darren Dunham
 In this case, yes.  I wonder if you could create a second zfs pool on
 the volume.  (Starting such pools at boot time might be problematic
 though!).  The idea is that you have sparse raw storage available to
 you.  The example placed a UFS filesystem on it, but you could do
 otherwise.

Followup to myself.

Okay, it's just wrong, but it appears to work  :-)

# zfs list
NAME         USED  AVAIL  REFER  MOUNTPOINT
tank        1.40M  8.24G  25.5K  /tank
tank/test   24.5K  8.24G  24.5K  /tank/test
tank/zvol1  1.28M  8.24G  1.28M  -
vtank         76K  1.91T  24.5K  /vtank

vtank is a zpool on top of tank/zvol1

NAMESTATE READ WRITE CKSUM
vtank   ONLINE   0 0 0
  /dev/zvol/dsk/tank/zvol1  ONLINE   0 0 0

# cp /usr/sbin/xntpdc /vtank
# zfs list vtank
NAME   USED  AVAIL  REFER  MOUNTPOINT
vtank  178K  1.91T   126K  /vtank

I don't think I want to reboot it in this state, though.  :-)

-- 
Darren Dunham   [EMAIL PROTECTED]
Senior Technical Consultant TAOShttp://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
  This line left intentionally blank to confuse you. 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: What SATA controllers are people using for ZFS?

2007-02-01 Thread Joe Little

On 2/1/07, Al Hopper [EMAIL PROTECTED] wrote:

On Thu, 1 Feb 2007, Tom Buskey wrote:

 [i]
 I got an Addonics eSata card. Sata 3.0. PCI *or* PCI-X. Works right off the 
bat w/ 10u3. No firmware update needed. It was $130. But I don't pull out my hair 
and I can use it if I upgrade my server for pci-x
 [/i]

 And I'm finding the throughput isn't there.   2MB/s in ZFS RAIDZ and worse 
with UFS.
 *sigh*

I think that there are big issues with the 3124 driver.  I saw unexplained
pauses that lasted from 30 to 80+ seconds during a tar from a single SATA
disk drive that I was migrating data from (using a Syba SD-SATA2-2E2I
card).  I fully expected the kernel to crash while observing this transfer
(it didn't).  It happened periodically - each time a certain amount of
data had been transferred (just by observation - not measurement).  And
this was a UFS filesystem and the drive is a Sun original drive from an
Ultra 20 box.  I need to do some followup experiments as Mike Riley
(Sun) has kindly offered to take my results to the people working on this
driver.

 So, anyone know an inexpensive 4 port SATA card for PCI that'll work
 with 10u3 and I don't need to reflash the BIOS on?  (I bricked a
 Syba...)

Honestly, you're much better off with the $125 8-port SuperMicro board
that I have been unable to break to date. Details: SuperMicro
AOC-SAT2-MV8 8-port - uses the Rev C0 (Hercules-2) chip:

http://www.supermicro.com/products/accessories/addon/AoC-SAT2-MV8.cfm

Kudos to the Sun developers working the Marvell driver!  :)  In the
meantime I hope to find time to test a SAS2041E-R (initially the PCI
Express version of this card).



We switched away from those same Marvell cards because of unexplained
disconnects/reconnects that ZFS/Solaris would not survive from.
Stability for us came from embracing the Sil3124-2's (Tekram). We had
two marvell based systems, and the most stable are the now
discontinued SATA-I Adaptec 16 port cards, and Sil3124s. I think it's
redundant to say, but the state of SATA support here is still the most
glaring weakness. Isolating this all to a SCSI-to-SATA external
chassis is the surest route to bliss.


Keep posting to zfs-discuss!  :)

Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
   Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
 OpenSolaris Governing Board (OGB) Member - Feb 2006
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS vs NFS vs array caches, revisited

2007-02-01 Thread Marion Hakanson
I had followed with interest the turn off NV cache flushing thread, in
regard to doing ZFS-backed NFS on our low-end Hitachi array:

  http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg05000.html

In short, if you have non-volatile cache, you can configure the array
to ignore the ZFS cache-flush requests.  This is reported to improve the
really terrible performance of ZFS-backed NFS systems.  Feel free to
correct me if I'm misremembering.

Anyway, I've also read that if ZFS notices it's using slices instead of
whole disks, it will not enable/use the write cache.  So I thought I'd be
clever and configure a ZFS pool on our array with a slice of a LUN instead
of the whole LUN, and fool ZFS into not issuing cache-flushes, rather
than having to change config of the array itself.
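
(As an aside, whether the write cache is actually enabled on a given LUN can
be checked from format's expert mode, assuming the array answers the mode-page
query honestly:)

# format -e
  (select the LUN, then)
format> cache
cache> write_cache
write_cache> display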

Unfortunately, it didn't make a bit of difference in my little NFS benchmark,
namely extracting a small 7.6MB tar file (C++ source code, 500 files/dirs).

I used three test zpools and a UFS filesystem (not all were in play at the
same time):
  pool: bulk_sp1
 state: ONLINE
 scrub: none requested
config:

NAME                                             STATE     READ WRITE CKSUM
bulk_sp1                                         ONLINE       0     0     0
  c6t4849544143484920443630303133323230303230d0  ONLINE       0     0     0

errors: No known data errors

  pool: bulk_sp1s
 state: ONLINE
 scrub: none requested
config:

NAME                                               STATE     READ WRITE CKSUM
bulk_sp1s                                          ONLINE       0     0     0
  c6t4849544143484920443630303133323230303230d0s0  ONLINE       0     0     0

errors: No known data errors

  pool: int01
 state: ONLINE
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
int01 ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c0t0d0s5  ONLINE   0 0 0
c0t1d0s5  ONLINE   0 0 0

errors: No known data errors

# prtvtoc -s /dev/rdsk/c6t4849544143484920443630303133323230303230d0
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      4    00          34  4294879232  4294879265
       1      4    00  4294879266       67517  4294946782
       8     11    00  4294946783       16384  4294963166
# 

Both NFS client and server are Sun T2000's, 16GB RAM, switched gigabit
ethernet, Solaris-10U3 patched as of 12-Jan-2007, doing nothing else
at the time of the tests.

The bulk_sp1* pools were both on the same Hitachi 9520V RAID-5 SATA group
that I ran my bonnie++ tests on yesterday.  The int01 pool is mirrored
on two slice-5's of the server T2000's internal 2.5" SAS 73GB drives.

ZFS on whole-disk FC-SATA LUN via NFS:
real 968.13
user 0.33
sys 0.04
  7.9 KB/sec overall

ZFS on partial slice-0 of FC-SATA LUN via NFS:
real 950.77
user 0.33
sys 0.04
  8.0 KB/sec overall

ZFS on slice-5 mirror of internal SAS drives via NFS:
real 17.48
user 0.32
sys 0.03
  438.8 KB/sec overall

UFS on partial slice-0 of FC-SATA LUN via NFS:
real 6.13
user 0.32
sys 0.03
  1251.4 KB/sec overall


I'm not willing to disable the ZIL.  I think I'd settle for the 400KB/sec
range in this test from NFS on ZFS, if I could get that on our FC-SATA
Hitachi array.  As things are now, ZFS just won't work for us, and I'm
not sure how to make it go faster.
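
A simple sanity check would be to run the same extraction locally on the
server and again over NFS; the difference shows how much of the gap is NFS
sync semantics rather than the pool itself (paths below are placeholders):

server# cd /bulk_sp1/test && ptime tar xf /var/tmp/src.tar
client# cd /mnt/bulk_sp1/test && ptime tar xf /var/tmp/src.tar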

Thoughts & suggestions are welcome.

Marion


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS checksums - block or file level

2007-02-01 Thread Victor Latushkin


Richard Elling wrote:

Neil Perrin wrote:

ZFS checksums are at the block level.


This has been causing some confusion lately, so perhaps we could say:
ZFS checksums are at the file system block level, not to be confused with
the disk block level or transport block level.


Saying that ZFS checksums are at the file system block level is also 
confusing, since zvols have checksums too. Maybe it is better to say 
that ZFS checksums are at the zpool block level, because the zpool is the 
place where all blocks, whether from a file system or a zvol, are stored.


Victor

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss