Re: [zfs-discuss] ZFS on Ubuntu

2010-06-28 Thread Joe Little
All true. I just saw too many people needing Ubuntu and ZFS and thought to state the 
obvious, in case the patch set for Nexenta happens to differ enough to provide a 
working set. I've had Nexenta succeed where OpenSolaris quarterly releases failed, 
and vice versa.

On Jun 27, 2010, at 9:54 PM, Erik Trimble erik.trim...@oracle.com wrote:

 On 6/27/2010 9:07 PM, Richard Elling wrote:
 On Jun 27, 2010, at 8:52 PM, Erik Trimble wrote:
 
   
 But that won't solve the OP's problem, which was that OpenSolaris doesn't 
 support his hardware. Nexenta has the same hardware limitations as 
 OpenSolaris.
 
 AFAICT, the OP's problem is with a keyboard.  The vagaries of keyboards
 are well documented, but there is no silver bullet. Indeed, I have one box
 that seems to be more or less happy with PS/2 vs. USB depending on the OS or
 hypervisor. My advice: have one of each handy, just in case.
  -- richard
 
   
 
 Right. I was just pointing out the fallacy of thinking that Nexenta might 
 work on hardware that OpenSolaris doesn't (or has problems with).
 
 
 
 -- 
 Erik Trimble
 Java System Support
 Mailstop:  usca22-123
 Phone:  x17195
 Santa Clara, CA
 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on Ubuntu

2010-06-27 Thread Joe Little
Of course, Nexenta OS is a build of the Ubuntu userland on an OpenSolaris kernel.



On Jun 26, 2010, at 12:27 AM, Freddie Cash fjwc...@gmail.com wrote:

 On Sat, Jun 26, 2010 at 12:20 AM, Ben Miles merloc...@hotmail.com wrote:
 What supporting applications are there on Ubuntu for RAIDZ?
 
 None.  Ubuntu doesn't officially support ZFS.
 
 You can kind of make it work using the ZFS-FUSE project.  But it's neither
 stable nor recommended.
 
 -- 
 Freddie Cash
 fjwc...@gmail.com
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Extremely bad performance - hw failure?

2009-12-27 Thread Joe Little
I've had this happen to me too. I found some dtrace scripts at the
time which showed that the filesystem was spending too much time
finding available 128k blocks, since I was nearly full on each disk
even though, combined, I still had 140GB left of my 3TB pool. The SPA
code, I believe, was spending too much time walking the pool looking
for contiguous space for new writes, and this affected both read and
write performance dramatically (measured in KB/sec).

I was able to alleviate the pressure, so to speak, by adjusting the
recordsize for the pool down to 8k (32k is probably a better choice),
and from there I could start to clear out space. Anything below
10% available space seems to cause ZFS to start behaving poorly, and
the lower you go the worse it gets. But the root cause was
metadata management on pools with less than 5-10% disk space left.
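
For reference, the knobs I touched were roughly these (the pool name is just a
placeholder here, and recordsize only affects files written after the change):

# zfs set recordsize=8K DATA
# zfs get recordsize DATA
# zpool list DATA        (keep an eye on the CAP column; trouble starts above ~90%)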

In my case, I had lots of symlinks, lots of small files, and also
dozens of snapshots. My pool was the equivalent of RAID10 (three
mirror sets striped).


On Sun, Dec 27, 2009 at 4:52 PM, Morten-Christian Bernson m...@uib.no wrote:
 Lately my zfs pool in my home server has degraded to a state where it can be
 said it doesn't work at all.  Read speed is slower than what I can pull from the
 internet over my slow DSL line... This is compared to just a short while ago,
 when I could read from it at over 50MB/sec over the network.

 My setup:
 Running latest Solaris 10: # uname -a
 SunOS solssd01 5.10 Generic_142901-02 i86pc i386 i86pc

 # zpool status DATA
  pool: DATA
  state: ONLINE
 config:
        NAME        STATE     READ WRITE CKSUM
        DATA        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c2t5d0  ONLINE       0     0     0
            c2t4d0  ONLINE       0     0     0
            c2t3d0  ONLINE       0     0     0
            c2t2d0  ONLINE       0     0     0
        spares
          c0t2d0    AVAIL
 errors: No known data errors

 # zfs list -r DATA
 NAME                               USED  AVAIL  REFER  MOUNTPOINT
 DATA                              3,78T   229G  3,78T  /DATA

 All of the drives in this pool are 1.5tb western digital green drives. I am 
 not seeing any error messages in /var/adm/messages, and fmdump -eV shows no 
 errors...   However, I am seeing some soft faults in iostat -eEn:
  ---- errors ---
  s/w h/w trn tot device
  2   0   0   2 c0t0d0
  1   0   0   1 c1t0d0
  2   0   0   2 c2t1d0
 151   0   0 151 c2t2d0
 151   0   0 151 c2t3d0
 153   0   0 153 c2t4d0
 153   0   0 153 c2t5d0
  2   0   0   2 c0t1d0
  3   0   0   3 c0t2d0
  0   0   0   0 solssd01:vold(pid531)
 c0t0d0           Soft Errors: 2 Hard Errors: 0 Transport Errors: 0
 Vendor: Sun      Product: STK RAID INT     Revision: V1.0 Serial No:
 Size: 31.87GB 31866224128 bytes
 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
 Illegal Request: 2 Predictive Failure Analysis: 0
 c1t0d0           Soft Errors: 1 Hard Errors: 0 Transport Errors: 0
 Vendor: _NEC     Product: DVD_RW ND-3500AG Revision: 2.16 Serial No:
 Size: 0.00GB 0 bytes
 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
 Illegal Request: 1 Predictive Failure Analysis: 0
 c2t1d0           Soft Errors: 2 Hard Errors: 0 Transport Errors: 0
 Vendor: ATA      Product: SAMSUNG HD753LJ  Revision: 1113 Serial No:
 Size: 750.16GB 750156373504 bytes
 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
 Illegal Request: 2 Predictive Failure Analysis: 0
 c2t2d0           Soft Errors: 151 Hard Errors: 0 Transport Errors: 0
 Vendor: ATA      Product: WDC WD15EADS-00R Revision: 0A01 Serial No:
 Size: 1500.30GB 1500301909504 bytes
 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
 Illegal Request: 151 Predictive Failure Analysis: 0
 c2t3d0           Soft Errors: 151 Hard Errors: 0 Transport Errors: 0
 Vendor: ATA      Product: WDC WD15EADS-00R Revision: 0A01 Serial No:
 Size: 1500.30GB 1500301909504 bytes
 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
 Illegal Request: 151 Predictive Failure Analysis: 0
 c2t4d0           Soft Errors: 153 Hard Errors: 0 Transport Errors: 0
 Vendor: ATA      Product: WDC WD15EADS-00R Revision: 0A01 Serial No:
 Size: 1500.30GB 1500301909504 bytes
 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
 Illegal Request: 153 Predictive Failure Analysis: 0
 c2t5d0           Soft Errors: 153 Hard Errors: 0 Transport Errors: 0
 Vendor: ATA      Product: WDC WD15EADS-00R Revision: 0A01 Serial No:
 Size: 1500.30GB 1500301909504 bytes
 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
 Illegal Request: 153 Predictive Failure Analysis: 0
 c0t1d0           Soft Errors: 2 Hard Errors: 0 Transport Errors: 0
 Vendor: Sun      Product: STK RAID INT     Revision: V1.0 Serial No:
 Size: 31.87GB 31866224128 bytes
 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
 Illegal Request: 2 Predictive Failure Analysis: 0
 c0t2d0           Soft Errors: 3 Hard Errors: 0 

Re: [zfs-discuss] SATA controller suggestion

2008-06-06 Thread Joe Little
On Thu, Jun 5, 2008 at 9:26 PM, Tim [EMAIL PROTECTED] wrote:


 On Thu, Jun 5, 2008 at 11:12 PM, Joe Little [EMAIL PROTECTED] wrote:

 On Thu, Jun 5, 2008 at 8:16 PM, Tim [EMAIL PROTECTED] wrote:
 
 
  On Thu, Jun 5, 2008 at 9:17 PM, Peeyush Singh [EMAIL PROTECTED]
  wrote:
 
  Hey guys, please excuse me in advance if I say or ask anything stupid
  :)
 
  Anyway, Solaris newbie here.  I've built for myself a new file server
  to
   use at home, in which I'm planning on configuring SXCE-89 and ZFS.  It's a
   Supermicro C2SBX motherboard with a Core2Duo and 4GB DDR3.  I have 6x750GB
   SATA drives in it connected to the onboard ICH9-R controller (with BIOS RAID
   disabled and AHCI enabled).  I also have a 160GB SATA drive connected to
  a PCI
  SIIG SC-SA0012-S1 controller, the drive which will be used as the
  system
  drive.  My plan is to configure a RAID-Z2 pool on the 6x750 drives.
   The
  system drive is just there for Solaris.  I'm also out of ports to use
  on the
  motherboard, hence why I'm using an add-in PCI SATA controller.
 
  My problem is that Solaris is not recognizing the system drive during
  the
  DVD install procedure.  It sees the 6x750GB onboard drives fine.  I
  originally used a RocketRAID 1720 SATA controller, which uses its own
  HighPoint chipset I believe, and it was a no-go.  I went and exchanged
  that
  controller for a SIIG SC-SA0012-S1 controller, which I thought used a
  Silicon Integrated (SII) chipset.  The install DVD isn't recognizing it
   unfortunately, and now I'm not so sure that it uses a SII chipset.  I
  checked
  the HCL, and it only lists a few cards that are reported to work under
  SXCE.
 
  If anyone has any suggestions on either...
  A) Using a different driver during the install procedure, or...
  B) A different, cheap SATA controller
 
  I'd appreciate it very much.  Sorry for the rambling post, but I wanted
  to
  be detailed from the get-go.  Thanks for any input! :)
 
  PS. On a side note, I'm interested in playing around with SXCE
  development.  It looks interesting :)
 
 
  This message posted from opensolaris.org
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 
 
  I'm still a fan of the marvell based supermicro card.  I run two of them
  in
  my fileserver.  AOC-SAT2-MV8
 
  http://www.supermicro.com/products/accessories/addon/AOC-SAT2-MV8.cfm
 

 I gave some treatment to this question a few days ago. Yes, if you want
 PCI-X, go with the Marvell. If you want PCIe SATA, then it's either a
 SIIG-produced SiI3124 card or a lot of guessing. I think the real
 winner is going to be the newer mixed SAS/SATA HBAs from LSI based on
 the 1068 chipset, which Sun has been supporting well in newer
 hardware.


 http://jmlittle.blogspot.com/2008/06/recommended-disk-controllers-for-zfs.html

 PCI or PCI-X.  Yes, you might see *SOME* loss in speed from a PCI
 interface, but let's be honest, there aren't a whole lot of users on this
 list asking this sort of question who have the infrastructure to push more
 than 100MB/sec.  A PCI bus should have no issues pushing that.




 Equally important, don't mix SATA-I and SATA-II on that system
 motherboard, or on one of those add-on cards.

 http://jmlittle.blogspot.com/2008/05/mixing-sata-dos-and-donts.html


 I mix SATA-I and SATA-II and haven't had any issues to date.  Unless you
 have an official bug logged/linked, that's as good as an old wives' tale.

No bug to report, but it was one of the issues involved in losing my log
device a while back. The ZFS engineers appear to be aware of it. Among other
things, it's why there is a known workaround to disable command
queueing (NCQ) on the Marvell card when SATA-I drives are attached to
it.
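
For the archives, the workaround I have in mind is the SATA framework tunable
that gets passed around for the Marvell card; I'm quoting it from memory, so
treat the exact name and value as an assumption and verify before relying on it:

set sata:sata_max_queue_depth = 0x1     (in /etc/system, then reboot)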




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SATA controller suggestion

2008-06-05 Thread Joe Little
On Thu, Jun 5, 2008 at 8:16 PM, Tim [EMAIL PROTECTED] wrote:


 On Thu, Jun 5, 2008 at 9:17 PM, Peeyush Singh [EMAIL PROTECTED]
 wrote:

 Hey guys, please excuse me in advance if I say or ask anything stupid :)

 Anyway, Solaris newbie here.  I've built for myself a new file server to
 use at home, in which I'm planning on configuring SXCE-89 and ZFS.  It's a
 Supermicro C2SBX motherboard with a Core2Duo and 4GB DDR3.  I have 6x750GB
 SATA drives in it connected to the onboard ICH9-R controller (with BIOS RAID
 disabled and AHCI enabled).  I also have a 160GB SATA drive connected to a PCI
 SIIG SC-SA0012-S1 controller, the drive which will be used as the system
 drive.  My plan is to configure a RAID-Z2 pool on the 6x750 drives.  The
 system drive is just there for Solaris.  I'm also out of ports to use on the
 motherboard, hence why I'm using an add-in PCI SATA controller.

 My problem is that Solaris is not recognizing the system drive during the
 DVD install procedure.  It sees the 6x750GB onboard drives fine.  I
 originally used a RocketRAID 1720 SATA controller, which uses its own
 HighPoint chipset I believe, and it was a no-go.  I went and exchanged that
 controller for a SIIG SC-SA0012-S1 controller, which I thought used a
 Silicon Integrated (SII) chipset.  The install DVD isn't recognizing it
 unfortunately, and now I'm not so sure that it uses a SII chipset.  I checked
 the HCL, and it only lists a few cards that are reported to work under SXCE.

 If anyone has any suggestions on either...
 A) Using a different driver during the install procedure, or...
 B) A different, cheap SATA controller

 I'd appreciate it very much.  Sorry for the rambling post, but I wanted to
 be detailed from the get-go.  Thanks for any input! :)

 PS. On a side note, I'm interested in playing around with SXCE
 development.  It looks interesting :)


 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


 I'm still a fan of the marvell based supermicro card.  I run two of them in
 my fileserver.  AOC-SAT2-MV8

 http://www.supermicro.com/products/accessories/addon/AOC-SAT2-MV8.cfm


I gave some treatment to this question a few days ago. Yes, if you want
PCI-X, go with the Marvell. If you want PCIe SATA, then it's either a
SIIG-produced SiI3124 card or a lot of guessing. I think the real
winner is going to be the newer mixed SAS/SATA HBAs from LSI based on
the 1068 chipset, which Sun has been supporting well in newer
hardware.

http://jmlittle.blogspot.com/2008/06/recommended-disk-controllers-for-zfs.html

Equally important, don't mix SATA-I and SATA-II on that system
motherboard, or on one of those add-on cards.

http://jmlittle.blogspot.com/2008/05/mixing-sata-dos-and-donts.html



 It's the same chipset that's in the Thumper, and it's pretty cheap for an
 8-port card.

 --Tim

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cannot delete file when fs 100% full

2008-05-30 Thread Joe Little
On Fri, May 30, 2008 at 7:43 AM, Paul Raines [EMAIL PROTECTED] wrote:

 It seems that when a zfs filesystem with a reservation/quota is 100% full, users
 can no longer even delete files to fix the situation, getting errors like these:

 $ rm rh.pm6895.medial.V2.tif
 rm: cannot remove `rh.pm6895.medial.V2.tif': Disk quota exceeded

 (this is over NFS from a RHEL4 Linux box)

 I can log in as root on the Sun server and delete the file as root.
 After doing that, the user can then delete files okay.

 Is there any way to work around this that does not involve root intervention?
 Users are filling up their volumes all the time, which is the
 reason they must have reservations/quotas set.

Well, with a copy-on-write filesystem a delete actually requires a
write. That said, there have been certain religious arguments on the
list about whether the quota support presented by ZFS is sufficient.
In a nutshell, per-user quotas are not implemented, and the suggested
workaround is a per-user filesystem with a quota/reservation. It's
inelegant at best, since the auto-mount definitions become their own
pain to maintain.
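
For anyone finding this in the archives, the per-user-filesystem workaround
looks roughly like this (names and sizes are placeholders):

# zfs create tank/home/alice
# zfs set quota=10G tank/home/alice
# zfs set reservation=1G tank/home/alice

Each such filesystem then needs its own share and automount entry on the
clients, which is exactly the maintenance pain mentioned above.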

The other unimplemented feature is soft and hard quota limits.
Most people have gotten around this by presenting end users with UFS
filesystems built on ZFS zvols, but that defeats the purpose of
providing snapshots directly to end users, etc.
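
Roughly, that zvol trick looks like this (sizes and names are made up):

# zfs create -V 20g tank/vol_alice
# newfs /dev/zvol/rdsk/tank/vol_alice
# mount /dev/zvol/dsk/tank/vol_alice /export/alice

UFS then gives you the usual soft/hard quota machinery (edquota/quotaon), but,
as noted, the end user no longer sees ZFS snapshots directly.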

However, since snapshots are only available at the filesystem level,
you are still restricted to one filesystem per user to use snapshots
well. But I would argue that hard/soft quota limits are the
unanswered problem with no known workaround.




 --
 ---
 Paul Raines          email: raines at nmr.mgh.harvard.edu
 MGH/MIT/HMS Athinoula A. Martinos Center for Biomedical Imaging
 149 (2301) 13th Street  Charlestown, MA 02129  USA


 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] slog failure ... *ANY* way to recover?

2008-05-30 Thread Joe Little
On Fri, May 30, 2008 at 6:30 AM, Jeb Campbell [EMAIL PROTECTED] wrote:
 Ok, here is where I'm at:

 My install of OS 2008.05 (snv_86?) will not even come up in single user.

 The OS 2008.05 live cd comes up fine, but I can't import my old pool b/c of 
 the missing log (and I have to import to fix the log ...).

 So I think I'll boot from the live cd, import my rootpool, mount it, and copy 
 /rootpool/etc/zfs.cache to zfs.cache.save. Then I'll stick the zfs.cache from 
 the live cd onto the rootpool, update boot bits, and cross my fingers.

 The goal of this is to get my installed OS to finish a boot, then I can try 
 using the saved zfs.cache to load the degraded pool w/o import. As long as I 
 can get to it read-only, I'll copy what I can off.

 Any tips, comments, or suggestions would be welcome,


It seems we are in our own little echo chamber here. Well, these are
the bugs/resolutions that need to be addressed:

1) An L2ARC or log device needs to be evacuation-possible (i.e. removable).
2) Any failure of an L2ARC or log device should never prevent
importation of a pool. It is an additional device for cache/log
purposes, and failures of these devices should be handled correctly,
but not at the scope of failing the volume/losing already stored data.
Yes, this means that data in the intent log may be lost, but I'd
rather lose that 0.01% than the whole volume.
3) The failure-to-iterate-filesystems bug is also quite annoying. If
any sub-filesystem in a zfs pool is not iterable (data/home in my example),
the importation of the pool should note the error but proceed. Mounts
of data/proj and of data/* except for data/home should continue. Again,
always attempt to do as much as possible to provide access to the data
that's still available. Giving up on all mounts because of a fault on
one is not reasonable behavior.

There are more things here, and perhaps one can argue with the above.
However, my stance is to be overly conservative about _DATA ACCESS_ --
faulting a pool until you can contact support and get a hack to
recover otherwise available data is just not good form :) You'll lose
customers left and right because of the cost of the downtime.


 Jeb


 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [osol-help] 1TB ZFS thin provisioned partition prevents Opensolaris from booting.

2008-05-30 Thread Joe Little
On Fri, May 30, 2008 at 7:07 AM, Hugh Saunders [EMAIL PROTECTED] wrote:
 On Fri, May 30, 2008 at 10:37 AM, Akhilesh Mritunjai
 [EMAIL PROTECTED] wrote:
 I think it's right. You'd have to move to a 64 bit kernel. Any reasons to 
 stick to a 32 bit
 kernel ?

 My reason would be lack of 64-bit hardware :(
 Is this an iSCSI-specific limitation, or will any multi-TB pool have
 problems on 32-bit hardware?
 If so, what's the upper bound on pool size on 32-bit?


I've noticed it's only a problem with per-LUN sizes on 32-bit Solaris
clients trying to import them into ZFS. You can build a ZFS pool of
any* size as long as the underlying LUNs are each less than about 1.4TB,
I believe. Or so I've seen by experimentation. The * is because the total
pool size and the per-LUN size were simply what happened to work
for me; but in the end I did go with 64-bit processors, as the memory
crunch that ZFS has makes 32-bit unusable for any heavy use beyond 0.5TB
of disk, again by observation.


 --
 Hugh Saunders
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] slog failure ... *ANY* way to recover?

2008-05-29 Thread Joe Little
On Thu, May 29, 2008 at 7:25 PM, Jeb Campbell [EMAIL PROTECTED] wrote:
 Meant to add that zpool import -f pool doesn't work b/c of the missing log 
 vdev.

 All the other disks are there and show up with zpool import, but it won't 
 import.

 Is there anyway a util could clear the log device vdev from the remaining 
 raidz2 devices?

 Then I could import just a standard raidz2 pool.

 I really love zfs (and had recently upgraded to 6 disks in raidz2), but this 
 is *really* gonna hurt to lose all this stuff (yeah, the work stuff is backed 
 up, but I have/had tons of personal stuff on there).

 I definitely would prefer to just sit tight, and see if there is any way to 
 get this going (read only would be fine).


You can mount all those filesystems, and then zfs send/recv them off
to another box. It sucks, but as of now there is no re-importing of
the pool UNTIL the log can be removed. Sadly, I think that log removal
will at least require importing the pool in question first, and for
some reason you already can't import your pool.

In my case, I was running B70 and could still import the pool, just
degraded. I think that once you are at a higher rev (exactly which I do
not know, but it includes B82 and B85), you won't be able to import
it anymore when the log fails.
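
A minimal sketch of that evacuation, with hypothetical host and dataset names
(repeat per filesystem you can still get mounted, read-only if need be):

# zfs snapshot data/proj@evac
# zfs send data/proj@evac | ssh otherbox zfs recv -d rescue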


 Jeb


 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] slog failure ... *ANY* way to recover?

2008-05-29 Thread Joe Little
On Thu, May 29, 2008 at 8:59 PM, Joe Little [EMAIL PROTECTED] wrote:
 On Thu, May 29, 2008 at 7:25 PM, Jeb Campbell [EMAIL PROTECTED] wrote:
 Meant to add that zpool import -f pool doesn't work b/c of the missing log 
 vdev.

 All the other disks are there and show up with zpool import, but it won't 
 import.

 Is there anyway a util could clear the log device vdev from the remaining 
 raidz2 devices?

 Then I could import just a standard raidz2 pool.

 I really love zfs (and had recently upgraded to 6 disks in raidz2), but this 
 is *really* gonna hurt to lose all this stuff (yeah, the work stuff is 
 backed up, but I have/had tons of personal stuff on there).

 I definitely would prefer to just sit tight, and see if there is any way to 
 get this going (read only would be fine).


More to the point, does it say there are any permanent errors? Again,
I was able to import it after reassigning the log device so it thinks
it's there. I got to this point:

[EMAIL PROTECTED]:~# zpool status -v
  pool: data
 state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE       0     0    24
          raidz1    ONLINE       0     0    24
            c2t0d0  ONLINE       0     0     0
            c2t1d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
        logs        ONLINE       0     0    24
          c3t1d0    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        data/home:<0x0>

Yes, because of the error I can no longer have any mounts created at
import, but a manual zfs mount of data/proj or any other filesystem (though
not data/home) is still possible. Again, I think you will want to use
-o ro as an option to that mount command so the system doesn't go
bonkers. Check my blog for more info on resetting the log device via a
zfs replace action -- which itself puts you in the more troubling
position of possibly having corruption from the resilver, but at
least for me it allowed me to import the pool and do read-only mounts of the
remaining filesystems.
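
Concretely, the read-only mounts I mean look like this (data/proj is from my
pool above; data/scratch is a stand-in for any other surviving filesystem):

# zfs mount -o ro data/proj
# zfs mount -o ro data/scratch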


 You can mount all those filesystems, and then zfs send/recv them off
 to another box. Its sucks, but as of now, there is no re-importing of
 the pool UNTIL the log can be removed. Sadly, I think that log removal
 will at least require importation of the pool in question first. For
 some reason you already can't import your pool.

 In my case, I was running B70 and could import the pool still, but
 just degraded. I think that once you are at a higher rev (which I do
 not know, but inclusive of B82 and B85), you won't be able to import
 it anymore when it fails.


 Jeb


 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] slog devices don't resilver correctly

2008-05-27 Thread Joe Little
This past weekend, my holiday was ruined by a log device
replacement gone awry.

I posted all about it here:

http://jmlittle.blogspot.com/2008/05/problem-with-slogs-how-i-lost.html

In a nutshell, a resilver of a single log device with itself (forced by
the fact that one can't remove a log device from a pool once defined) caused
ZFS to fully resilver but then attach the log device as a stripe of
the volume, and no longer as a log device. The subsequent pool failure
was exceptionally bad, as the volume could no longer be imported, and
recovery required read-only mounting of whatever remaining filesystems I
could in order to salvage data. It would appear that log resilvers are broken,
at least up to B85. I haven't seen code changes in this space, so I
presume this is likely a still unaddressed problem.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] indiana as nfs server: crash due to zfs

2008-05-27 Thread Joe Little
On Mon, May 26, 2008 at 6:10 AM, Gerard Henry [EMAIL PROTECTED] wrote:
 hello all,
 I have Indiana freshly installed on a Sun Ultra 20 machine. It only acts as an
 NFS server. During one night, the kernel crashed, and I got these messages:
 
 May 22 02:18:57 ultra20 unix: [ID 836849 kern.notice]
 May 22 02:18:57 ultra20 ^Mpanic[cpu0]/thread=ff0003d06c80:
 May 22 02:18:57 ultra20 genunix: [ID 603766 kern.notice] assertion failed: 
 sm->sm_space == 0 (0x4000 == 0x0), file: ../../common/fs/zfs/space_map.c, 
 line: 315
 May 22 02:18:57 ultra20 unix: [ID 10 kern.notice]
 May 22 02:18:57 ultra20 genunix: [ID 655072 kern.notice] ff0003d06830 
 genunix:assfail3+b9 ()
 May 22 02:18:57 ultra20 genunix: [ID 655072 kern.notice] ff0003d068e0 
 zfs:space_map_load+2c2 ()
 May 22 02:18:57 ultra20 genunix: [ID 655072 kern.notice] ff0003d06920 
 zfs:metaslab_activate+66 ()
 May 22 02:18:57 ultra20 genunix: [ID 655072 kern.notice] ff0003d069e0 
 zfs:metaslab_group_alloc+24e ()
 May 22 02:18:57 ultra20 genunix: [ID 655072 kern.notice] ff0003d06ab0 
 zfs:metaslab_alloc_dva+1da ()
 May 22 02:18:57 ultra20 genunix: [ID 655072 kern.notice] ff0003d06b50 
 zfs:metaslab_alloc+82 ()
 May 22 02:18:57 ultra20 genunix: [ID 655072 kern.notice] ff0003d06ba0 
 zfs:zio_dva_allocate+62 ()
 

 Searching the net, it seems that this kind of error is common. Does it mean
 that I can't use Indiana as a robust NFS server?
 What can I do if I want to investigate?

I've seen many people trying to use (in most cases successfully) Indiana
or some OpenSolaris build for quasi-production NFS or similar
service. I think if you want robust, go with something that is
targeted at robustness for your case, such as NexentaStor (paid or
free editions). I come off as a shill for a solution that I use, but
it amazes me that people ask for a robust, stable-tracking solution
but always track the bleeding edge instead. Nothing wrong with that,
and I do the same, but I know that's what it's for :)




 thanks in advance,

 gerard


 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] slog devices don't resilver correctly

2008-05-27 Thread Joe Little
On Tue, May 27, 2008 at 1:50 PM, Eric Schrock [EMAIL PROTECTED] wrote:
 Yeah, I noticed this the other day while I was working on an unrelated
 problem.  The basic problem is that log devices are kept within the
 normal vdev tree, and are only distinguished by a bit indicating that
 they are log devices (and is the source for a number of other
 inconsistencies that Pwel has encountered).

 When doing a replacement, the userland code is responsible for creating
 the vdev configuration to use for the newly attached vdev.  In this
 case, it doesn't preserve the 'is_log' bit correctly.  This should be
 enforced in the kernel - it doesn't make sense to replace a log device
 with a non-log device, ever.

 I have a workspace with some other random ZFS changes, so I'll try to
 include this as well.

 FWIW, removing log devices is significantly easier than removing
 arbitrary devices, since there is no data to migrate (after the current
 txg is synced).  At one point there were plans to do this as a separate
 piece of work (since the vdev changes are needed for the general case
 anyway), but I don't know whether this is still the case.


Thanks for the reply. As noted, I do recommend against a log device,
since you can't remove it and, as you see, replacement is touchy at
best. I know the larger, more general vdev evacuation work is ongoing, but
if log removal is simple on its own, doing it now would make slogs useful
instead of waiting.



 - Eric

 On Tue, May 27, 2008 at 01:13:47PM -0700, Joe Little wrote:
 This past weekend, but holiday was ruined due to a log device
 replacement gone awry.

 I posted all about it here:

 http://jmlittle.blogspot.com/2008/05/problem-with-slogs-how-i-lost.html

 In a nutshell, an resilver of a single log device with itself, due to
 the fact one can't remove a log device from a pool once defined, cause
 ZFS to fully resilver but then attach the log device as as stripe to
 the volume, and no longer as a log device. The subsequent pool failure
 was exceptionally bad as the volume could no longer be imported and
 required read-only mounting of the remaining filesystems that I could
 to recover data. It would appear that log resilvers are broken, at
 least up to B85. I haven't seen code changes in this space so I
 presume this is likely an unaddressed problem.
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

 --
 Eric Schrock, Fishworks            http://blogs.sun.com/eschrock

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] slog devices don't resilver correctly

2008-05-27 Thread Joe Little
On Tue, May 27, 2008 at 4:50 PM, Eric Schrock [EMAIL PROTECTED] wrote:
 Joe -

 We definitely don't do great accounting of the 'vdev_islog' state here,
 and it's possible to create a situation where the parent replacing vdev
 has the state set but the children do not, but I have been unable to
 reproduce the behavior you saw.  I have rebooted the system during
 resilver, manually detached the replacing vdev, and a variety of other
 things, but I've never seen the behavior you describe.  In all cases,
 the log state is kept with the replacing vdev and restored when the
 resilver completes.  I have also not observed the resilver failing with
 a bad log device.

 Can you provide more information about how to reproduce this problem?
 Perhaps without rebooting into B70 in the middle?


Well, this happened live on a production system, and I'm still in the
process of rebuilding said system (trying to save all the snapshots).

I don't know what triggered it. It was trying to resilver in B85; I
rebooted into B70, where it did resilver (but it was now using cmdk
device naming vs. the full SCSI device names). It was still marked degraded
even though resilvering finished. Since the resilver took so
long, I suspect the splicing-in of the device took place in B70.
Again, it would never work in B85 -- it just kept resetting. I'm
wondering if the device path changing from cxtxdx to cxdx could be the
trigger point.


 Thanks,

 - Eric

 On Tue, May 27, 2008 at 01:50:04PM -0700, Eric Schrock wrote:
 Yeah, I noticed this the other day while I was working on an unrelated
 problem.  The basic problem is that log devices are kept within the
 normal vdev tree, and are only distinguished by a bit indicating that
 they are log devices (and is the source for a number of other
 inconsistencies that Pwel has encountered).

 When doing a replacement, the userland code is responsible for creating
 the vdev configuration to use for the newly attached vdev.  In this
 case, it doesn't preserve the 'is_log' bit correctly.  This should be
 enforced in the kernel - it doesn't make sense to replace a log device
 with a non-log device, ever.

 I have a workspace with some other random ZFS changes, so I'll try to
 include this as well.

 FWIW, removing log devices is significantly easier than removing
 arbitrary devices, since there is no data to migrate (after the current
 txg is synced).  At one point there were plans to do this as a separate
 piece of work (since the vdev changes are needed for the general case
 anyway), but I don't know whether this is still the case.

 - Eric

 On Tue, May 27, 2008 at 01:13:47PM -0700, Joe Little wrote:
  This past weekend, but holiday was ruined due to a log device
  replacement gone awry.
 
  I posted all about it here:
 
  http://jmlittle.blogspot.com/2008/05/problem-with-slogs-how-i-lost.html
 
  In a nutshell, an resilver of a single log device with itself, due to
  the fact one can't remove a log device from a pool once defined, cause
  ZFS to fully resilver but then attach the log device as as stripe to
  the volume, and no longer as a log device. The subsequent pool failure
  was exceptionally bad as the volume could no longer be imported and
  required read-only mounting of the remaining filesystems that I could
  to recover data. It would appear that log resilvers are broken, at
  least up to B85. I haven't seen code changes in this space so I
  presume this is likely an unaddressed problem.
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

 --
 Eric Schrock, Fishworks            http://blogs.sun.com/eschrock
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

 --
 Eric Schrock, Fishworks            http://blogs.sun.com/eschrock

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs mount i/o error and workarounds

2008-04-17 Thread Joe Little
Hello list,

We discovered a failed disk with checksum errors. We took out the disk
and resilvered, which reported many errors. A few of the sub-filesystems
of the pool won't mount anymore, with zpool import poolname reporting:
cannot mount 'poolname/proj': I/O error

OK, we have a problem. I can successfully clone any snapshot of 'proj'
and get it mounted, and it looks like all the snapshots are intact. This is
all just backups, so I want to find a way to mount this filesystem while
keeping its snapshots with it. Any recipe for how one does this? Do I
need to zfs send/recv to myself under another name, delete the old, and
rename? Is there any other way? These are large filesystems, and at
least one is larger than my available space.
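
For concreteness, the send/recv-to-myself recipe I'm contemplating is roughly
(names are placeholders):

# zfs snapshot poolname/proj@rescue
# zfs send poolname/proj@rescue | zfs recv poolname/proj_new
# zfs destroy -r poolname/proj
# zfs rename poolname/proj_new poolname/proj

But a plain send of a single snapshot drops the older snapshots, and the space
problem above still applies, so I'm hoping there is a better way.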
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How many ZFS pools is it sensible to use on a single server?

2008-04-12 Thread Joe Little
On Tue, Apr 8, 2008 at 9:55 AM,  [EMAIL PROTECTED] wrote:
 [EMAIL PROTECTED] wrote on 04/08/2008 11:22:53 AM:


In our environment, the politically and administratively simplest
   approach to managing our storage is to give each separate group at
   least one ZFS pool of their own (into which they will put their various
   filesystems). This could lead to a proliferation of ZFS pools on our
   fileservers (my current guess is at least 50 pools and perhaps up to
   several hundred), which leaves us wondering how well ZFS handles this
   many pools.
  
So: is ZFS happy with, say, 200 pools on a single server? Are there any
   issues (slow startup, say, or peculiar IO performance) that we'll run
   into? Has anyone done this in production? If there are issues, is there
   any sense of what the recommended largest number of pools per server is?
  

  Chris,

   Well,  I have done testing with filesystems and not as much with
  pools -- I believe the core design premise for zfs is that administrators
  would use few pools and many filesystems.  I would think that Sun would
  recommend that you make a large pool (or a few) and divvy out filesystem
  with reservations to the groups (to which they can add sub filesystems).
  As far as ZFS filesystems are concerned my testing has shown that the mount
  time and io overhead for multiple filesystems seems to be pretty linear --
  timing 10 mounts translates pretty well to 100 and 1000.  After you hit
  some level (depending on processor and memory) the mount time, io and
  write/read batching spikes up pretty heavily.  This is one of the reasons I
  take a strong stance against the recommendation that people use
  reservations and filesystems as user/group quotas (ignoring that the
  functionality is not by any means in parity.)


Not to beat a dead horse too much, but the lack of per-user quotas, plus the
mount limits (whether on the clients or in the per-filesystem mount times
mentioned above), means we can heavily utilize ZFS only for second-tier
storage, where quotas can be set at a logical group level, and not for
first-tier use, which still demands per-user quotas. It's an unmet requirement.

As to your original question, with enough LUN carving you can
artificially create many pools. However, ease of management and a
focus on both performance and reliability suggest putting as many
drives as possible into a redundant config in as few pools as possible,
splitting the space among top-level ZFS filesystems, one per group, and
then letting each group divvy things up further with nested ZFS
filesystems.
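
In other words, something like this per group (names and sizes invented):

# zfs create tank/groupA
# zfs set reservation=2T tank/groupA
# zfs set quota=4T tank/groupA
# zfs create tank/groupA/projects      (the group carves these up as it sees fit)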

  -Wade





  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] odd slog behavior on B70

2007-11-26 Thread Joe Little
I was playing with a Gigabyte i-RAM card and found that it works great
for improving overall performance when there are a lot of writes of small
files over NFS to a ZFS pool.

However, I noticed a recurring situation during periods of long writes of
small files over NFS. Here's a snippet of iostat from such a period;
sd15/sd16 are two iSCSI targets, and sd17 is the iRAM card (2GB):

                extended device statistics
device       r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
sd15         0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
sd16         0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
sd17         0.0    0.0    0.0    0.0  3.0  1.0    0.0 100 100
                extended device statistics
device       r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
sd15         0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
sd16         0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
sd17         0.0    0.0    0.0    0.0  3.0  1.0    0.0 100 100
                extended device statistics
device       r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
sd15         0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
sd16         0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
sd17         0.0    0.0    0.0    0.0  3.0  1.0    0.0 100 100
                extended device statistics
device       r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
sd15         0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
sd16         0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
sd17         0.0    0.0    0.0    0.0  3.0  1.0    0.0 100 100
                extended device statistics
device       r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
sd15         0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
sd16         0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
sd17         0.0    0.0    0.0    0.0  3.0  1.0    0.0 100 100
                extended device statistics
device       r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
sd15         0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
sd16         0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
sd17         0.0    0.0    0.0    0.0  3.0  1.0    0.0 100 100
                extended device statistics
device       r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
sd15         0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
sd16         0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
sd17         0.0    0.0    0.0    0.0  3.0  1.0    0.0 100 100
                extended device statistics
device       r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
sd15         0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
sd16         0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
sd17         0.0    0.0    0.0    0.0  3.0  1.0    0.0 100 100
                extended device statistics
device       r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
sd15         0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
sd16         0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
sd17         0.0    0.0    0.0    0.0  3.0  1.0    0.0 100 100

During this time no operations can complete. I've attached the iRAM disk
via a 3124 card. I've never seen a svc_t of 0 together with a fully waiting
and busy disk. Any clue what this might mean?
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] odd slog behavior on B70

2007-11-26 Thread Joe Little
On Nov 26, 2007 7:00 PM, Richard Elling [EMAIL PROTECTED] wrote:
 I would expect such iostat output from a device which can handle
 only a single queued I/O to the device (eg. IDE driver) and an I/O
 is stuck.  There are 3 more I/Os in the wait queue waiting for the
 active I/O to complete.  The %w and %b are measured as the percent
 of time during which an I/O was in queue.  The svc_t is 0 because
 the I/O is not finished.

 By default, most of the drivers will retry I/Os which don't seem to
 finish, but the retry interval is often on the order of 60 seconds.
 If a retry succeeds, then no message is logged to syslog, so you
 might not see any messages.  But just to be sure, what does
 fmdump (and fmdump -e) say about the system?  Are messages
 logged in /var/adm/messages?

Nothing in fmdump or /var/adm/messages. Your answer explains why it's
60 seconds or so. What's sad is that this is a ramdisk, so to speak,
albeit connected via SATA-I to the SiI3124. Any way to isolate this
further? Any way to limit I/O timeouts for a drive? This is just two
sticks of RAM... milliseconds would be fine :)

  -- richard


 Joe Little wrote:
  I was playing with a Gigabyte i-RAM card and found out it works great
  to improve overall performance when there are a lot of writes of small
  files over NFS to such a ZFS pool.
 
  However, I noted a frequent situation in periods of long writes over
  NFS of small files. Here's a snippet of iostat during that period.
  sd15/sd16 are two iscsi targets, and sd17 is the iRAM card (2GB)
 
   extended device statistics
  device       r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
  sd15         0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
  sd16         0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
  sd17         0.0    0.0    0.0    0.0  3.0  1.0    0.0 100 100
   extended device statistics
  device       r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
  sd15         0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
  sd16         0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
  sd17         0.0    0.0    0.0    0.0  3.0  1.0    0.0 100 100
   extended device statistics
  device       r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
  sd15         0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
  sd16         0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
  sd17         0.0    0.0    0.0    0.0  3.0  1.0    0.0 100 100
   extended device statistics
  device       r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
  sd15         0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
  sd16         0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
  sd17         0.0    0.0    0.0    0.0  3.0  1.0    0.0 100 100
   extended device statistics
  device       r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
  sd15         0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
  sd16         0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
  sd17         0.0    0.0    0.0    0.0  3.0  1.0    0.0 100 100
   extended device statistics
  device       r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
  sd15         0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
  sd16         0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
  sd17         0.0    0.0    0.0    0.0  3.0  1.0    0.0 100 100
   extended device statistics
  device       r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
  sd15         0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
  sd16         0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
  sd17         0.0    0.0    0.0    0.0  3.0  1.0    0.0 100 100
   extended device statistics
  device       r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
  sd15         0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
  sd16         0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
  sd17         0.0    0.0    0.0    0.0  3.0  1.0    0.0 100 100
   extended device statistics
  device       r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
  sd15         0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
  sd16         0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
  sd17         0.0    0.0    0.0    0.0  3.0  1.0    0.0 100 100
 
  During this time no operations can occur. I've attached the iRAM disk
  via a 3124 card. I've never seen a svc_t time of 0, and full wait and
  busy disk. Any clue what this might mean?
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] odd slog behavior on B70

2007-11-26 Thread Joe Little
On Nov 26, 2007 7:57 PM, Richard Elling [EMAIL PROTECTED] wrote:
 Joe Little wrote:
  On Nov 26, 2007 7:00 PM, Richard Elling [EMAIL PROTECTED] wrote:
 
  I would expect such iostat output from a device which can handle
  only a single queued I/O to the device (eg. IDE driver) and an I/O
  is stuck.  There are 3 more I/Os in the wait queue waiting for the
  active I/O to complete.  The %w and %b are measured as the percent
  of time during which an I/O was in queue.  The svc_t is 0 because
  the I/O is not finished.
 
  By default, most of the drivers will retry I/Os which don't seem to
  finish, but the retry interval is often on the order of 60 seconds.
  If a retry succeeds, then no message is logged to syslog, so you
  might not see any messages.  But just to be sure, what does
  fmdump (and fmdump -e) say about the system?  Are messages
  logged in /var/adm/messages?
 
 
  nothing with fmdump or /var/adm/messages. Your answer explains why its
  60 seconds or so. What's sad is that this is a ramdisk so to speak,
  albeit connected via SATA-I to the sil3124. Any way to isolate this
  further? Anyway to limit i/o timeouts to a drive? this is just two
  sticks of ram.. ms would be fine :)
 

 I suspect a bug in the driver or firmware.  It might be difficult to
 identify
 if it is in the firmware.

 A pretty good white paper on storage stack timeout tuning is available
 at BigAdmin:
   http://www.sun.com/bigadmin/features/hub_articles/tuning_sfs.jsp
 But it won't directly apply to your case because you aren't using the
 ssd driver.  I'd wager the cmdk driver is being used for your case
 and I'm not familiar with its internals.  prtconf -D will show which
 driver(s) are in use.
  -- richard


The previous message listed the SiI bug. It's the SiI (si3124) driver, and
thus sd* (as seen in the iostat output).



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] raidz DEGRADED state

2007-11-20 Thread Joe Little
On Nov 20, 2007 6:34 AM, MC [EMAIL PROTECTED] wrote:
  So there is no current way to specify the creation of
  a 3 disk raid-z
  array with a known missing disk?

 Can someone answer that?  Or does the zpool command NOT accommodate the 
 creation of a degraded raidz array?


You can't start it degraded, but you can make it so...

If one can make a sparse file, then you'd be set. Just create the
file, make a zpool out of the two disks and the file, and then take
the file offline/out of the pool _BEFORE_ copying over the data. I believe
you can then add the third disk as a replacement. The gotcha (and why a
sparse file may be needed) is that the pool will only use, per device,
the size of the smallest device.
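
A rough sketch of the trick, with made-up device names and sizes (the sparse
file should be at least as large as the real disks, or it becomes the size
limiter; treat this as a sketch I haven't verified on current bits):

# mkfile -n 750g /var/tmp/fakedisk
# zpool create tank raidz c1t0d0 c1t1d0 /var/tmp/fakedisk
# zpool offline tank /var/tmp/fakedisk      (pool is now DEGRADED but usable)
  ... copy the data in ...
# zpool replace tank /var/tmp/fakedisk c1t2d0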



 This message posted from opensolaris.org
 ___

 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] slog tests on read throughput exhaustion (NFS)

2007-11-18 Thread Joe Little
On Nov 18, 2007 1:44 PM, Richard Elling [EMAIL PROTECTED] wrote:
 one more thing...


 Joe Little wrote:
  I have historically noticed that in ZFS, when ever there is a heavy
  writer to a pool via NFS, the reads can held back (basically paused).
  An example is a RAID10 pool of 6 disks, whereby a directory of files
  including some large 100+MB in size being written can cause other
  clients over NFS to pause for seconds (5-30 or so). This on B70 bits.
  I've gotten used to this behavior over NFS, but didn't see it perform
  as such when on the server itself doing similar actions.
 
  To improve upon the situation, I thought perhaps I could dedicate a
  log device outside the pool, in the hopes that while heavy writes went
  to the log device, reads would merrily be allowed to coexist from the
  pool itself. My test case isn't ideal per se, but I added a local 9GB
  SCSI (80) drive for a log, and added to LUNs for the pool itself.
  You'll see from the below that while the log device is pegged at
  15MB/sec (sd5),  my directory list request on devices sd15 and sd16
  never are answered. I tried this with both no-cache-flush enabled and
  off, with negligible difference. Is there anyway to force a better
  balance of reads/writes during heavy writes?
 
   extended device statistics
  device       r/s    w/s   kr/s    kw/s wait actv  svc_t  %w  %b
  fd0          0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
  sd0          0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
  sd1          0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
  sd2          0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
  sd3          0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
  sd4          0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
  sd5          0.0  118.0    0.0 15099.9  0.0 35.0  296.7   0 100

 When you see actv = 35 and svc_t  ~20, then it is possible that
 you can improve performance by reducing the zfs_vdev_max_pending
 queue depth.  See
 http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Device_I.2FO_Queue_Size_.28I.2FO_Concurrency.29

 This will be particularly true for JBODs.

 Doing a little math, there is ~ 4.5 MBytes queued in the drive
 waiting to be written.  4.5 MBytes isn't much for a typical RAID
 array, but for a disk, it is often a sizeable chunk of its
 available cache.  A 9 GByte disk, being rather old, has a pretty
 wimpy microprocessor, so you are basically beating the poor thing
 senseless.  Reducing the queue depth will allow the disk to perform
 more efficiently.

I'll be trying an 18G 10K drive tomorrow. Again, the test was simply to
see whether, by having a slog, I'd enable NFS to allow concurrent reads
and writes. Especially in the iSCSI case, but even with JBOD, I find
that _any_ heavy writing completely postpones reads for NFS clients. This
makes ZFS and NFS impractical under I/O duress. My goal was simply to
see how things work. It appears from Neil's reply that it won't, and the
per-filesystem synchronicity RFE is what is needed, or at least
zil_disable, for NFS to be practically usable currently.

As for max_pending, I did try to lower that without any success (for
values of 10 and 20) on a JBOD.
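
For the record, the way I lowered it was the usual Evil Tuning Guide route;
quoting from memory, so double-check against the guide itself:

# echo zfs_vdev_max_pending/W0t10 | mdb -kw     (live change)

or persistently in /etc/system:

set zfs:zfs_vdev_max_pending = 10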


   -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] slog tests on read throughput exhaustion (NFS)

2007-11-17 Thread Joe Little
On Nov 16, 2007 10:41 PM, Neil Perrin [EMAIL PROTECTED] wrote:


 Joe Little wrote:
  On Nov 16, 2007 9:13 PM, Neil Perrin [EMAIL PROTECTED] wrote:
  Joe,
 
  I don't think adding a slog helped in this case. In fact I
  believe it made performance worse. Previously the ZIL would be
  spread out over all devices but now all synchronous traffic
  is directed at one device (and everything is synchronous in NFS).
  Mind you 15MB/s seems a bit on the slow side - especially if
  cache flushing is disabled.
 
  It would be interesting to see what all the threads are waiting
  on. I think the problem maybe that everything is backed
  up waiting to start a transaction because the txg train is
  slow due to NFS requiring the ZIL to push everything synchronously.
 
 
  I agree completely. The log (even though slow) was an attempt to
  isolate writes away from the pool. I guess the question is how to
  provide for async access for NFS. We may have 16, 32 or whatever
  threads, but if a single writer keeps the ZIL pegged and prohibiting
  reads, its all for nought. Is there anyway to tune/configure the
  ZFS/NFS combination to balance reads/writes to not starve one for the
  other. Its either feast or famine or so tests have shown.

 No there's no way currently to give reads preference over writes.
 All transactions get equal priority to enter a transaction group.
 Three txgs can be outstanding as we use a 3 phase commit model:
 open; quiescing; and syncing.


Any way to improve the balance? It would appear that zil_disable is
still a requirement to get NFS to behave in a practical, real-world
way with ZFS. Even with zil_disable, we end up with periods of
pausing on the heaviest of writes, and then I think it's mostly just
ZFS having too much outstanding I/O to commit.

If zil_disable is enabled, is the slog disk ignored?
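
(For anyone reading along: by zil_disable I mean the old global tunable, which
turns off synchronous semantics for every dataset on the box. As I recall it is
set as below and needs the filesystems remounted to take effect -- quoting from
memory, so verify before use.)

set zfs:zil_disable = 1                 (in /etc/system)
# echo zil_disable/W1 | mdb -kw         (live)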

 Neil.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] slog tests on read throughput exhaustion (NFS)

2007-11-16 Thread Joe Little
I have historically noticed in ZFS that whenever there is a heavy
writer to a pool via NFS, reads can be held back (basically paused).
An example is a RAID10 pool of 6 disks, where a directory of files,
including some large ones 100+MB in size, being written can cause other
clients over NFS to pause for seconds (5-30 or so). This is on B70 bits.
I've gotten used to this behavior over NFS, but I didn't see the same
behavior when doing similar actions on the server itself.

To improve upon the situation, I thought perhaps I could dedicate a
log device outside the pool, in the hope that while heavy writes went
to the log device, reads would merrily be allowed to coexist from the
pool itself. My test case isn't ideal per se, but I added a local 9GB
SCSI (80) drive for a log, and added two LUNs for the pool itself.
You'll see from the below that while the log device is pegged at
15MB/sec (sd5), my directory listing requests against devices sd15 and sd16
are never answered. I tried this with no-cache-flush both enabled and
off, with negligible difference. Is there any way to force a better
balance of reads and writes during heavy writes?

                extended device statistics
device       r/s    w/s   kr/s    kw/s wait actv  svc_t  %w  %b
fd0          0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd0          0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd1          0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd2          0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd3          0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd4          0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd5          0.0  118.0    0.0 15099.9  0.0 35.0  296.7   0 100
sd6          0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd7          0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd8          0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd9          0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd10         0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd11         0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd12         0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd13         0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd14         0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd15         0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd16         0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
                extended device statistics
device       r/s    w/s   kr/s    kw/s wait actv  svc_t  %w  %b
fd0          0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd0          0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd1          0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd2          0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd3          0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd4          0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd5          0.0  117.0    0.0 14970.1  0.0 35.0  299.2   0 100
sd6          0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd7          0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd8          0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd9          0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd10         0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd11         0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd12         0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd13         0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd14         0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd15         0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd16         0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
                extended device statistics
device       r/s    w/s   kr/s    kw/s wait actv  svc_t  %w  %b
fd0          0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd0          0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd1          0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd2          0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd3          0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd4          0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd5          0.0  118.1    0.0 15111.9  0.0 35.0  296.4   0 100
sd6          0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd7          0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd8          0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd9          0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd10         0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd11         0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd12         0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd13         0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd14         0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd15         0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd16         0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
                extended device statistics
device       r/s    w/s   kr/s    kw/s wait actv  svc_t  %w  %b
fd0          0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd0          0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd1          0.0    0.0    0.0     0.0  0.0  0.0    0.0   0   0
sd2          0.0    0.0    0.0

Re: [zfs-discuss] slog tests on read throughput exhaustion (NFS)

2007-11-16 Thread Joe Little
On Nov 16, 2007 9:13 PM, Neil Perrin [EMAIL PROTECTED] wrote:
 Joe,

 I don't think adding a slog helped in this case. In fact I
 believe it made performance worse. Previously the ZIL would be
 spread out over all devices but now all synchronous traffic
 is directed at one device (and everything is synchronous in NFS).
 Mind you, 15MB/s seems a bit on the slow side - especially if
 cache flushing is disabled.

 It would be interesting to see what all the threads are waiting
 on. I think the problem may be that everything is backed
 up waiting to start a transaction because the txg train is
 slow due to NFS requiring the ZIL to push everything synchronously.


I agree completely. The log (even though slow) was an attempt to
isolate writes away from the pool. I guess the question is how to
provide async access for NFS. We may have 16, 32 or however many
threads, but if a single writer keeps the ZIL pegged and prohibits
reads, it's all for nought. Is there any way to tune/configure the
ZFS/NFS combination to balance reads/writes so as not to starve one for
the other? It's either feast or famine, or so tests have shown.
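One way to answer Neil's question about what the threads are waiting on is to
dump the kernel thread stacks while a pause is in progress; this is generic
Solaris debugging rather than anything ZFS-specific:

  # all kernel thread stacks; look for pile-ups in zil_commit()/txg_wait_open()
  echo "::threadlist -v" | mdb -k

If most of the NFS service threads are parked in ZIL or transaction-group
waits, that would confirm the lone log device is the choke point.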

 Neil.


 Joe Little wrote:
  I have historically noticed that in ZFS, whenever there is a heavy
  writer to a pool via NFS, the reads can be held back (basically paused).
  An example is a RAID10 pool of 6 disks, whereby a directory of files
  including some large 100+MB in size being written can cause other
  clients over NFS to pause for seconds (5-30 or so). This is on B70 bits.
  I've gotten used to this behavior over NFS, but didn't see it behave
  this way when doing similar actions on the server itself.
 
  To improve upon the situation, I thought perhaps I could dedicate a
  log device outside the pool, in the hope that while heavy writes went
  to the log device, reads would merrily be allowed to coexist from the
  pool itself. My test case isn't ideal per se, but I added a local 9GB
  SCSI (80) drive for a log, and added two LUNs for the pool itself.
  You'll see from the below that while the log device is pegged at
  15MB/sec (sd5), my directory list request on devices sd15 and sd16
  is never answered. I tried this with the no-cache-flush setting both
  on and off, with negligible difference. Is there any way to force a better
  balance of reads/writes during heavy writes?
 
   extended device statistics
  devicer/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b
  fd0   0.00.00.00.0  0.0  0.00.0   0   0
  sd0   0.00.00.00.0  0.0  0.00.0   0   0
  sd1   0.00.00.00.0  0.0  0.00.0   0   0
  sd2   0.00.00.00.0  0.0  0.00.0   0   0
  sd3   0.00.00.00.0  0.0  0.00.0   0   0
  sd4   0.00.00.00.0  0.0  0.00.0   0   0
  sd5   0.0  118.00.0 15099.9  0.0 35.0  296.7   0 100
  sd6   0.00.00.00.0  0.0  0.00.0   0   0
  sd7   0.00.00.00.0  0.0  0.00.0   0   0
  sd8   0.00.00.00.0  0.0  0.00.0   0   0
  sd9   0.00.00.00.0  0.0  0.00.0   0   0
  sd10  0.00.00.00.0  0.0  0.00.0   0   0
  sd11  0.00.00.00.0  0.0  0.00.0   0   0
  sd12  0.00.00.00.0  0.0  0.00.0   0   0
  sd13  0.00.00.00.0  0.0  0.00.0   0   0
  sd14  0.00.00.00.0  0.0  0.00.0   0   0
  sd15  0.00.00.00.0  0.0  0.00.0   0   0
  sd16  0.00.00.00.0  0.0  0.00.0   0   0
 ...

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] slog tests on read throughput exhaustion (NFS)

2007-11-16 Thread Joe Little
On Nov 16, 2007 9:17 PM, Joe Little [EMAIL PROTECTED] wrote:
 On Nov 16, 2007 9:13 PM, Neil Perrin [EMAIL PROTECTED] wrote:
  Joe,
 
  I don't think adding a slog helped in this case. In fact I
  believe it made performance worse. Previously the ZIL would be
  spread out over all devices but now all synchronous traffic
  is directed at one device (and everything is synchronous in NFS).
  Mind you, 15MB/s seems a bit on the slow side - especially if
  cache flushing is disabled.

  It would be interesting to see what all the threads are waiting
  on. I think the problem may be that everything is backed
  up waiting to start a transaction because the txg train is
  slow due to NFS requiring the ZIL to push everything synchronously.
 

Roch wrote this before (thus my interest in the log or an NVRAM-like solution):


There are 2 independent things at play here.

a) NFS sync semantics conspire against single-thread performance with
any backend filesystem.
 However, NVRAM normally offers some relief from the issue.

b) ZFS sync semantics, along with the storage software plus the imprecise
protocol in between, conspire against ZFS performance
for some workloads on NVRAM-backed storage. NFS being one of the
affected workloads.

The conjunction of the 2 causes worse than expected NFS performance
over a ZFS backend running __on NVRAM-backed storage__.
If you are not considering NVRAM storage, then I know of no ZFS/NFS
specific problems.

Issue b) is being dealt with, by both Solaris and storage vendors (we
need a refined protocol);

Issue a) is not related to ZFS and is rather a fundamental NFS issue.
Maybe a future NFS protocol will help.


Net net: if one finds a way to 'disable cache flushing' on the
storage side, then one reaches the state
we'll be in, out of the box, when b) is implemented by Solaris _and_
the storage vendor. At that point, ZFS becomes a fine NFS
server not only on JBOD as it is today, but also on NVRAM-backed
storage.

It's complex enough, I thought it was worth repeating.
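For completeness, the 'disable cache flushing' workaround Roch refers to was
typically applied on the Solaris side with a kernel tunable; the variable name
changed across builds, so check the bits you are running before relying on
this sketch:

  # /etc/system on newer builds
  set zfs:zfs_nocacheflush = 1
  # older builds used the ZIL-specific flag instead
  set zfs:zil_noflush = 1

This only makes sense when the array really has battery-backed cache; on plain
disks it trades performance for exactly the data-loss exposure the flushes are
meant to prevent.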




 I agree completely. The log (even though slow) was an attempt to
 isolate writes away from the pool. I guess the question is how to
 provide async access for NFS. We may have 16, 32 or however many
 threads, but if a single writer keeps the ZIL pegged and prohibits
 reads, it's all for nought. Is there any way to tune/configure the
 ZFS/NFS combination to balance reads/writes so as not to starve one for
 the other? It's either feast or famine, or so tests have shown.


  Neil.
 
 
  Joe Little wrote:
   I have historically noticed that in ZFS, whenever there is a heavy
   writer to a pool via NFS, the reads can be held back (basically paused).
   An example is a RAID10 pool of 6 disks, whereby a directory of files
   including some large 100+MB in size being written can cause other
   clients over NFS to pause for seconds (5-30 or so). This is on B70 bits.
   I've gotten used to this behavior over NFS, but didn't see it behave
   this way when doing similar actions on the server itself.
  
   To improve upon the situation, I thought perhaps I could dedicate a
   log device outside the pool, in the hope that while heavy writes went
   to the log device, reads would merrily be allowed to coexist from the
   pool itself. My test case isn't ideal per se, but I added a local 9GB
   SCSI (80) drive for a log, and added two LUNs for the pool itself.
   You'll see from the below that while the log device is pegged at
   15MB/sec (sd5), my directory list request on devices sd15 and sd16
   is never answered. I tried this with the no-cache-flush setting both
   on and off, with negligible difference. Is there any way to force a better
   balance of reads/writes during heavy writes?
  
extended device statistics
   devicer/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b
   fd0   0.00.00.00.0  0.0  0.00.0   0   0
   sd0   0.00.00.00.0  0.0  0.00.0   0   0
   sd1   0.00.00.00.0  0.0  0.00.0   0   0
   sd2   0.00.00.00.0  0.0  0.00.0   0   0
   sd3   0.00.00.00.0  0.0  0.00.0   0   0
   sd4   0.00.00.00.0  0.0  0.00.0   0   0
   sd5   0.0  118.00.0 15099.9  0.0 35.0  296.7   0 100
   sd6   0.00.00.00.0  0.0  0.00.0   0   0
   sd7   0.00.00.00.0  0.0  0.00.0   0   0
   sd8   0.00.00.00.0  0.0  0.00.0   0   0
   sd9   0.00.00.00.0  0.0  0.00.0   0   0
   sd10  0.00.00.00.0  0.0  0.00.0   0   0
   sd11  0.00.00.00.0  0.0  0.00.0   0   0
   sd12  0.00.00.00.0  0.0  0.00.0   0   0
   sd13  0.00.00.00.0  0.0  0.00.0   0   0
   sd14  0.00.00.00.0  0.0  0.00.0   0   0
   sd15  0.00.00.00.0  0.0  0.00.0   0   0
   sd16  0.00.00.00.0  0.0  0.00.0   0   0
  ...
 


Re: [zfs-discuss] first public offering of NexentaStor

2007-11-07 Thread Joe Little
Not for NexentaStor as yet, to my knowledge. I'd like to caution that
the target of the initial product release is digital
archiving/tiering/etc. and is not necessarily primary NAS usage, though
it can be used as such by those so inclined. However, interested
parties should contact them as they flesh out those details. BETA
programs are available too (that's where I'm at now).


On 11/6/07, roland [EMAIL PROTECTED] wrote:
 is there any pricing information available ?


 This message posted from opensolaris.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] first public offering of NexentaStor

2007-11-02 Thread Joe Little
On 11/2/07, MC [EMAIL PROTECTED] wrote:
  I consider myself an early adopter of ZFS and pushed it hard on this
  list and in real life with regards to iSCSI integration, ZFS
  performance issues with latency thereof, and how best to use it with
  NFS. Well, I finally get to talk more about the ZFS-based product I've
  been beta testing for quite some time. I thought this was the most
  appropriate place to make it known that NexentaStor is now out, and
  you can read more of my take at my personal post,
  http://jmlittle.blogspot.com/2007/11/coming-out-party-for-commodity-storage.html

  I thought it would be in the normal opensolaris blog listing, but
  since it's not showing up there, this single list seems most
  appropriate to get interested parties and feedback.

 Hmm so is that where all the Nexenta guys have been all this time!?!? :)

 I look forward to trying out what has been produced.  This type of solution 
 is a pleasing one for the consumer.

 Is there a list of the contributors and what they do?  The landscape of 
 Nexenta has changed and I wonder about the details.  PS: the website looks 
 kind of busy to the eyes :)

 PPS: I think the new Nexenta team is the perfect candidate for submitting to 
 the community how they think the OpenSolaris branding and compatibility 
 should work.  Would you like a Built with OpenSolaris logo to use?  How far 
 would you (or should you) go to maintain compatibility and be certified as 
 OpenSolaris Compatible?


I can only speak to my particular usage and understanding. It's
OpenSolaris-based in the sense that it is based on the ON/NWS
consolidations (aka NexentaOS, or the NCP releases). It's still very
much Debian/Ubuntu-like in that it has that packaging, that installer,
etc. Time will tell how compatible that is deemed to be.

 People doing real work on real projects should chime in on those issues 
 because there is far too much yapping from people like me who do nothing :)


 This message posted from opensolaris.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Backport of vfs_zfsacl.c to samba 3.0.26a, [and NexentaStor]

2007-11-02 Thread Joe Little
On 11/2/07, Rob Logan [EMAIL PROTECTED] wrote:

 I'm confused by this and NexentaStor... wouldn't it be better
 to use b77? with:

 Heads Up: File system framework changes (supplement to CIFS' head's up)
 Heads Up: Flag Day (Addendum) (CIFS Service)
 Heads Up: Flag Day (CIFS Service)
 caller_context_t in all VOPs - PSARC/2007/218
 VFS Feature Registration and ACL on Create - PSARC/2007/227
 ZFS Case-insensitive support - PSARC/2007/244
 Extensible Attribute Interfaces - PSARC/2007/315
 ls(1) new command line options '-/' and '-%': CIFS system attributes support 
 - PSARC/2007/394
 Modified Access Checks for CIFS - PSARC/2007/403
 Add system attribute support to chmod(1) - PSARC/2007/410
 CIFS system attributes support for cp(1), pack(1), unpack(1), compress(1) and 
 uncompress(1) - PSARC/2007/432
 Rescind SETTABLE Attribute - PSARC/2007/444
 CIFS system attributes support for cpio(1), pax(1), tar(1) - PSARC/2007/459
 Update utilities to match CIFS system attributes changes. - PSARC/2007/546
 ZFS sharesmb property - PSARC/2007/560
 VFS Feature Registration and ACL on Create - PSARC/2007/227
 Extensible Attribute Interfaces - PSARC/2007/315
 Extensible Attribute Interfaces - PSARC/2007/315
 Extensible Attribute Interfaces - PSARC/2007/315
 Extensible Attribute Interfaces - PSARC/2007/315
 CIFS Service - PSARC/2006/715

It doesn't yet have anything to do with NexentaStor per se. I know
that CIFS service support in the BETA is preliminary, and the timing
of the availability makes a CIFS service tied to ZFS and its share
commands much more attractive. Depending on its maturity, I hope the
Nexenta folk will include it in their final release, if not have it
somewhere on their roadmap.
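The integration in question is the ZFS sharesmb property from PSARC/2007/560
listed below; a minimal sketch of how it is meant to be used once the
in-kernel CIFS bits are present (service and dataset names are illustrative):

  # enable the kernel CIFS server once, then share a dataset by name
  svcadm enable -r smb/server
  zfs set sharesmb=name=marketing tank/marketing
  # verify what is being shared
  sharemgr show -vp

On builds without the CIFS service the property simply isn't there, which is
the maturity question raised above.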





 http://www.opensolaris.org/os/community/on/flag-days/all/
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] first public offering of NexentaStor

2007-11-01 Thread Joe Little
I consider myself an early adopter of ZFS and pushed it hard on this
list and in real life with regards to iSCSI integration, ZFS
performance issues with latency thereof, and how best to use it with
NFS. Well, I finally get to talk more about the ZFS-based product I've
been beta testing for quite some time. I thought this was the most
appropriate place to make it known that NexentaStor is now out, and
you can read more of my take at my personal post,
http://jmlittle.blogspot.com/2007/11/coming-out-party-for-commodity-storage.html

I thought it would be in the normal opensolaris blog listing, but
since it's not showing up there, this single list seems most
appropriate to get interested parties and feedback.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Announcing NexentaCP(b65) with ZFS/Boot integrated installer

2007-06-07 Thread Joe Little

On 6/7/07, Al Hopper [EMAIL PROTECTED] wrote:

On Wed, 6 Jun 2007, Erast Benson wrote:

 Announcing new direction of Open Source NexentaOS development:
 NexentaCP (Nexenta Core Platform).

 NexentaCP is Dapper/LTS-based core Operating System Platform distributed
 as a single-CD ISO, integrates Installer/ON/NWS/Debian and provides
 basis for Network-type installations via main or third-party APTs (NEW).

 First unstable b65-based ISO with ZFS/Boot-capable installer available
 as usual at:
 http://www.gnusolaris.org/unstable-iso/ncp_beta1-test1-b65_i386.iso
... snip 

Now also available on www.genunix.org



And mirrored at:

http://mirror.stanford.edu/gnusolaris/isos/ncp_beta1-test1-b65_i386.iso


Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: .zfs snapshot directory in all directories

2007-02-28 Thread Joe Little

On 2/27/07, Eric Haycraft [EMAIL PROTECTED] wrote:

I am no scripting pro, but I would imagine it would be fairly simple to create 
a script and batch it to make symlinks in all subdirectories.




I've done something similar using NFS aggregation products. The real
problem is when you export, especially via CIFS (SMB), from a given
directory. Take the example of a division-based file tree. A
given area of the company, say marketing, has multiple sub folders:

/pool/marketing, /pool/marketing/docs, /pool/marketing/projects,
/pool/marketing/users

Marketing wants Windows access, so you allow shares at any
point, including at /pool/marketing/users. Symlinks don't help there,
and a snapshot mechanism needs to be present at the users subdirectory
level.

Some would argue to promote /pool/marketing/users into its own ZFS
filesystem. Then the other problem arises: at least with NFS,
you need to share per filesystem, and clients must separately mount each
filesystem (/pool/marketing, /pool/marketing/users,
/pool/marketing/docs, etc.). Mounting /pool/marketing alone will show
you empty directories for users, projects, etc. if the deeper mounts
aren't in place.

Yeah.. automounts, nfsv4, blah blah :) A lot of setup when all you
need is pervasive .snapshot trees similar to NetApp's. I just hope
they don't have a bloody patent on something as simple as that to
solve this.
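For what it's worth, each ZFS filesystem does expose its snapshots under a
hidden .zfs directory at its root, and that can be made visible; the dataset
names below are the ones from the example above:

  # make the .zfs directory show up in listings for the subtree
  zfs set snapdir=visible pool/marketing
  # share a child filesystem so clients can reach its own .zfs/snapshot
  zfs set sharenfs=on pool/marketing/users

The catch remains that .zfs exists only at the root of each filesystem, not in
every subdirectory, so snapshot access at /pool/marketing/users still pushes
you toward making users its own filesystem, with the mount sprawl described
above.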



This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs corruption -- odd inum?

2007-02-11 Thread Joe Little

On 2/11/07, Jeff Bonwick [EMAIL PROTECTED] wrote:

The object number is in hex.  21e282 hex is 2220674 decimal --
give that a whirl.
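A concrete sketch of that conversion (the mount point is a placeholder); note
that the status output quoted below actually reads object 21e382 rather than
21e282, so it is worth converting the exact value reported:

  # hex object number -> decimal inode number, then search that filesystem
  printf '%d\n' 0x21e382     # prints 2220930
  find /cc -xdev -inum 2220930 -print

On bits with the filename support mentioned here (Nevada build 57 and later),
zpool status -v prints the path directly and none of this is needed.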

This is all better now thanks to some recent work by Eric Kustarz:

6410433 'zpool status -v' would be more useful with filenames

This was integrated into Nevada build 57.

Jeff

On Sat, Feb 10, 2007 at 05:18:05PM -0800, Joe Little wrote:
 So, I'm attempting to find the inode from the result of a zpool status -v:

 errors: The following persistent errors have been detected:

  DATASET  OBJECT  RANGE
  cc   21e382  lvl=0 blkid=0


 Well, 21e282 appears to not be a valid number for find . -inum blah

 Any suggestions?


Ok.. but using the hex as suggested gave me an odder error result that
I can't parse..

zdb -vvv tier2 0x21e382
   version=3
   name='tier2'
   state=0
   txg=353444
   pool_guid=3320175367383032945
   vdev_tree
   type='root'
   id=0
   guid=3320175367383032945
   children[0]
   type='disk'
   id=0
   guid=1858965616559880189
   path='/dev/dsk/c3t4d0s0'

devid='id1,[EMAIL PROTECTED]/a'
   whole_disk=1
   metaslab_array=16
   metaslab_shift=33
   ashift=9
   asize=1500336095232
   children[1]
   type='disk'
   id=1
   guid=2406851811694064278
   path='/dev/dsk/c3t5d0s0'

devid='id1,[EMAIL PROTECTED]/a'
   whole_disk=1
   metaslab_array=13
   metaslab_shift=33
   ashift=9
   asize=1500336095232
   children[2]
   type='disk'
   id=2
   guid=4840324923103758504
   path='/dev/dsk/c3t6d0s0'

devid='id1,[EMAIL PROTECTED]/a'
   whole_disk=1
   metaslab_array=4408
   metaslab_shift=33
   ashift=9
   asize=1500336095232
   children[3]
   type='disk'
   id=3
   guid=18356839793156279878
   path='/dev/dsk/c3t7d0s0'

devid='id1,[EMAIL PROTECTED]/a'
   whole_disk=1
   metaslab_array=4407
   metaslab_shift=33
   ashift=9
   asize=1500336095232
Uberblock

   magic = 00bab10c
   version = 3
   txg = 2834960
   guid_sum = 12336413438187464178
   timestamp = 1171223485 UTC = Sun Feb 11 11:51:25 2007
   rootbp = [L0 DMU objset] 400L/200P DVA[0]=2:3aa12a3600:200
DVA[1]=3:378957f000:200 DVA[2]=0:7d2312f200:200 fletcher4 lzjb LE
contiguous birth=2834960 fill=3672
cksum=f65361601:5b3233d8018:117d616a33b47:24feff94a90701

Dataset mos [META], ID 0, cr_txg 4, 294M, 3672 objects, rootbp [L0 DMU
objset] 400L/200P DVA[0]=2:3aa12a3600:200 DVA[1]=3:378957f000:200
DVA[2]=0:7d2312f200:200 fletcher4 lzjb LE contiguous birth=2834960
fill=3672 cksum=f65361601:5b3233d8018:117d616a33b47:24feff94a90701

   Object  lvl   iblk   dblk  lsize  asize  type
zdb: dmu_bonus_hold(2220930) failed, errno 2





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs corruption -- odd inum?

2007-02-10 Thread Joe Little

So, I'm attempting to find the inode from the result of a zpool status -v:

errors: The following persistent errors have been detected:

 DATASET  OBJECT  RANGE
 cc   21e382  lvl=0 blkid=0


Well, 21e282 appears to not be a valid number for find . -inum blah

Any suggestions?
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: Re[2]: [zfs-discuss] 118855-36 ZFS

2007-02-05 Thread Joe Little

On 2/5/07, Robert Milkowski [EMAIL PROTECTED] wrote:

Hello Casper,

Monday, February 5, 2007, 2:32:49 PM, you wrote:

Hello zfs-discuss,

  I've patched a U2 system to 118855-36. Several ZFS-related bug IDs
  should be covered between -19 and -36, like hot-spare support.

  However, despite -36 being installed, 'zpool upgrade' still claims only
  v1 and v2 support. Also there's no zfs promote, etc.

  /kernel/drv/zfs is dated May 18 with a size of 482448, which looks too
  old.

  Also, 118855-36 has many ZFS-related bugs listed; however, in its
  section file I do not see the zfs/zpool commands or the zfs kernel modules.
  Looks like they are not delivered.


CDSC Have you also installed the companion patch 124205-04?  It contains all
CDSC the ZFS bits.

I've just figured it out.

However, why are those ZFS-related bug IDs listed in -36 when the fixes
are actually delivered in 124205-05 (the same bug IDs)?



Ah.. it looks like this patch is non-public (you need a service plan). So
the free-as-in-beer ZFS U3 bits likely won't make it into the general
release until U4.
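For anyone following along, a sketch of how to check what is actually
installed and pull the companion patch explicitly (assuming a valid update
entitlement):

  # is the ZFS companion patch installed, and at what revision?
  showrev -p | grep 124205
  # fetch and apply a specific patch ID even if the analysis omits it
  smpatch download -i 124205-05
  smpatch add -i 124205-05

Whether 124205 shows up in 'smpatch analyze' depends on the entitlement tied
to the registered service plan, which may be why it isn't listed, as Robert
notes below.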


Also why 'smpatch analyze' doesn't show 124205? (I can force it to
download the patch if I specify it).


--
Best regards,
 Robert    mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: What SATA controllers are people using for ZFS?

2007-02-01 Thread Joe Little

On 2/1/07, Al Hopper [EMAIL PROTECTED] wrote:

On Thu, 1 Feb 2007, Tom Buskey wrote:

 I got an Addonics eSata card. Sata 3.0. PCI *or* PCI-X. Works right off the 
bat w/ 10u3. No firmware update needed. It was $130. But I don't pull out my hair 
and I can use it if I upgrade my server for pci-x

 And I'm finding the throughput isn't there.   2MB/s in ZFS RAIDZ and worse 
with UFS.
 *sigh*

I think that there are big issues with the 3124 driver.  I saw unexplained
pauses that lasted from 30 to 80+ seconds during a tar from a single SATA
disk drive that I was migrating data from (using a Syba SD-SATA2-2E2I
card).  I fully expected the kernel to crash while observing this transfer
(it didn't).  It happened periodically - each time a certain amount of
data had been transferred (just by observation - not measurement).  And
this was a UFS filesystem and the drive is a Sun original drive from an
Ultra 20 box.  I need to do some followup experiments as Mike Riley
(Sun) has kindly offered to take my results to the people working on this
driver.

 So, anyone know an inexpensive 4 port SATA card for PCI that'll work
 with 10u3 and I don't need to reflash the BIOS on?  (I bricked a
 Syba...)

Honestly, you're much better off with the $125 8-port SuperMicro board
that I have been unable to break to date. Details: SuperMicro
AOC-SAT2-MV8 8-port - uses the Rev C0 (Hercules-2) chip:

http://www.supermicro.com/products/accessories/addon/AoC-SAT2-MV8.cfm

Kudos to the Sun developers working the Marvell driver!  :)  In the
meantime I hope to find time to test a SAS2041E-R (initially the PCI
Express version of this card).



We switched away from those same Marvell cards because of unexplained
disconnects/reconnects that ZFS/Solaris would not survive.
Stability for us came from embracing the Sil3124-2's (Tekram). We had
two Marvell-based systems, and the most stable are the now
discontinued SATA-I Adaptec 16-port cards, and the Sil3124s. I know it's
redundant to say, but the state of SATA support here is still the most
glaring weakness. Isolating this all to a SCSI-to-SATA external
chassis is the surest route to bliss.


Keep posting to zfs-discuss!  :)

Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
   Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
 OpenSolaris Governing Board (OGB) Member - Feb 2006


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Thumper Origins Q

2007-01-24 Thread Joe Little

On 1/24/07, Jonathan Edwards [EMAIL PROTECTED] wrote:


On Jan 24, 2007, at 09:25, Peter Eriksson wrote:

 too much of our future roadmap, suffice it to say that one should
 expect
 much, much more from Sun in this vein: innovative software and
 innovative
 hardware working together to deliver world-beating systems with
 undeniable
 economics.

 Yes please. Now give me a fairly cheap (but still quality) FC-
 attached JBOD utilizing SATA/SAS disks and I'll be really happy! :-)

Could you outline why FC attached instead of network attached (iSCSI
say) makes more sense to you?  It might help to illustrate the demand
for an FC target I'm hearing instead of just a network target ..



I'm not generally for FC-attached storage, but we've documented here
many times how the round trip latency with iSCSI hasn't been the
perfect match with ZFS and NFS (think NAS). You need either IB or FC
right now to make that workable. Some day though.. either with
nvram-backed NFS or cheap 10Gig-E...




.je
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] What SATA controllers are people using for ZFS?

2006-12-21 Thread Joe Little

and specific models, and the driver used? Looks like there may be
stability issues with the marvell, which appear to go unanswered..


On 12/21/06, Jason J. W. Williams [EMAIL PROTECTED] wrote:

Hi Naveen,

I believe the newer LSI cards work pretty well with Solaris.

Best Regards,
Jason

On 12/20/06, Naveen Nalam [EMAIL PROTECTED] wrote:
 Hi,

 This may not be the right place to post, but hoping someone here is running a 
reliably working system with 12 drives using ZFS that can tell me what hardware 
they are using.

 I have on order with my server vendor a pair of 12-drive servers that I want 
to use with ZFS for our company file stores. We're trying to use Supermicro PDSME 
motherboards, and each has two Supermicro MV8 sata cards. Solaris 10U3 he's found 
doesn't work on these systems. And I just read a post today (and an older post) on 
this group about how the Marvell based cards lock up. I can't afford lockups since 
this is very critical and expensive data that is being stored.

 My goal is a single cpu board that works with Solaris, and somehow get 
12-drives plus 2 system boot drives plugged into it. I don't see any suitable sata 
cards on the Sun HCL.

 Are there any 4-port PCIe cards that people know reliably work? The Adaptec 
1430SA looks nice, but no idea if it works. I could potentially get two 4-port 
PCIe cards, a 2 port PCI sata card (for boot), and 4-port motherboard - for 14 
drives total. And cough up the extra cash for a supported dual-cpu motherboard 
(though i'm only using one cpu).

 any advice greatly appreciated..

 Thanks!
 Naveen


 This message posted from opensolaris.org


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] What SATA controllers are people using for ZFS?

2006-12-21 Thread Joe Little

On 12/21/06, Al Hopper [EMAIL PROTECTED] wrote:

On Thu, 21 Dec 2006, Joe Little wrote:

 and specific models, and the driver used? Looks like there may be
 stability issues with the marvell, which appear to go unanswered..

I've tested a box running two Marvell based 8-port controllers (which has
been running great on Update 2) on the solaris Update 3 beta without
issues.  The specific card is the newer version of the SuperMicro board:

http://www.supermicro.com/products/accessories/addon/AoC-SAT2

but have yet to test them under the released Update 3 code.  I'll post a
followup after the box is upgraded or re-installed.  [I'm waiting for the
next 48-hour day so that I can do the upgrade without affecting the user
community!!]

AFAIR the reported Marvell issues were with ON B54 - not Update 3.  Or do
I have this wrong?


Yes, this is all OpenSolaris based, so the areca seems to be for
Solaris 10 proper and the marvell may have issues at least at B54.



In any case, if you discover a bug with the Sun proprietary Marvell driver
and Update 3 and you have a support contract, you can log a service
request and get it fixed.  Since the Marvell chipset is used in Thumper,
I think its a pretty safe bet that the Marvell driver will continue to
work very nicely (Thanks Lori).

And yes, I would feel better if this driver was open sourced but that
is Suns' decision to make.

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
   Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
 OpenSolaris Governing Board (OGB) Member - Feb 2006


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] B54 and marvell cards

2006-12-20 Thread Joe Little

We just put together a new system for ZFS use at a company, and twice
in one week we've had the system wedge. You can log on, but the zpools
are hosed, and a requested reboot never completes since it can't
unmount the ZFS volumes. So, only a power cycle works.

In both cases, we get this:

Dec 20 10:59:36 kona marvell88sx: [ID 331397 kern.warning] WARNING:
marvell88sx0: device on port 2 still busy
Dec 20 10:59:36 kona sata: [ID 801593 kern.notice] NOTICE:
/[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Dec 20 10:59:36 kona  port 2: device reset
Dec 20 10:59:37 kona marvell88sx: [ID 331397 kern.warning] WARNING:
marvell88sx0: device on port 2 still busy
Dec 20 10:59:37 kona sata: [ID 801593 kern.notice] NOTICE:
/[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Dec 20 10:59:37 kona  port 2: device reset
Dec 20 10:59:37 kona sata: [ID 801593 kern.notice] NOTICE:
/[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Dec 20 10:59:37 kona  port 2: link lost
Dec 20 10:59:37 kona sata: [ID 801593 kern.notice] NOTICE:
/[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Dec 20 10:59:37 kona  port 2: link established
Dec 20 10:59:37 kona marvell88sx: [ID 812950 kern.warning] WARNING:
marvell88sx0: error on port 2:
Dec 20 10:59:37 kona marvell88sx: [ID 517869 kern.info] device
disconnected
Dec 20 10:59:37 kona marvell88sx: [ID 517869 kern.info] device connected

The first time was on port 1 (Sunday) and now this has occurred on
port 2. Is there a known unrecoverable condition with the Marvell
card? We adopted this card because the Adaptec 16-port card was
discontinued. Every day there seems to be less in the way of workable
SATA cards for Solaris (sigh). Here's the output on startup, which
always occurs:

Dec 17 11:23:15 kona marvell88sx: [ID 812950 kern.warning] WARNING:
marvell88sx0: error on port 0:
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info] SError interrupt
Dec 17 11:23:15 kona marvell88sx: [ID 131198 kern.info] SErrors:
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
Recovered communication error
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
PHY ready change
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
10-bit to 8-bit decode error
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
Disparity error
Dec 17 11:23:15 kona marvell88sx: [ID 812950 kern.warning] WARNING:
marvell88sx0: error on port 1:
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info] SError interrupt
Dec 17 11:23:15 kona marvell88sx: [ID 131198 kern.info] SErrors:
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
Recovered communication error
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
PHY ready change
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
10-bit to 8-bit decode error
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
Disparity error
Dec 17 11:23:15 kona marvell88sx: [ID 812950 kern.warning] WARNING:
marvell88sx0: error on port 2:
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info] SError interrupt
Dec 17 11:23:15 kona marvell88sx: [ID 131198 kern.info] SErrors:
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
Recovered communication error
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
PHY ready change
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
10-bit to 8-bit decode error
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
Disparity error
Dec 17 11:23:15 kona marvell88sx: [ID 812950 kern.warning] WARNING:
marvell88sx0: error on port 3:
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info] SError interrupt
Dec 17 11:23:15 kona marvell88sx: [ID 131198 kern.info] SErrors:
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
Recovered communication error
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
PHY ready change
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
10-bit to 8-bit decode error
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
Disparity error
Dec 17 11:23:15 kona marvell88sx: [ID 812950 kern.warning] WARNING:
marvell88sx0: error on port 4:
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info] SError interrupt
Dec 17 11:23:15 kona marvell88sx: [ID 131198 kern.info] SErrors:
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
Recovered communication error
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
PHY ready change
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
10-bit to 8-bit decode error
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
Disparity error
Dec 17 11:23:15 kona marvell88sx: [ID 812950 kern.warning] WARNING:
marvell88sx0: error on port 5:
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info] SError interrupt
Dec 17 11:23:15 kona marvell88sx: [ID 131198 kern.info] SErrors:
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
Recovered 

[zfs-discuss] Re: B54 and marvell cards

2006-12-20 Thread Joe Little

On 12/20/06, Joe Little [EMAIL PROTECTED] wrote:

We just put together a new system for ZFS use at a company, and twice
in one week we've had the system wedge. You can log on, but the zpools
are hosed, and a reboot never occurs if requested since it can't
unmount the zfs volumes. So, only a power cycle works.

In both cases, we get this:


Note to group.. Is the tekram 834A (SATA-II card w/ sil3124-1 and
sil3124-2) supported yet?

Seems like marvell is not the way to go..




Dec 20 10:59:36 kona marvell88sx: [ID 331397 kern.warning] WARNING:
marvell88sx0: device on port 2 still busy
Dec 20 10:59:36 kona sata: [ID 801593 kern.notice] NOTICE:
/[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Dec 20 10:59:36 kona  port 2: device reset
Dec 20 10:59:37 kona marvell88sx: [ID 331397 kern.warning] WARNING:
marvell88sx0: device on port 2 still busy
Dec 20 10:59:37 kona sata: [ID 801593 kern.notice] NOTICE:
/[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Dec 20 10:59:37 kona  port 2: device reset
Dec 20 10:59:37 kona sata: [ID 801593 kern.notice] NOTICE:
/[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Dec 20 10:59:37 kona  port 2: link lost
Dec 20 10:59:37 kona sata: [ID 801593 kern.notice] NOTICE:
/[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Dec 20 10:59:37 kona  port 2: link established
Dec 20 10:59:37 kona marvell88sx: [ID 812950 kern.warning] WARNING:
marvell88sx0: error on port 2:
Dec 20 10:59:37 kona marvell88sx: [ID 517869 kern.info] device
disconnected
Dec 20 10:59:37 kona marvell88sx: [ID 517869 kern.info] device connected

The first time was on port 1 (Sunday) and now this has occurred on
port 2. Is there a known unrecoverable condition with the marvell
card. We adopted this card because the adaptec 16 port card was
discontinued. Everyday there seems to be less in the way of workable
SATA cards for Solaris (sigh). Here's the output on startup, which
always occurs:

Dec 17 11:23:15 kona marvell88sx: [ID 812950 kern.warning] WARNING:
marvell88sx0: error on port 0:
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info] SError interrupt
Dec 17 11:23:15 kona marvell88sx: [ID 131198 kern.info] SErrors:
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
 Recovered communication error
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
 PHY ready change
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
 10-bit to 8-bit decode error
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
 Disparity error
Dec 17 11:23:15 kona marvell88sx: [ID 812950 kern.warning] WARNING:
marvell88sx0: error on port 1:
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info] SError interrupt
Dec 17 11:23:15 kona marvell88sx: [ID 131198 kern.info] SErrors:
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
 Recovered communication error
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
 PHY ready change
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
 10-bit to 8-bit decode error
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
 Disparity error
Dec 17 11:23:15 kona marvell88sx: [ID 812950 kern.warning] WARNING:
marvell88sx0: error on port 2:
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info] SError interrupt
Dec 17 11:23:15 kona marvell88sx: [ID 131198 kern.info] SErrors:
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
 Recovered communication error
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
 PHY ready change
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
 10-bit to 8-bit decode error
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
 Disparity error
Dec 17 11:23:15 kona marvell88sx: [ID 812950 kern.warning] WARNING:
marvell88sx0: error on port 3:
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info] SError interrupt
Dec 17 11:23:15 kona marvell88sx: [ID 131198 kern.info] SErrors:
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
 Recovered communication error
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
 PHY ready change
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
 10-bit to 8-bit decode error
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
 Disparity error
Dec 17 11:23:15 kona marvell88sx: [ID 812950 kern.warning] WARNING:
marvell88sx0: error on port 4:
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info] SError interrupt
Dec 17 11:23:15 kona marvell88sx: [ID 131198 kern.info] SErrors:
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
 Recovered communication error
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
 PHY ready change
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
 10-bit to 8-bit decode error
Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
 Disparity error
Dec 17 11:23:15 kona marvell88sx: [ID 812950 kern.warning] WARNING:
marvell88sx0: error on port 5:
Dec 17 11:23

[zfs-discuss] Re: B54 and marvell cards

2006-12-20 Thread Joe Little

Some further joy:

http://bugs.opensolaris.org/view_bug.do?bug_id=6504404

On 12/20/06, Joe Little [EMAIL PROTECTED] wrote:

On 12/20/06, Joe Little [EMAIL PROTECTED] wrote:
 We just put together a new system for ZFS use at a company, and twice
 in one week we've had the system wedge. You can log on, but the zpools
 are hosed, and a reboot never occurs if requested since it can't
 unmount the zfs volumes. So, only a power cycle works.

 In both cases, we get this:

Note to group.. Is the tekram 834A (SATA-II card w/ sil3124-1 and
sil3124-2) supported yet?

Seems like marvell is not the way to go..



 Dec 20 10:59:36 kona marvell88sx: [ID 331397 kern.warning] WARNING:
 marvell88sx0: device on port 2 still busy
 Dec 20 10:59:36 kona sata: [ID 801593 kern.notice] NOTICE:
 /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
 Dec 20 10:59:36 kona  port 2: device reset
 Dec 20 10:59:37 kona marvell88sx: [ID 331397 kern.warning] WARNING:
 marvell88sx0: device on port 2 still busy
 Dec 20 10:59:37 kona sata: [ID 801593 kern.notice] NOTICE:
 /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
 Dec 20 10:59:37 kona  port 2: device reset
 Dec 20 10:59:37 kona sata: [ID 801593 kern.notice] NOTICE:
 /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
 Dec 20 10:59:37 kona  port 2: link lost
 Dec 20 10:59:37 kona sata: [ID 801593 kern.notice] NOTICE:
 /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
 Dec 20 10:59:37 kona  port 2: link established
 Dec 20 10:59:37 kona marvell88sx: [ID 812950 kern.warning] WARNING:
 marvell88sx0: error on port 2:
 Dec 20 10:59:37 kona marvell88sx: [ID 517869 kern.info] device
 disconnected
 Dec 20 10:59:37 kona marvell88sx: [ID 517869 kern.info] device 
connected

 The first time was on port 1 (Sunday) and now this has occurred on
 port 2. Is there a known unrecoverable condition with the marvell
 card. We adopted this card because the adaptec 16 port card was
 discontinued. Everyday there seems to be less in the way of workable
 SATA cards for Solaris (sigh). Here's the output on startup, which
 always occurs:

 Dec 17 11:23:15 kona marvell88sx: [ID 812950 kern.warning] WARNING:
 marvell88sx0: error on port 0:
 Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info] SError 
interrupt
 Dec 17 11:23:15 kona marvell88sx: [ID 131198 kern.info] SErrors:
 Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
  Recovered communication error
 Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
  PHY ready change
 Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
  10-bit to 8-bit decode error
 Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
  Disparity error
 Dec 17 11:23:15 kona marvell88sx: [ID 812950 kern.warning] WARNING:
 marvell88sx0: error on port 1:
 Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info] SError 
interrupt
 Dec 17 11:23:15 kona marvell88sx: [ID 131198 kern.info] SErrors:
 Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
  Recovered communication error
 Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
  PHY ready change
 Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
  10-bit to 8-bit decode error
 Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
  Disparity error
 Dec 17 11:23:15 kona marvell88sx: [ID 812950 kern.warning] WARNING:
 marvell88sx0: error on port 2:
 Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info] SError 
interrupt
 Dec 17 11:23:15 kona marvell88sx: [ID 131198 kern.info] SErrors:
 Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
  Recovered communication error
 Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
  PHY ready change
 Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
  10-bit to 8-bit decode error
 Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
  Disparity error
 Dec 17 11:23:15 kona marvell88sx: [ID 812950 kern.warning] WARNING:
 marvell88sx0: error on port 3:
 Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info] SError 
interrupt
 Dec 17 11:23:15 kona marvell88sx: [ID 131198 kern.info] SErrors:
 Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
  Recovered communication error
 Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
  PHY ready change
 Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
  10-bit to 8-bit decode error
 Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
  Disparity error
 Dec 17 11:23:15 kona marvell88sx: [ID 812950 kern.warning] WARNING:
 marvell88sx0: error on port 4:
 Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info] SError 
interrupt
 Dec 17 11:23:15 kona marvell88sx: [ID 131198 kern.info] SErrors:
 Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
  Recovered communication error
 Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info]
  PHY ready change
 Dec 17 11:23:15 kona marvell88sx: [ID 517869 kern.info

Re: [zfs-discuss] poor NFS/ZFS performance

2006-11-22 Thread Joe Little

On 11/22/06, Chad Leigh -- Shire.Net LLC [EMAIL PROTECTED] wrote:


On Nov 22, 2006, at 4:11 PM, Al Hopper wrote:

 No problem there!  ZFS rocks.  NFS/ZFS is a bad combination.

Has anyone tried sharing a ZFS fs using samba or afs or something
else besides nfs?  Do we have the same issues?



I've done some CIFS tests in the past, and off the top of my head, it
was about 3-5x faster than NFS.



Chad

---
Chad Leigh -- Shire.Net LLC
Your Web App and Email hosting provider
chad at shire.net










___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best version of Solaris 10 fro ZFS ?

2006-10-27 Thread Joe Little

The latest OpenSolaris release? Perhaps Nexenta in the end is the way
to best deliver/maintain that.


On 10/27/06, David Blacklock [EMAIL PROTECTED] wrote:

What is the current recommended version of Solaris 10 for ZFS ?
-thanks,
-Dave



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: Re: [zfs-discuss] marvel cards.. as recommended

2006-09-13 Thread Joe Little

On 9/12/06, James C. McPherson [EMAIL PROTECTED] wrote:

Joe Little wrote:
 So, people here recommended the Marvell cards, and one even provided a
 link to acquire them for SATA jbod support. Well, this is what the
 latest bits (B47) say:

 Sep 12 13:51:54 vram marvell88sx: [ID 679681 kern.warning] WARNING:
 marvell88sx0: Could not attach, unsupported chip stepping or unable to
 get the chip stepping
 Sep 12 13:51:54 vram marvell88sx: [ID 679681 kern.warning] WARNING:
 marvell88sx1: Could not attach, unsupported chip stepping or unable to
 get the chip stepping
 Sep 12 13:51:54 vram marvell88sx: [ID 679681 kern.warning] WARNING:
 marvell88sx0: Could not attach, unsupported chip stepping or unable to
 get the chip stepping
 Sep 12 13:51:54 vram marvell88sx: [ID 679681 kern.warning] WARNING:
 marvell88sx1: Could not attach, unsupported chip stepping or unable to
 get the chip stepping

 Any takers on how to get around this one?

You could start by providing the output from prtpicl -v and
prtconf -v as well as /usr/X11/bin/scanpci -v -V 1 so we
know which device you're actually having a problem with.

Is the pci vendor+deviceid for that card listed in your
/etc/driver_aliases file against the marvell88sx driver?


James



I don't know if you really want all those large files, but
/etc/driver_aliases lists:

marvell88sx pci11ab,6081.9

[EMAIL PROTECTED]:~# lspci | grep Marv
03:01.0 SCSI storage controller: Marvell Technology Group Ltd.
MV88SX6081 8-port SATA II PCI-X Controller (rev 07)
05:01.0 SCSI storage controller: Marvell Technology Group Ltd.
MV88SX6081 8-port SATA II PCI-X Controller (rev 07)

[EMAIL PROTECTED]:~# lspci -n | grep 11ab
03:01.0 0100: 11ab:6081 (rev 07)
05:01.0 0100: 11ab:6081 (rev 07)

And it sees the module:
198 f571   9f10  62   1  marvell88sx (marvell88sx HBA Driver v1.8)

Is this a supported revision of the card? Is there something stupid, like
enabling jumpers or some such, that's required?
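One experiment worth trying when a driver refuses a particular chip revision
(a generic sketch, not a confirmed fix for the rev 07 stepping): check what
the alias binds to and, if needed, add a binding for the exact vendor/device
ID, then rebuild the device links:

  grep marvell88sx /etc/driver_aliases
  update_drv -a -i '"pci11ab,6081"' marvell88sx
  devfsadm -i marvell88sx

If the driver genuinely rejects the stepping in its attach routine, as the
B47 warning suggests, re-aliasing won't help and a newer build or a different
card is the real answer.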
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: marvel cards.. as recommended

2006-09-13 Thread Joe Little

Yeah. I got the message from a few others, and we are hoping to
return/buy the newer one. I'm sort of surprised by the limited set of
SATA RAID or JBOD cards that one can actually use. Even the ones
linked to on this list sometimes aren't supported :). I need to get up
and running like yesterday, so we are just ordering the cards post
haste.


On 9/13/06, Anton B. Rang [EMAIL PROTECTED] wrote:

A quick peek at the Linux source shows a small workaround in place for the 07 
revision...maybe if you file a bug against Solaris to support this revision it 
might be possible to get it added, at least if that's the only issue.


This message posted from opensolaris.org


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] marvel cards.. as recommended

2006-09-12 Thread Joe Little

So, people here recommended the Marvell cards, and one even provided a
link to acquire them for SATA jbod support. Well, this is what the
latest bits (B47) say:

Sep 12 13:51:54 vram marvell88sx: [ID 679681 kern.warning] WARNING:
marvell88sx0: Could not attach, unsupported chip stepping or unable to
get the chip stepping
Sep 12 13:51:54 vram marvell88sx: [ID 679681 kern.warning] WARNING:
marvell88sx1: Could not attach, unsupported chip stepping or unable to
get the chip stepping
Sep 12 13:51:54 vram marvell88sx: [ID 679681 kern.warning] WARNING:
marvell88sx0: Could not attach, unsupported chip stepping or unable to
get the chip stepping
Sep 12 13:51:54 vram marvell88sx: [ID 679681 kern.warning] WARNING:
marvell88sx1: Could not attach, unsupported chip stepping or unable to
get the chip stepping

Any takers on how to get around this one?
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] unaccounted for daily growth in ZFS disk space usage

2006-08-24 Thread Joe Little

We finally flipped the switch on one of our ZFS-based servers, with
approximately 1TB used of 2.8TB (3 stripes of 950GB or so, each of which
is a RAID5 volume on the Adaptec card). We have snapshots every 4 hours
for the first few days. If you add up the snapshot references it
appears somewhat high versus daily use (mostly mailboxes, spam, etc.
changing), but say an aggregate of no more than 400+MB a day.

However, zfs list shows our pool as a whole growing per day
by .01TB, or more specifically 80GB a day. That's a far cry
from the 400MB we can account for. Is it possible that
metadata/ditto blocks, or the like, is truly growing that rapidly? By
our calculations, we will triple our disk space used (sitting still) in 6
months and use up the remaining 1.7TB. Of course, this is only with
2-3 days of churn, but it's an alarming rate; on the NetApp
we never saw anything close to this.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: Re: [zfs-discuss] unaccounted for daily growth in ZFS disk space usage

2006-08-24 Thread Joe Little

On 8/24/06, Matthew Ahrens [EMAIL PROTECTED] wrote:

On Thu, Aug 24, 2006 at 07:07:45AM -0700, Joe Little wrote:
 We finally flipped the switch on one of our ZFS-based servers, with
 approximately 1TB of 2.8TB (3 stripes of 950MB or so, each of which is
 a RAID5 volume on the adaptec card). We have snapshots every 4 hours
 for the first few days. If you add up the snapshot references it
 appears somewhat high versus daily use (mostly mail boxes, spam, etc
 changing), but say an aggregate of no more than 400+MB a day.

 However, zfs list shows our daily pool as a whole, and per day we are
 growing by .01TB, or more specifically 80GB a day. That's a far cry
 different than the 400MB we can account for. Is it possible that
 metadata/ditto blocks, or the like is trully growing that rapidly. By
 our calculations, we will triple our disk space (sitting still) in 6
 months and use up the remaining 1.7TB. Of course, this is only with
 2-3 days of churn, but its an alarming rate where before on the NetApp
 we didn't see anything close to this rate.

How are you calculating this 400MB/day figure?  Keep in mind that space
used by each snapshot is the amount of space unique to that snapshot.
Adding up the space used by all your snapshots is *not* the amount of
space that they are all taking up cumulatively.  For leaf filesystems
(those with no descendents), you can calculate the space used by
all snapshots as (fs's used - fs's referenced).

How many filesystems do you have?  Can you send me the output of 'zfs
list' and 'zfs get -r all pool'?

How much space did you expect to be using, and what data is that based
on?  Are you sure you aren't writing 80GB/day to your pool?

--matt



Well, by deleting my 4-hourlies I reclaimed most of the space. To
answer some of the questions: it's about 15 filesystems (descendants
included). I'm aware of the space used by snapshots overlapping. I was
looking at the total space (zpool iostat reports) and seeing the diff
per day. The 400MB/day figure was by inspection, and by looking at our
nominal growth on a NetApp.

It would appear that if one takes many snapshots, there is an initial
quick growth in disk usage, but once those snapshots reach their
retention level (say 12), the growth would appear to match our typical
400MB/day. Time will prove this one way or the other. By simply getting
rid of hourly snapshots and collapsing to two days' worth of dailies,
I reverted to only ~1-2GB total growth, which is much more in line
with expectations.

For various reasons, I can't post the zfs list type results yet.
I'll need to get the OK for that first.. Sorry..
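A sketch of how to watch this kind of churn from the ZFS side (pool and
dataset names are placeholders); on these older bits the per-snapshot picture
has to be derived from used versus referenced, as Matt describes above:

  # pool-level consumption over time
  zpool list tank
  # space unique to each snapshot plus the filesystem totals
  zfs list -r -o name,used,referenced tank
  # for a leaf fs, space held by all its snapshots = used - referenced

Later builds added properties such as usedbysnapshots that make the arithmetic
unnecessary, but they were not available on the bits discussed here.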
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: Re: [zfs-discuss] multi-layer ZFS filesystems and exporting: my stupid question for the day

2006-08-16 Thread Joe Little

On 8/16/06, Frank Cusack [EMAIL PROTECTED] wrote:

On August 16, 2006 10:25:18 AM -0700 Joe Little [EMAIL PROTECTED] wrote:
 Is there a way to allow simple export commands to traverse multiple
 ZFS filesystems for exporting? I'd hate to have to have hundreds of
 mounts required for every point in a given tree (we have users,
 projects, src, etc)

Set the sharenfs property on the filesystems and use the automounter
on the client.
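A minimal sketch of what Frank is suggesting (pool, server and map names are
made up):

  # server: one filesystem per user, children inherit the share property
  zfs set sharenfs=rw tank/home
  zfs create tank/home/alice
  # client: an indirect automounter map entry using a wildcard key
  # /etc/auto_home
  *   server:/tank/home/&

Each child inherits sharenfs from tank/home, so new users only need a zfs
create, and the wildcard map keeps the client side maintenance-free; that
reliance on automounters is precisely what the rest of this thread pushes
back on.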



Damn. We are hoping to move away from automounters and the maintenance of
such (we use NeoPath, for example, for virtual aggregation of the
paths). In the NAS world, automounts are sometimes not available. So,
if this is true, you'll likely need to count me as one of those people
who says we can't make a filesystem per user, and give us user quotas
now please :)



-frank




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: Re: Re: [zfs-discuss] multi-layer ZFS filesystems and exporting: my stupid question for the day

2006-08-16 Thread Joe Little

On 8/16/06, Frank Cusack [EMAIL PROTECTED] wrote:

On August 16, 2006 10:34:31 AM -0700 Joe Little [EMAIL PROTECTED] wrote:
 On 8/16/06, Frank Cusack [EMAIL PROTECTED] wrote:
 On August 16, 2006 10:25:18 AM -0700 Joe Little [EMAIL PROTECTED] wrote:
  Is there a way to allow simple export commands to traverse multiple
  ZFS filesystems for exporting? I'd hate to have to have hundreds of
  mounts required for every point in a given tree (we have users,
  projects, src, etc)

 Set the sharenfs property on the filesystems and use the automounter
 on the client.


 Damn. We are hoping to move away from automounters and maintenance of
 such (we use NeoPath, for example, for virtual aggregation of the
 paths).

I don't know what NeoPath is but automounts are trivial or at least
easy to maintain even for very large sites.  You are using path wildcards
and DNS aliases, yes?


used to.. running away quickly. For different *nix flavors, pick one
of autofs, automountd, etc.. Nohide support, no nohide support,
netgroups, LDAP -- it all gets ugly pretty quickly. We want to move to
statically defined mount trees similar to AFS, managed centrally for
all, using NFS as the common protocol (aka the NeoPath)



 In the NAS world, sometimes automounts are not available. So,
 if this is true, you'll likely need to count me as one of those people
 who says we can't make a filesystem per user, and give us user quotas
 now please :)

I don't understand.  If an automount is not available, how is that
different than the nfs server itself not being available?  Or do you
mean some clients do not have an automounter?


Some clients don't have automounters available. Also some servers:
an example is gateways/proxies (think SMB proxy w/o automounter support).

Other clients in heavy use require lots of changes to get what you
propose, such as OSX's automounter (storing the results in NetInfo)



-frank


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: Re: [zfs-discuss] ZFS vs. Apple XRaid

2006-08-01 Thread Joe Little

I've submitted these to Roch and co before on the NFS list and off
list. My favorite case was writing 6250 8k files (randomly generated)
over NFS from a Solaris or Linux client. We originally were getting
20K/sec when I was using RAIDZ, but between switching to RAID-5 backed
iscsi luns in a zpool stripe and B40/41, we saw our performance
approach a more reasonable 300-400K/sec average. I get closer to
1-3MB/sec with UFS as the backend vs ZFS. Of course, if it's locally
attached storage (not iSCSI), performance starts to be comparable to that
of UFS or better. There is some built-in latency and some major
penalties for streaming writes of various sizes with the NFS
implementation and its fsync happiness (3 fsyncs per write from an NFS
client). It's all very true that it's stable/safe, but it's also very
slow in various use cases!
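
The test itself is easy to reproduce; a rough sketch (the directory
names and NFS mount point are made up):

  # generate 6250 random 8k files locally
  mkdir /var/tmp/smallfiles
  i=0
  while [ $i -lt 6250 ]; do
      dd if=/dev/urandom of=/var/tmp/smallfiles/f$i bs=8k count=1 2>/dev/null
      i=`expr $i + 1`
  done

  # then time the copy onto the NFS-mounted ZFS filesystem
  time cp -r /var/tmp/smallfiles /net/zfsserver/export/test/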


On 8/1/06, eric kustarz [EMAIL PROTECTED] wrote:

Joe Little wrote:

 On 7/31/06, Dale Ghent [EMAIL PROTECTED] wrote:

 On Jul 31, 2006, at 8:07 PM, eric kustarz wrote:

 
  The 2.6.x Linux client is much nicer... one thing fixed was the
  client doing too many commits (which translates to fsyncs on the
  server).  I would still recommend the Solaris client but i'm sure
  that's no surprise.  But if you're stuck on Linux, upgrade to the
  latest stable 2.6.x and i'd be curious if it was better.

 I'd love to be on kernel 2.6 but due to the philosophical stance
 towards OpenAFS of some people on the lkml list[1], moving to 2.6 is
 a tough call for us to do. But that's another story for another list.
 The fact is that I'm stuck on 2.4 for the time being and I'm having
 problems with a Solaris/ZFS NFS server that I (and Jan) are not
 having with Solaris/UFS and (in my case) Linux/XFS NFS server.

 [1] https://lists.openafs.org/pipermail/openafs-devel/2006-July/
 014041.html

 /dale


 First, OpenAFS 1.4 works just fine with 2.6 based kernels. We've
 already standardized on that over 2.4 kernels (deprecated) at
 Stanford. Second, I had similar fsync fatality when it came to NFS
 clients (linux or solaris mind you) and non-local backed clients using
 ZFS on a Solaris 10U2 (or B40+) server. My case was iscsi and it was
 chalked up to the latency of iSCSI, but I still to this day find NFS
 write performance on small files, or multitudes of files at a time, with ZFS
 as a back end to be rather iffy. It's perfectly fast for NFS reads
 and it's always speedy local to the box, but the NFS/ZFS integration
 seems problematic. I can always test w/ UFS and get great performance.
 It's the roundtrips with many fsyncs to the backend storage that ZFS
 requires for commits that get ya.


Do you have a reproducible test case for this?  If so, I would be
interested...

I wonder if you're hitting:
6413510 zfs: writing to ZFS filesystem slows down fsync() on other files
in the same FS
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6413510

which Neil is finishing up as we type.

The problem basically is that fsyncs can get slowed down by non-related
I/O, so if you had a process/NFS client that was doing lots of I/O and
another doing fsyncs, the fsyncs would get slowed down by the other
process/client.

eric




 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: Re: [zfs-discuss] ZFS vs. Apple XRaid

2006-07-31 Thread Joe Little

On 7/31/06, Dale Ghent [EMAIL PROTECTED] wrote:

On Jul 31, 2006, at 8:07 PM, eric kustarz wrote:


 The 2.6.x Linux client is much nicer... one thing fixed was the
 client doing too many commits (which translates to fsyncs on the
 server).  I would still recommend the Solaris client but i'm sure
 that's no surprise.  But if you're stuck on Linux, upgrade to the
 latest stable 2.6.x and i'd be curious if it was better.

I'd love to be on kernel 2.6 but due to the philosophical stance
towards OpenAFS of some people on the lkml list[1], moving to 2.6 is
a tough call for us to do. But that's another story for another list.
The fact is that I'm stuck on 2.4 for the time being and I'm having
problems with a Solaris/ZFS NFS server that I (and Jan) are not
having with Solaris/UFS and (in my case) Linux/XFS NFS server.

[1] https://lists.openafs.org/pipermail/openafs-devel/2006-July/
014041.html

/dale



First, OpenAFS 1.4 works just fine with 2.6 based kernels. We've
already standardized on that over 2.4 kernels (deprecated) at
Stanford. Second, I had similar fsync fatality when it came to NFS
clients (linux or solaris mind you) and non-local backed clients using
ZFS on a Solaris 10U2 (or B40+) server. My case was iscsi and it was
chalked up to the latency of iSCSI, but I still to this day find NFS
write performance on small files, or multitudes of files at a time, with ZFS
as a back end to be rather iffy. It's perfectly fast for NFS reads
and it's always speedy local to the box, but the NFS/ZFS integration
seems problematic. I can always test w/ UFS and get great performance.
It's the roundtrips with many fsyncs to the backend storage that ZFS
requires for commits that get ya.




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] The ZFS Read / Write roundabout

2006-07-01 Thread Joe Little

I've always seen this curve in my tests (local disk or iscsi) and just
think it's zfs as designed. I haven't seen much parallelism when I have
multiple i/o jobs going; the filesystem seems to go mostly into one or
the other mode. Perhaps per vdev (in iscsi I'm only exposing one or
two), there is only one performance characteristic at a time, write or
read.


On 6/30/06, Nathan Kroenert [EMAIL PROTECTED] wrote:

Hey all -

Was playing a little with zfs today and noticed that when I was
untarring a 2.5gb archive both from and onto the same spindle in my
laptop, the bytes read and written over time were seesawing
between approximately 23MB/s and 0MB/s.

It seemed like we read and read and read till we were all full up, then
wrote until we were empty, and so the cycle went.

Now: as it happens, 31MB/s is about as fast as it gets on this disk at
that part of the platter (using dd and large block size on the rdev).
(iirc, it actually started out closer to 30MB, so the slower speed might
be a red herring...)
So, it seems to be below what I would hope to get out of the platter,
but it's not too bad.
Whether I:
read at 23, write at 0 then read at 0, write at 23
or
read at 15 and write at 15
it works out the same(ish)...

The question is: Is this deliberate? (I'm guessing it's the txg flushing
that's causing this behaviour)

iostat output is at the end of this email...

Is this a deliberate attempt to reduce the number of seeks and IO's to
the disk, (and especially competing read/writes on PATA)?

I guess in the back of my mind is: Is this the fastest / best way we can
approach this?

Also - When dding the raw slice that zfs is using, I noticed that my IO
rate also seesawed up and down between 31MB/s and 28MB/s, over a 5
second interval... I was not expecting that... Thoughts?

Thanks! :)

Nathan.

Here is the iostat example -

  extended device statistics
device   r/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b
cmdk00.0  201.50.0 23908.7 33.0  2.0  173.5 100 100
nfs2 0.00.00.00.0  0.0  0.00.0   0   0
  extended device statistics
device   r/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b
cmdk00.0  200.00.0 24822.5 33.0  2.0  174.9 100 100
nfs2 0.00.00.00.0  0.0  0.00.0   0   0
  extended device statistics
device   r/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b
cmdk00.0  184.00.0 22413.1 33.0  2.0  190.2 100 100
nfs2 0.00.00.00.0  0.0  0.00.0   0   0
  extended device statistics
device   r/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b
cmdk0   42.0  247.9 5246.9 8753.2 20.1  1.6   74.9  66  95
nfs2 0.00.00.00.0  0.0  0.00.0   0  0
  extended device statistics
device   r/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b
cmdk0  159.06.0 20290.84.0 13.4  1.9   92.7  90 100
nfs2 0.00.00.00.0  0.0  0.00.0   0   0
  extended device statistics
device   r/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b
cmdk0  186.00.0 23809.80.0 31.2  2.0  178.5 100 100
nfs2 0.00.00.00.0  0.0  0.00.0   0   0
  extended device statistics
device   r/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b
cmdk0  172.0   30.0 22017.2 3016.2 31.5  2.0  166.0 100 100
nfs2 0.00.00.00.0  0.0  0.00.0   0   0
  extended device statistics
device   r/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b
cmdk00.0  176.00.0 21109.0 33.0  2.0  198.8 100 100
nfs2 0.00.00.00.0  0.0  0.00.0   0   0
  extended device statistics
device   r/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b
cmdk00.0  189.00.0 23422.8 33.0  2.0  185.1 100 100
nfs2 0.00.00.00.0  0.0  0.00.0   0   0
  extended device statistics
device   r/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b
cmdk00.0  182.00.0 23288.6 33.0  2.0  192.3 100 100
nfs2 0.00.00.00.0  0.0  0.00.0   0   0
  extended device statistics
device   r/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b
cmdk0   33.0  364.0 3904.0 7765.6 19.8  1.6   53.9  70  92
nfs2 0.00.00.00.0  0.0  0.00.0   0   0
  extended device statistics
device   r/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b
cmdk0  146.06.0 18563.94.0 18.2  1.4  129.1  69  74
nfs2 0.00.00.00.0  0.0  0.00.0   0   0
  extended device statistics
device   r/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b
cmdk0  131.00.0 16768.90.0 18.0  1.8  150.8  67  90
nfs2 0.00.00.00.0  0.0  0.00.0   0   0



--

___
zfs-discuss mailing list

Re: Re: [zfs-discuss] Re: ZFS and Storage

2006-06-27 Thread Joe Little

On 6/27/06, Erik Trimble [EMAIL PROTECTED] wrote:

Darren J Moffat wrote:

 Peter Rival wrote:

 storage arrays with the same arguments over and over without
 providing an answer to the customer problem doesn't do anyone any
 good.  So.  I'll restate the question.  I have a 10TB database that's
 spread over 20 storage arrays that I'd like to migrate to ZFS.  How
 should I configure the storage array?  Let's at least get that
 conversation moving...


 I'll answer your question with more questions:

 What do you do just now, ufs, ufs+svm, vxfs+vxvm, ufs+vxvm, other ?

 What of that doesn't work for you ?

 What functionality of ZFS is it that you want to leverage ?

It seems that the big thing we all want from ZFS (relative to the
discussion of moving HW RAID to ZFS) is the block checksumming (i.e. how to
reliably detect that a given block is bad, and have ZFS compensate).
Now, how do we get that when using HW arrays, and not just by treating them
like JBODs (which is impractical for large SANs and similar arrays that
are already configured)?

Since the best way to get this is to use a Mirror or RAIDZ vdev, I'm
assuming that the proper way to get benefits from both ZFS and HW RAID
is the following:

(1)  ZFS mirror of  HW stripes, i.e.  zpool create tank mirror
hwStripe1 hwStripe2
(2)  ZFS RAIDZ of HW mirrors, i.e. zpool create tank raidz hwMirror1,
hwMirror2
(3)  ZFS RAIDZ of  HW stripes, i.e. zpool create tank raidz hwStripe1,
hwStripe2

Mirrors of mirrors and raidz of raid5 are also possible, but I'm pretty
sure they're considerably less useful than the 3 above.

Personally, I can't think of a good reason to use ZFS with HW RAID5;
case (3) above seems to me to provide better performance with roughly
the same amount of redundancy (not quite true, but close).

I'd vote for (1) if you need high performance, at the cost of disk
space, (2) for maximum redundancy, and (3) as maximum space with
reasonable performance.


I'm making a couple of assumptions here:

(a)  you have the spare cycles on your hosts to allow for using ZFS
RAIDZ, which is a non-trivial cost (though not that big, folks).
(b)  your HW RAID controller uses NVRAM (or battery-backed cache), which
you'd like to be able to use to speed up writes
(c)  your HW RAID's NVRAM speeds up ALL writes, regardless of the
configuration of arrays in the HW
(d)  having your HW controller present individual disks to the machines
is a royal pain (way too many, the HW does other nice things with
arrays, etc)




The case for HW RAID 5 with ZFS is easy: when you use iscsi. You get
major performance degradation over iscsi when trying to coordinate
writes and reads serially over iscsi using RAIDZ. The sweet spot in
the iscsi world is to let your targets do RAID5 or whatnot (RAID10,
RAID50, RAID6), and combine those into ZFS pools, mirrored or not.
There are other benefits to ZFS, including snapshots, easily managed
storage pools, and with iscsi, ease of switching head nodes with a
simple export/import.
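
A minimal sketch of what I mean (the cXtYd0 names are made up; each one
is a hardware RAID-5 LUN presented by an iscsi target):

  # plain stripe across two RAID-5 backed LUNs
  zpool create tank c2t1d0 c2t2d0

  # or mirror LUNs from two different targets for ZFS-level redundancy
  zpool create tank mirror c2t1d0 c3t1d0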




Erik Trimble
Java System Support
Mailstop:  usca14-102
Phone:  x17195
Santa Clara, CA


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: Re: [zfs-discuss] 15 minute fdsync problem and ZFS: Solved

2006-06-24 Thread Joe Little

To clarify what has just been stated: with the zil disabled I got 4MB/sec;
with the zil enabled I get 1.25MB/sec.

On 6/23/06, Tao Chen [EMAIL PROTECTED] wrote:



On 6/23/06, Roch [EMAIL PROTECTED] wrote:

   On Thu, Jun 22, 2006 at 04:22:22PM -0700, Joe Little wrote:
On 6/22/06, Jeff Bonwick [EMAIL PROTECTED] wrote:
 a test against the same iscsi targets using linux and XFS and the
 NFS server implementation there gave me 1.25MB/sec writes. I was
about
 to throw in the towel and deem ZFS/NFS as unusable until B41
came
 along and at least gave me 1.25MB/sec.

That's still super slow -- is this over a 10Mb link or something?

Jeff

 I  think the performance is   in line with expectation  for,
 small  file,single  threaded, open/write/close
NFS
 workload (nfs must commit on close). Therefore I expect :

 (avg file size) / (I/O latency).

 Joe does this formula approach the 1.25 MB/s ?


Joe sent me another set of DTrace output (biorpt.sh.rec.gz),
running 105 seconds with zil_disable=1.

I generate a graph using Grace ( rec.gif ).

The interesting part for me:
1) How I/O response time (at bdev level) changes in a pattern.
2) Both iSCSI (sd2) and local (sd1) storage follow the same pattern and have
almost identical latency on average.
3) The latency is very high, either on average or at peaks.

Although low throughput is expected given the large number of small files, I
don't expect such high latency,
and of course 1.25MB/s is too low; even after turning on zil_disable, I see
only 4MB/s in this data set.
I/O size at bdev level is actually pretty decent: mostly (75%) 128KB.

Here's a summary:

# biorpt -i biorpt.sh.rec

Generating report from biorpt.sh.rec ...

   === Top 5 I/O types ===

   DEVICE  T  BLKs  COUNT
   ------  -  ----  -----
   sd1     W   256   3122
   sd2     W   256   3118
   sd1     W     2    164
   sd2     W     2    151
   sd2     W     3    123


   === Top 5 worst I/O response time ===

   DEVICE  T  BLKs     OFFSET   TIMESTAMP  TIME.ms
   ------  -  ----  ---------  ----------  -------
   sd1     W   256  529562656  104.322170  3316.90
   sd1     W   256  529563424  104.322185  3281.97
   sd2     W   256  521152480  104.262081  3262.49
   sd2     W   256  521152736  104.262102  3258.56
   sd1     W   256  529562912  104.262091  3249.85


   === Top 5 Devices with largest number of I/Os ===

   DEVICE  READ  AVG.ms  MB  WRITE  AVG.ms   MB   IOs  SEEK
   ------  ----  ------  --  -----  ------  ---  ----  ----
   sd1        7    2.70   0   4169  440.62  409  4176    0%
   sd2        6    0.25   0   4131  444.79  407  4137    0%
   cmdk0      5   21.50   0    138    0.82    0   143   11%


   === Top 5 Devices with largest amount of data transfer ===

   DEVICE  READ  AVG.ms  MB  WRITE  AVG.ms   MB  Tol.MB  MB/s
   ------  ----  ------  --  -----  ------  ---  ------  ----
   sd1        7    2.70   0   4169  440.62  409     409     4
   sd2        6    0.25   0   4131  444.79  407     407     4
   cmdk0      5   21.50   0    138    0.82    0       0     0

 Tao


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: Re: [zfs-discuss] 15 minute fdsync problem and ZFS: Solved

2006-06-23 Thread Joe Little

On 6/23/06, Roch [EMAIL PROTECTED] wrote:


Joe Little writes:
  On 6/22/06, Bill Moore [EMAIL PROTECTED] wrote:
   Hey Joe.  We're working on some ZFS changes in this area, and if you
   could run an experiment for us, that would be great.  Just do this:
  
   echo 'zil_disable/W1' | mdb -kw
  
   We're working on some fixes to the ZIL so it won't be a bottleneck when
   fsyncs come around.  The above command will let us know what kind of
   improvement is on the table.  After our fixes you could get from 30-80%
   of that improvement, but this would be a good data point.  This change
   makes ZFS ignore the iSCSI/NFS fsync requests, but we still push out a
   txg every 5 seconds.  So at most, your disk will be 5 seconds out of
   date compared to what it should be.  It's a pretty small window, but it
   all depends on your appetite for such windows.  :)
  
   After running the above command, you'll need to unmount/mount the
   filesystem in order for the change to take effect.
  
   If you don't have time, no big deal.
  
  
   --Bill
  
  
   On Thu, Jun 22, 2006 at 04:22:22PM -0700, Joe Little wrote:
On 6/22/06, Jeff Bonwick [EMAIL PROTECTED] wrote:
 a test against the same iscsi targets using linux and XFS and the
 NFS server implementation there gave me 1.25MB/sec writes. I was about
  to throw in the towel and deem ZFS/NFS as unusable until B41 came
 along and at least gave me 1.25MB/sec.

That's still super slow -- is this over a 10Mb link or something?

Jeff

I  think the performance is   in line with expectation  for,
small  file,single  threaded, open/write/close   NFS
workload (nfs must commit on close). Therefore I expect :

(avg file size) / (I/O latency).

Joe does this formula approach the 1.25 MB/s ?



To this day, I still don't know how to calculate the i/o latency.
Average file size is always expected to be close to kernel page size
for NASes -- 4-8k. Always tune for that.
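
As a rough worked example (the per-op latency here is just an assumed
ballpark for a synchronous NFS-over-iSCSI round trip, not a measurement):

  throughput ~ (avg file size) / (per-file commit latency)
             ~ 8 KB / 6 ms
             ~ 1.3 MB/sec

which lands right around the 1.25 MB/sec I was seeing.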






   
Nope, gig-e link (single e1000g, or aggregate, doesn't matter) to the
iscsi target, and single gig-e link (nge) to the NFS clients, who are
gig-e. Sun Ultra20 or AMD Quad Opteron, again with no difference.
   
Again, the issue is the multiple fsyncs that NFS requires, and likely
the serialization of those iscsi requests. Apparently, there is a
basic latency in iscsi that one could improve upon with FC, but we are
definitely in the all ethernet/iscsi camp for multi-building storage
pool growth and don't have interest in a FC-based SAN.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
  
 
  Well, following Bill's advice and the previous note on disabling zil,
  I ran my test on a B38 opteron initiator and if you do a time on the
  copy from the client, 6250 8k files transfer at 6MB/sec now. If you
  watch the entire commit on the backend using zpool iostat 1 I see
  that it takes a few more seconds, and the actual rate there is
  4MB/sec. Beats my best of 1.25MB/sec, and this is not B41.
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Joe, you know this but for the benefit of  others, I have to
highlight that running  any NFS server  this way, may cause
silent data corruption from client's point of view.

Whenever a server keeps  data in RAM this  way and  does not
commit it to stable storage  upon request from clients, that
opens a time window for corruption. So  a client writes to a
page, then reads the same page, and if the server suffered a
crash in between, the data may not match.

So this is performance at the expense of data integrity.

-r


Yes.. ZFS in its normal mode has better data integrity. However, this
may be a more ideal tradeoff if you have specific read/write patterns.
In my case, I'm going to use ZFS initially for my tier2 storage, with
nightly write periods (needs to be short duration rsync from tier1)
and mostly read periods throughout the rest of the day. I'd love to
use ZFS as a tier1 service as well, but then you'd have to perform as
a NetApp does. Same tricks, same NVRAM or initial write to local
stable storage before writing to backend storage. 6MB/sec is closer to
expected behavior for first tier at the expense of reliability. I
don't know what the answer is for Sun to make ZFS 1st Tier quality
with their NFS implementation and its sync happiness.





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on 32bit x86

2006-06-22 Thread Joe Little

What if your 32bit system is just a NAS -- ZFS and NFS, nothing else?
I think it would still be worthwhile to allow tweaking of things at runtime
to make 32-bit systems work better.


On 6/21/06, Mark Maybee [EMAIL PROTECTED] wrote:

Yup, your probably running up against the limitations of 32-bit kernel
addressability.  We are currently very conservative in this environment,
and so tend to end up with a small cache as a result.  It may be
possible to tweak things to get larger cache sizes, but you run the risk
of starving out other processes trying to get memory.

-Mark

Robert Milkowski wrote:
 Hello zfs-discuss,

   Simple test 'ptime find /zfs/filesystem /dev/null' with 2GB RAM.
   After second, third, etc. time still it reads a lot from disks while
   find is running (atime is off).

   on x64 (Opteron) it doesn't.

   I guess it's due to 512MB heap limit in kernel for its cache.
   ::memstat shows 469MB for kernel and 1524MB on freelist.


   Is there anything could be done? I guess not but perhaps



   ps. of course there're a lot of files like ~150K.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: Re: [zfs-discuss] ZFS on 32bit x86

2006-06-22 Thread Joe Little

On 6/22/06, Darren J Moffat [EMAIL PROTECTED] wrote:

Rich Teer wrote:
 On Thu, 22 Jun 2006, Joe Little wrote:

 Please don't top post.

 What if your 32bit system is just a NAS -- ZFS and NFS, nothing else?
 I think it would still be ideal to allow tweaking of things at runtime
 to make 32-bit systems more ideal.

 I respectfully disagree.  Even on x86, 64-bits are common, and the
 price difference between 64-bit and 32-bit capable systems is small.
 So apart from keeping old stuff working, I can think of little or no
 justifcation to not go with 64-bit systems these days, even for a small
 S10 plus ZFS NAS appliance.  That way you leave behind all the pain
 32-bits gives you.

Are VIA processor chips 64bit capable yet ?

--
Darren J Moffat



Well, current Xeon-LVs are 32 bit only, but that's beside the point: I'm in
education, where our storage boxes are purchased using grant money
that must be utilized for x number of years. The answer from Rich Teer
indicates that we should dump old infrastructure and buy new, or if
you are in our industry (I represent Stanford University Electrical
Engineering), take your money/infrastructure elsewhere as only new
customers need apply :( A lot of organizations have a lot of 32 bit
infrastructure with multiple RAID cards, drives, etc that they'd love
to migrate over to ZFS. I'm using it now for creating large pools of
2nd tier storage. And yes, that will mostly be pre-existing hardware.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: Re: [zfs-discuss] 15 minute fdsync problem and ZFS: Solved

2006-06-22 Thread Joe Little

On 6/22/06, Jeff Bonwick [EMAIL PROTECTED] wrote:

 a test against the same iscsi targets using linux and XFS and the
 NFS server implementation there gave me 1.25MB/sec writes. I was about
  to throw in the towel and deem ZFS/NFS as unusable until B41 came
 along and at least gave me 1.25MB/sec.

That's still super slow -- is this over a 10Mb link or something?

Jeff




Nope, gig-e link (single e1000g, or aggregate, doesn't matter) to the
iscsi target, and single gig-e link (nge) to the NFS clients, who are
gig-e. Sun Ultra20 or AMD Quad Opteron, again with no difference.

Again, the issue is the multiple fsyncs that NFS requires, and likely
the serialization of those iscsi requests. Apparently, there is a
basic latency in iscsi that one could improve upon with FC, but we are
definitely in the all ethernet/iscsi camp for multi-building storage
pool growth and don't have interest in a FC-based SAN.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on 32bit x86

2006-06-22 Thread Joe Little

I guess the only hope is to find pin-compatible Xeons that are 64bit
to replace what is a large chassis with 24 slots of disks that has
specific motherboard form-factor, etc. We have 6 of these things from
a government grant that must be used for the stated purpose. So, yes,
we can buy product, but we simply can't get rid of the old equipment
designed for this purpose. Again, government auditors for the research
will say to pick the solution for the hardware that was donated/purchased
with grant funds -- Welcome to the world of research. If Sun wants to
_give_ us lots of hardware to make ZFS shine, great. But as is usually
the case, I've got to make do with what I have.

For the vast majority of the storage, I'm running them as iscsi
targets with a single sun ultra20 as the frontend, but as from other
messages in the list, the iscsi route along with NFS has its own
pitfalls, 32bit or 64bit :)



On 6/22/06, Erik Trimble [EMAIL PROTECTED] wrote:

AMD Geodes are 32-bit only. I haven't heard any mention that they will
_ever_ be 64-bit.  But, honestly, this and the Via chip aren't really
ever going to be targets for Solaris. That is,  they simply aren't (any
substantial) part of the audience we're trying to reach with Solaris x86.

Also, relatively few 32-bit x86 systems can take more than 4GB.  While many of
the late-model P4 (and all Xeons since the P3 Xeon) chips have the
capability, most of them were married to chipsets which can't take more
than 4GB. On the AMD side, I'm pretty sure only the Athlon MP-series was
enabled for PAE, and only a tiny number of them were sold.

So, basically, the problem boils down to those with Xeons, a few
single-socket P4s, and some of this-year's Pentium Ds.  Granted, this
makes up most of the x86 server market. So, yes, it _would_ be nice to
be able to dump a tuning parameter into /etc/system to fix the cache
starvation (and other related 4GB RAM) problems.  However, I have to
say that working with PAE is messy, and, honestly, 64-bit enabled 1U/3U
servers are dirt cheap now.  So, while I empathize with the market that
has severe purchasing constraints, I think it's entirely reasonable to
be up front about needing a 64-bit processor for ZFS, _if_ we've
explored expanding the 32-bit environment, and discovered it was too
expensive (in resources required) to fix.

Dell (arrggh! Not THEM!) sells  PowerEdge servers with plenty of PCI
slots and RAM, and 64-bit CPUs for around $1000 now.  Hell, WE sell
dual-core x2100s for under $2k.   I'm sure one can pick up a whitebox
single-core Opteron for around $1k.   That's not unreasonable to ask to
get the latest technology.

-Erik



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: Re: [zfs-discuss] 15 minute fdsync problem and ZFS: Solved

2006-06-22 Thread Joe Little

On 6/22/06, Bill Moore [EMAIL PROTECTED] wrote:

Hey Joe.  We're working on some ZFS changes in this area, and if you
could run an experiment for us, that would be great.  Just do this:

echo 'zil_disable/W1' | mdb -kw

We're working on some fixes to the ZIL so it won't be a bottleneck when
fsyncs come around.  The above command will let us know what kind of
improvement is on the table.  After our fixes you could get from 30-80%
of that improvement, but this would be a good data point.  This change
makes ZFS ignore the iSCSI/NFS fsync requests, but we still push out a
txg every 5 seconds.  So at most, your disk will be 5 seconds out of
date compared to what it should be.  It's a pretty small window, but it
all depends on your appetite for such windows.  :)

After running the above command, you'll need to unmount/mount the
filesystem in order for the change to take effect.

If you don't have time, no big deal.


--Bill


On Thu, Jun 22, 2006 at 04:22:22PM -0700, Joe Little wrote:
 On 6/22/06, Jeff Bonwick [EMAIL PROTECTED] wrote:
  a test against the same iscsi targets using linux and XFS and the
  NFS server implementation there gave me 1.25MB/sec writes. I was about
  to throw in the towel and deem ZFS/NFS has unusable until B41 came
  along and at least gave me 1.25MB/sec.
 
 That's still super slow -- is this over a 10Mb link or something?
 
 Jeff
 
 

 Nope, gig-e link (single e1000g, or aggregate, doesn't matter) to the
 iscsi target, and single gig-e link (nge) to the NFS clients, who are
 gig-e. Sun Ultra20 or AMD Quad Opteron, again with no difference.

 Again, the issue is the multiple fsyncs that NFS requires, and likely
 the serialization of those iscsi requests. Apparently, there is a
 basic latency in iscsi that one could improve upon with FC, but we are
 definitely in the all ethernet/iscsi camp for multi-building storage
 pool growth and don't have interest in a FC-based SAN.
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



Well, following Bill's advice and the previous note on disabling zil,
I ran my test on a B38 opteron initiator and if you do a time on the
copy from the client, 6250 8k files transfer at 6MB/sec now. If you
watch the entire commit on the backend using zpool iostat 1 I see
that it takes a few more seconds, and the actual rate there is
4MB/sec. Beats my best of 1.25MB/sec, and this is not B41.
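
For the record, the sequence I ran was roughly the following (the
filesystem name is made up, and this carries exactly the crash-window
caveat Bill mentions above):

  echo 'zil_disable/W1' | mdb -kw
  zfs unmount tank/export
  zfs mount tank/export

Presumably it could also be made persistent across reboots with a line
like "set zfs:zil_disable = 1" in /etc/system, but that is untested here.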
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs going out to lunch

2006-06-02 Thread Joe Little

I've been writing via tar to a pool some stuff from backup, around
500GB. It's taken quite a while as the tar is being read from NFS. My
ZFS partition in this case is a RAIDZ 3-disk job using 3 400GB SATA
drives (sil3124 card).

Every once in a while, a df stalls and during that time my io's go
flat, as in:

  capacity operationsbandwidth
pool used  avail   read  write   read  write
--  -  -  -  -  -  -
pool 571G   545G  0 34   1017  2.25M
pool 571G   545G  0  0  0  0
pool 571G   545G  0  0  0  0
pool 571G   545G  0  0  0  0
pool 571G   545G  0  0  0  0
pool 571G   545G  0  0  0  0
pool 571G   545G  0  0  0  0
pool 571G   545G  0  0  0  0
pool 571G   545G  0  0  0  0
pool 571G   545G  0  0  0  0
pool 571G   545G  0  0  0  0
pool 571G   545G  0  0  0  0
pool 571G   545G  0  0  0  0
pool 571G   545G  0  0  0  0
pool 571G   545G  0  0  0  0
pool 571G   545G  0  0  0  0
pool 571G   545G  0  0  0  0
pool 571G   545G  0  0  0  0
pool 571G   545G  0  0  0  0
pool 571G   545G  0  0  0  0
pool 571G   545G  0  0  0  0
pool 571G   545G  0  0  0  0
pool 571G   545G  0  0  0  0
pool 571G   545G  0  0  0  0
pool 571G   545G  0  0  0  0
pool 571G   545G  0  0  0  0
pool 571G   545G  0  0  0  0
pool 571G   545G  0  0  0  0
pool 571G   545G  0  0  0  0
pool 571G   545G  0  0  0  0
pool 571G   545G  0  0  0  0
pool 571G   545G  0  0  0  0
pool 571G   545G  0  0  0  0
pool 571G   545G  0  0  0  0
pool 571G   545G  0  0  0  0
pool 571G   545G  0  0  0  0
pool 571G   545G  0  0  0  0
pool 571G   545G  0  0  0  0
pool 571G   545G 48176  82.8K  5.31M
pool 571G   545G 48313   283K  26.0M
pool 571G   545G299130  1.05M  4.05M
pool 571G   545G163160   932K  4.70M
pool 571G   545G320  0  1.02M  0

Is this an ARC issue or some sort of flush happening? This is B40.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance metric/cookbook/whitepaper

2006-06-01 Thread Joe Little

Please add to the list the differences between locally and remotely attached
vdevs: FC, SCSI/SATA, or iSCSI. This is the part that is troubling me
most, as there are wildly different performance characteristics when
you use NFS with any of these backends with the various configs of
ZFS. Another thing is when and where cache should or should not be used on
backend RAID devices (RAID vs JBOD point made already).

The wild difference is between small and large file writes, and how
the backend can go from 10's of MB/sec to 10's of KB/sec. Really.


On 6/1/06, Erik Trimble [EMAIL PROTECTED] wrote:


Maybe the best thing here is to have us (i.e. the people on this list)
come up with a set of standard and expected use cases, and have the ZFS
team tell us what the relative performance/tradeoffs are.  I mean,
rather than us just asking a bunch of specific cases, a good whitepaper
Best Practices / Cookbook for ZFS would be nice.


For instance:


compare UFS/Solaris Volume Manager against ZFS in:

[random|sequential][small|large][read|write]
on
UFS/SVM: Raid-1, Raid-5, Raid 0+1
ZFS:  RaidZ, Mirrors




Relative Performance of HWRaid vs JBOD
e.g.

3510FC w/ RAID using ZFS
vs
3510FC JBOD using ZFS



I know a bunch of this has been discussed before (and I've read most of
it :-), but collecting it in one place and filling out the actual
analysis would be Really Nice.


--
Erik Trimble
Java System Support
Mailstop:  usca14-102
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: Re[2]: [zfs-discuss] cluster features

2006-05-31 Thread Joe Little

Well, here's my previous summary off list to different solaris folk
(regarding NFS serving via ZFS and iSCSI):

I want to use ZFS as a NAS with no bounds on the backing hardware (not
restricted to one box's capacity). Thus, there are two options: FC SAN
or iSCSI. In my case, I have multi-building considerations and 10Gb
ethernet layer-2 interconnects that make iscsi ideal. Our standard
users use NAS for collections of many small files to many large files
(source code repositories, simulations, cad tools, VM images,
rendering meta-forms, and final results). Ideally to allow for ongoing
growth and drive replacement across multiple iscsi targets, RAIDZ was
selected over static hardware raid solutions. This setup is very
similar to a gfiler (iscsi based) or otherwise a standard NetApp Filer
product, and it would have appeared that Sun is targeting this
solution. I need this setup for both Tier1 primary NAS storage, as
well as disk-to-disk Tier2 backup.

In my extensive testing (not so much benchmarking, and definitely
without the time/focus to learn dtrace and the like), we have found
out that ZFS can be used for a tier2 system and not for tier1 due to
pathologically poor performance via NFS against a ZFS filesystem based
on RAIDZ over non-local storage. We have extremely poor but more
acceptable performance using a non-RAIDZ configuration. Only in the
case of an expensive FC-SAN network implementation would it appear that
ZFS is workable. If this is the only workable solution, then ZFS has
lost its benefits over NetApp as we approach the same costs but do not
have the same current maturity. Is it a lost cause? Honestly, I need
to be convinced that this is workable, and so far optional solutions
have been shot down.

Evidence? The final synthetic test used was to generate a directory
of 6250 random 8k files. On an NFS client (solaris, linux, or even
loop-back on the server itself), run cp -r SRCDIR DESTDIR where
DESTDIR is on the NFS server. Averages from memory:

FS    iSCSI backend           Rate
XFS   1.5TB single Lun        ~1-1.1MB/sec
ZFS   1.5TB single Lun        ~250-400KB/sec
ZFS   1.5TB RAIDZ (8 disks)   ~25KB/sec

In the case of mixed sized files with predominantly small files above
and below 8K, I see the XFS solution jump to an average of
2.5-3MB/sec. The ZFS store over a single lun stays within
200-420KB/sec, and the RAIDZ ranges from 16-40KB/sec.

Likely caching and some dynamic behaviours cause ZFS to get worse with
mixed sizing, whereas XFS or such increases performance. Finally, by
switching to SMB and not using NFS, I can maintain over 3MB/sec rates.

Large files over NFS get more reasonable performance (14MB-28MB/sec)
on any given ZFS backend, and when writing locally I get 30+MB/sec with
spikes close to 100MB/sec. I can only maximize performance on my ZFS
backend if I use a blocksize (tests using dd) of 256K or greater. 128K
seems to provide lower overall data rates, and I believe this is the
default when I use cp, rsync, or other commands locally.
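
For what it's worth, the dd test was roughly this (the pool path is
made up):

  # 1GB sequential write with a 256K block size
  dd if=/dev/zero of=/pool/testfile bs=256k count=4096

  # the same amount of data at 128K, for comparison
  dd if=/dev/zero of=/pool/testfile2 bs=128k count=8192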

In summary, I can make my ZFS-based initiator an NFS client or
otherwise use rsyncd to ameliorate the pathological NFS server
performance of the ZFS combination. I can then service files fine.
This solution allows us to move forward as a Tier2 only solution. If
_any_ thing can be done to address NFS and its interactions with ZFS,
and bring it close to 1MB/sec performance (these are gig-e
interconnects after all, think about it) then it will only be 1/10th
the performance of a NetApp in this worst-case scenario and perform
similarly to the NetApp if not better in other cases. The NetApp can do
around 10MB/sec in the scenario I'm depicting. Currently, we have
around 1/20th to 1/30th the performance level when not using RAIDZ,
and 1/200th using RAIDZ.

I just can't quite understand how we can do a cp -p TESTDIR DESTDIR
of 50MB of small files locally in an instant, with the OS returning to
the prompt immediately and zpool iostat showing the writes committed
over the next 3-6 seconds -- and this is OK for on-disk consistency --
but then for some reason the NFS client can't commit in a similar
fashion, with Solaris saying yes, we got it, here's confirmation,
next, just as it does locally. The data definitely gets there at the
same speed, as my tests with remote iscsi pools and as an NFS client
show. My naive sense is that this should be addressable at some level
without inducing corruption. I have a feeling that it's somehow being
overly conservative in this stance.

On 5/30/06, Robert Milkowski [EMAIL PROTECTED] wrote:

Hello Joe,

Wednesday, May 31, 2006, 12:44:22 AM, you wrote:

JL Well, I would caution at this point against the iscsi backend if you
JL are planning on using NFS. We took a long winded conversation online
JL and have yet to return to this list, but the gist of it is that the
JL latency of iscsi along with the tendency for NFS to fsync 3 times per
JL write causes performance to drop 

Re: [zfs-discuss] cluster features

2006-05-30 Thread Joe Little

Well, I would caution at this point against the iscsi backend if you
are planning on using NFS. We took a long-winded conversation offline
and have yet to return to this list, but the gist of it is that the
latency of iscsi along with the tendency for NFS to fsync 3 times per
write causes performance to drop dramatically, and it gets much worse
for a RAIDZ config. If you want to go this route, FC is a current
suggested requirement.

On 5/30/06, Eric Schrock [EMAIL PROTECTED] wrote:

On Tue, May 30, 2006 at 03:55:09AM -0700, Ernst Rohlicek jun. wrote:
 Hello list,

 I've read about your fascinating new fs implementation, ZFS. I've seen
 a lot - nbd, lvm, evms, pvfs2, gfs, ocfs - and I have to say: I'm quite
 impressed!

 I'd set up a few of my boxes to OpenSolaris for storage (using Linux
 and lvm right now - offers pooling, but no built-in fault-tolerance)
 if ZFS had one feature: Use of more than one machine - currently, as I
 understand it, if disks fail, no problem, but if the server machine
 fails, ...

 I read in your FAQ that cluster features are on the way and wanted to
 ask what's the status here :-)

 BTW I recently read about a filesystem, which has a pretty good
 cluster architecture, called Google File System. The article on the
 English Wikipedia has a good overview, a link to the detailed papers
 and a ZDNet interview about it.

 I just wanted to point that out to you, maybe some of its design /
 architecture is useful in ZFS's cluster mode.

For cross-machine tolerance, it should be possible (once the iSCSI
target is integrated) to create ZFS-backed iSCSI targets and then use
RAID-Z from a single host across machines.  This is not a true clustered
filesystem, as it has a single point of access, but it does get you
beyond the 'single node = dataloss' mode of failure.

As for the true clustered filesystem, we're still gathering
requirements.  We have some ideas in the pipeline, and it's definitely a
direction in which we are headed, but there's not much to say at this
point.

- Eric

--
Eric Schrock, Solaris Kernel Development   http://blogs.sun.com/eschrock
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: [dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations

2006-05-11 Thread Joe Little

well, here's my first pass result:

[EMAIL PROTECTED] loges1]# time tar xf /root/linux-2.2.26.tar

real114m6.662s
user0m0.049s
sys 0m1.354s


On 5/11/06, Roch Bourbonnais - Performance Engineering
[EMAIL PROTECTED] wrote:


Joe Little writes:
  How did you get the average time for async writes? My client (lacking
  ptime, it's linux) comes in at 50 minutes, not 50 seconds. I'm running
  again right now for a more accurate number. I'm untarring from a local
  file on the directory to the NFS share.
 

I used dtrace to measure times (I  used the sleep time so it
gives a ballpark figure).

I untared with the tar file on the NFS share.
Just retimed after moving the tar file to /tmp.

# ptime tar xf /tmp/linux-2.2.22.tar
ptime tar xf /tmp/linux-2.2.22.tar

real   49.630
user1.033
sys11.405


-r

 
  On 5/11/06, Roch Bourbonnais - Performance Engineering
  [EMAIL PROTECTED] wrote:
  
  
 # ptime tar xf linux-2.2.22.tar
 ptime tar xf linux-2.2.22.tar
  
 real   50.292
 user1.019
 sys11.417
 # ptime tar xf linux-2.2.22.tar
 ptime tar xf linux-2.2.22.tar
  
 real   56.833
 user1.056
 sys11.581
 #
  
  
   avg time waiting for async writes is around 3ms.
   How much are you getting for the tar xf ?
  
   -r
  
  



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: [dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations

2006-05-08 Thread Joe Little

I was asked to also snoop the iscsi end of things, trying to find
something different between the two. iscsi being relatively opaque, it
was easiest to find differences in the patterns. In the local copy to
RAIDZ example, the iscsi link would show packets of 1514 bytes in
series of 5-10, with interspersed packets of 60 or 102 bytes, generally 2-4
in number. In the NFS client hitting the RAIDZ/iscsi combo, the trace
would show on average 3-5 packets of 1514 bytes with 5-7 packets
of 60 or 102 bytes in between. Basically, the averages swapped, and it's
likely because of a lot more metadata and/or write confirmations
going on in the NFS case.

At this point in time, I have two very important questions:

1) Are there any options available or planned to make NFS/ZFS work more
in concert to avoid this overhead, which with many small iscsi packets
(in the iscsi case) kills performance?

2) Is iscsi-backed storage, especially StorageTek acquired products,
in the planning matrix for supported ZFS (NAS) solutions? Also, why
hasn't this combination been tested to date, since it appears to be
an Achilles heel? Again, UFS does not have this problem, nor do other
file systems on other OSes (namely XFS, JFS, etc., which I've tested
before).


On 5/8/06, Nicolas Williams [EMAIL PROTECTED] wrote:

On Fri, May 05, 2006 at 11:55:17PM -0500, Spencer Shepler wrote:
 On Fri, Joe Little wrote:
  Thanks. I'm playing with it now, trying to get the most succinct test.
  This is one thing that bothers me: Regardless of the backend, it
  appears that a delete of a large tree (say the linux kernel) over NFS
  takes forever, but its immediate when doing so locally. Is delete over
  NFS really take such a different code path?

 Yes.  As mentioned in my other email, the NFS protocol requires
 that operations like REMOVE, RMDIR, CREATE have the filesystem
 metadata written to stable storage/disk before sending a response
 to the client.  That is not required of local access and therefore
 the disparity between the two.

So then multi-threading rm/rmdir on the client-side would help, no?

Are there/should there be async versions of creat(2)/mkdir(2)/
rmdir(2)/link(2)/unlink(2)/...?

Nico
--


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: [dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations

2006-05-05 Thread Joe Little

Thanks for the tip. In the local case, I could send to the
iSCSI-backed ZFS RAIDZ at even faster rates, with a total elapsed time
of 50 seconds (17 seconds better than UFS). However, I didn't even bother
finishing the NFS client test, since it was taking a few seconds
between multiple 27K files. So, it didn't help NFS at all. I'm
wondering if there is something on the NFS end that needs changing,
no? Also, how would one easily script the mdb command below to make it
permanent?


On 5/5/06, Eric Schrock [EMAIL PROTECTED] wrote:

My gut feeling is that somehow the DKIOCFLUSHWRITECACHE ioctls (which
translate to the SCSI flush write cache requests) are throwing iSCSI for
a loop.  We've exposed a number of bugs in our drivers because ZFS is
the first filesystem to actually care to issue this request.

To turn this off, you can try:

# mdb -kw
> ::walk spa | ::print spa_t spa_root_vdev | ::vdev -r
ADDR             STATE     AUX          DESCRIPTION
82dc16c0         HEALTHY   -            root
  82dc0640       HEALTHY   -              /dev/dsk/c0d0s0
> 82dc0640::print -a vdev_t vdev_nowritecache
82dc0af8 vdev_nowritecache = 0 (B_FALSE)
> 82dc0af8/W1
0x82dc0af8:     0               =       0x1


See if that makes a difference.

- Eric

--
Eric Schrock, Solaris Kernel Development   http://blogs.sun.com/eschrock


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: [dtrace-discuss] Re: [nfs-discuss] Script to trace NFSv3 client operations

2006-05-05 Thread Joe Little

Thanks. I'm playing with it now, trying to get the most succinct test.
This is one thing that bothers me: regardless of the backend, it
appears that a delete of a large tree (say the linux kernel) over NFS
takes forever, but it's immediate when doing so locally. Does delete over
NFS really take such a different code path?


On 5/5/06, Lisa Week [EMAIL PROTECTED] wrote:

These may help:

http://opensolaris.org/os/community/dtrace/scripts/
Check out iosnoop.d

http://www.solarisinternals.com/si/dtrace/index.php
Check out iotrace.d

- Lisa

Joe Little wrote On 05/05/06 18:59,:

 Are there known i/o or iscsi dtrace scripts available?

 On 5/5/06, Spencer Shepler [EMAIL PROTECTED] wrote:

 On Fri, Joe Little wrote:
  On 5/5/06, Eric Schrock [EMAIL PROTECTED] wrote:
  On Fri, May 05, 2006 at 03:46:08PM -0700, Joe Little wrote:
   Thanks for the tip. In the local case, I could send to the
   iSCSI-backed ZFS RAIDZ at even faster rates, with a total
 elapsed time
   of 50 seconds (17 seconds better than UFS). However, I didn't
 even bother
   finishing the NFS client test, since it was taking a few seconds
   between multiple 27K files. So, it didn't help NFS at all. I'm
   wondering if there is something on the NFS end that needs changing,
   no?
  
  Keep in mind that turning off this flag may corrupt on-disk state
 in the
  event of power loss, etc.  What was the delta in the local case?  17
  seconds better than UFS, but percentage wise how much faster than the
  original?
  
 
  I believe it was only about 5-10% faster. I don't have the time
  results off hand, just some dtrace latency reports.
 
  NFS has the property that it does an enormous amount of synchronous
  activity, which can tickle interesting pathologies.  But it's strange
  that it didn't help NFS that much.
 
  Should I also mount via async.. would this be honored on the Solaris
  end? The other option mentioned with similar caveats was nocto. I just
  tried with both, and the observed transfer rate was about 1.4k/s. It
  was painful deleting the 3G directory via NFS, with about 100k/s
  deletion rate on these 1000 files. Of course, When I went locally the
  delete was instantaneous.

 I wouldn't change any of the options at the client.  The issue
 is at the server side and none of the other combinations that you
 originally pointed out have this problem, right?  Mount options at the
 client will just muddy the waters.

 We need to understand if/what the NFS/ZFS/iscsi interaction is and why
 it is so much worse.  As Eric mentioned, there may be some interesting
 pathologies at play here and we need to understand what they are so
 they can be addressed.

 My suggestion is additional dtrace data collection but I don't have
 a specific suggestion as to how/what to track next.
 Because of the significant additional latency, I would be looking for
 big increases in the number of I/Os being generated to the iscsi backend
 as compared to the local attached case.  I would also look for
 some type of serialization of I/Os that is occurring with iscsi vs.
 the local attach.

 Spencer

 ___
 nfs-discuss mailing list
 [EMAIL PROTECTED]



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Poor directory traversal or small file performance?

2006-05-04 Thread Joe Little

I just responded to the NFS list, and it definitely looks like a bad
interaction between NFS-ZFS-iSCSI, whereas the first two (local
disk for ZFS) or the last two (no ZFS) are very fast. Are there posted
zfs dtrace scripts for observability of i/o?


On 5/4/06, Neil Perrin [EMAIL PROTECTED] wrote:

Actually the nfs slowness could be caused by the bug below,
but it doesn't explain the find . times on a local zfs.

Neil Perrin wrote On 05/04/06 21:01,:
 Was this a 32 bit intel system by chance?
 If so this is quite likely caused by:

 6413731 pathologically slower fsync on 32 bit systems

 This was fixed in snv_39.

 Joe Little wrote On 05/04/06 15:47,:

 I've been writing to the Solaris NFS list since I was getting some bad
 performance copying via NFS (noticeably there) a large set of small
 files. We have various source trees, including a tree with many linux
 versions that I was copying to my ZFS NAS-to-be. On large files, it
 flies pretty well, and zpool iostat 1 shows interesting patterns of
 writes in the low k's up to 102MB/sec and down again as buffered
 segments apparently are synced.

 However, in the numerous small file case, we see consistently only
 transfers in the low k's per second. First, to give some background,
we are utilizing iscsi, with the backend made up of directly exposed
SATA disks via the target. I've put them in an 8 disk raidz:

   pool: poola0
  state: ONLINE
  scrub: none requested
 config:

 NAMESTATE READ WRITE CKSUM
 poola0  ONLINE   0 0 0
   raidz ONLINE   0 0 0
 c2t1d0  ONLINE   0 0 0
 c2t2d0  ONLINE   0 0 0
 c2t3d0  ONLINE   0 0 0
 c2t4d0  ONLINE   0 0 0
 c2t5d0  ONLINE   0 0 0
 c2t6d0  ONLINE   0 0 0
 c2t7d0  ONLINE   0 0 0
 c2t8d0  ONLINE   0 0 0

 Again, I can get some great numbers on large files (doing a dd with a
 large blocksize screams!), but as a test, I took a problematic tree of
 around 1 million files, and walked it with a find/ls:

 bash-3.00# time find . \! -name .* | wc -l
   987423

 real53m52.285s
 user0m2.624s
 sys 0m27.980s

 That was local to the system, and not even NFS.

 The original files, located on a EXT3 RAID50, accessed via a linux
 client (NFS v3):
 [EMAIL PROTECTED] old-servers]# time find . \! -name .* | wc -l
 987423

 real1m4.255s
 user0m0.914s
 sys 0m6.976s

Whoa... Something just isn't right here. Are there explicit ways I can
 find out what's wrong with my setup? This is from a dtrace/zdb/mdb
 neophyte. All I have been tracking with are zpool iostats.
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



--

Neil


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss