[zfs-discuss] ZFS and NFS

2009-11-20 Thread Joe Cicardo

Hi,

My customer says:


The application has NFS directories with millions of files in a directory, 
and this can't be changed.
We are having issues with the EMC appliance and RPC timeouts on the NFS 
lookup. What I am looking at doing
is moving one of the major NFS exports to a Sun 25k, using VCS to 
cluster a ZFS RAIDZ that is then NFS exported.


For performance I am looking at disabling ZIL, since these files have 
almost identical names.


What are Sun's thoughts on this?


Thanks for any insight.

--
Joe Cicardo
Systems Engineer
Sun Microsystems, Inc.
joe.cica...@sun.com
972-546-3887

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and NFS

2009-11-20 Thread Neil Perrin



On 11/18/09 12:21, Joe Cicardo wrote:

Hi,

My customer says:


The application has NFS directories with millions of files in a directory, 
and this can't be changed.
We are having issues with the EMC appliance and RPC timeouts on the NFS 
lookup. What I am looking at doing
is moving one of the major NFS exports to a Sun 25k, using VCS to 
cluster a ZFS RAIDZ that is then NFS exported.


For performance I am looking at disabling ZIL, since these files have 
almost identical names.


I think there's some confusion about the function of the ZIL
because having files with identical names is irrelevant to the ZIL.
Perhaps the customer is thinking of the DNLC, which is a cache of name lookups.
The ZIL does handle changes to these NFS files though, as the NFS protocol
requires they be on stable storage after most NFS operations.

We don't recommend disabling the ZIL, as this can lead to
user data integrity issues. This is not the same as zpool corruption.
One way to speed the ZIL up is to use an SSD as a separate log device.
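For reference, a dedicated log device is attached with zpool add; a minimal
sketch (the pool and device names here are hypothetical):

  # zpool add tank log c4t0d0                  # single SSD slog
  # zpool add tank log mirror c4t0d0 c4t1d0    # or a mirrored slog, for safety
  # zpool status tank                          # the device shows up under "logs"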

You can check how much activity is going through the ZIL by running zilstat:

  http://www.richardelling.com/Home/scripts-and-programs-1/zilstat
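A typical way to run it while the NFS load is applied (the interval/count
arguments are an assumption here; check the script's usage message):

  # chmod +x zilstat
  # ./zilstat 1 10        # sample ZIL activity once a second, ten samples

If the byte and ops counters stay near zero during the slow lookups, the ZIL
is not where the time is going and disabling it would buy nothing.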

Neil.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and NFS

2009-11-20 Thread Bob Friesenhahn

On Wed, 18 Nov 2009, Joe Cicardo wrote:


For performance I am looking at disabling ZIL, since these files have almost 
identical names.


Out of curiosity, what correlation is there between ZIL and file 
names?  The ZIL is used for synchronous writes (e.g. the NFS write 
case).  After a file has been opened, it would be very surprising if 
ZFS cared about the file names, since actual files are identified by an 
inode.  Only directory lookups would see these file names.  It is 
pretty normal that when a directory contains millions of files, 
they use almost identical names.


Are the NFS operations which are timing out directory lookups, 'stat', 
or 'open' calls?


If files are also being created at a rapid pace, the reader may be 
blocked from accessing the directory while it is updated.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs and nfs

2009-04-09 Thread OpenSolaris Forums
 I'm using Solaris 10 (10/08). This feature is what
 exactly i want. thank for response.


Duh. What I meant previously was that this feature
is not available in the Solaris 10 releases.

Cindy
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS for NFS back end

2009-02-09 Thread John Welter
Hi everyone,

We are looking at ZFS to use as the back end to a pool of java servers
doing image processing and serving this content over the internet.

Our current solution is working well but the cost to scale and ability
to scale is becoming a problem.

Currently:

- 20TB NFS servers running FreeBSD
- Load balancer in front of them

A bit about the workload:

- 99.999% large reads, very small write requirement.
- Reads average from ~1MB to 60MB.
- Peak read bandwidth we see is ~180MB/s, with average around 20MB/s
during peak hours.

Proposed hardware:

- Dell PowerEdge 2970's, 16GB RAM, quad cores of AMD.
- LSI 1068 based SAS cards * 2 per server
- 4 MD1000 with 1TB ES2's * 15
- Configured as 2 * 7 disk RaidZ2 with 1 HS per chassis (see the sketch below)
- Intel 10 gig-e to the switching infrastructure
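A sketch of that 2 * 7-disk RaidZ2 plus hot-spare layout in zpool terms, one
chassis shown (the device names are hypothetical placeholders, not real MD1000
paths):

  # zpool create tank \
      raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 \
      raidz2 c2t7d0 c2t8d0 c2t9d0 c2t10d0 c2t11d0 c2t12d0 c2t13d0 \
      spare c2t14d0
  # zpool add tank raidz2 ... raidz2 ... spare ...   (repeat for each chassis)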

Questions:

1) Solaris, OpenSolaris, etc??  What's the best for production?

2) Anything wrong with the hardware we selected?

3) any other words of wisdom - we are just starting out with ZFS but do
have some Solaris background.

Thanks!

John
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS for NFS back end

2009-02-09 Thread Bob Friesenhahn
On Mon, 9 Feb 2009, John Welter wrote:
 A bit about the workload:

 - 99.999% large reads, very small write requirement.
 - Reads average from ~1MB to 60MB.
 - Peak read bandwidth we see is ~180MB/s, with average around 20MB/s
 during peak hours.

This is something that ZFS is particularly good at.

 Proposed hardware:

 - Dell PowerEdge 2970's, 16GB RAM, quad cores of AMD.
 - LSI 1068 based SAS cards * 2 per server
 - 4 MD1000 with 1TB ES2's * 15
 - Configured as 2 * 7 disk RaidZ2 with 1 HS per chassis
 - Intel 10 gig-e to the switching infrastructure

The only concern might be with the MD1000.  Make sure that you can 
obtain it as a JBOD SAS configuration without the advertised PERC RAID 
controller. The PERC RAID controller is likely to get in the way when 
using ZFS. There has been mention here about unpleasant behavior when 
hot-swapping a failed drive in a Dell drive array with their RAID 
controller (does not come back automatically).  Typically such 
simplified hardware is offered as expansion enclosures.

Sun, IBM, and Adaptec also offer good JBOD SAS enclosures.

It seems that you have done your homework well.

 1) Solaris, OpenSolaris, etc??  What's the best for production?

Choose Solaris 10U6 if OS stability and incremental patches are 
important for you.  ZFS boot from mirrored drives in the PowerEdge 
2970 should help make things very reliable, and the OS becomes easier 
to live-upgrade.

 3) any other words of wisdom - we are just starting out with ZFS but do
 have some Solaris background.

You didn't say if you will continue using FreeBSD.  While FreeBSD is a 
fine OS, my experience is that its client NFS read performance is 
considerably lower than that of Solaris.  With Solaris clients and a Solaris 
server, the NFS read is close to wire speed.  FreeBSD's NFS client 
is not so good for bulk reads, presumably due to its 
read-ahead/caching strategy.

Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS for NFS back end

2009-02-09 Thread John Welter
Sorry I wasn't clear that the clients that hit this NFS back end are all
Centos 5.2.  FreeBSD is only used for the current NFS servers (a legacy
deal) but that would go away with the new Solaris/ZFS back end.

Dell will sell their boxes with SAS/5e controllers which are just a LSI
1068 board - these work with the MD1000 as a JBOD (we are doing some
testing as we speak and it seems to work).  

The rest of the infrastructure is Dell so we are trying to stick with
them... the devil we know ;^)

Homework was easy with excellent resources like this list... just lurked
awhile and picked up a lot from the traffic.

Thanks again.

John


-Original Message-
From: Bob Friesenhahn [mailto:bfrie...@simple.dallas.tx.us] 
Sent: Monday, February 09, 2009 11:28 AM
To: John Welter
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] ZFS for NFS back end

On Mon, 9 Feb 2009, John Welter wrote:
 A bit about the workload:

 - 99.999% large reads, very small write requirement.
 - Reads average from ~1MB to 60MB.
 - Peak read bandwidth we see is ~180MB/s, with average around 20MB/s
 during peak hours.

This is something that ZFS is particularly good at.

 Proposed hardware:

 - Dell PowerEdge 2970's, 16GB RAM, quad cores of AMD.
 - LSI 1068 based SAS cards * 2 per server
 - 4 MD1000 with 1TB ES2's * 15
 - Configured as 2 * 7 disk RaidZ2 with 1 HS per chassis
 - Intel 10 gig-e to the switching infrastructure

The only concern might be with the MD1000.  Make sure that you can 
obtain it as a JBOD SAS configuration without the advertised PERC RAID 
controller. The PERC RAID controller is likely to get in the way when 
using ZFS. There has been mention here about unpleasant behavior when 
hot-swapping a failed drive in a Dell drive array with their RAID 
controller (does not come back automatically).  Typically such 
simplified hardware is offered as expansion enclosures.

Sun, IBM, and Adaptec, also offer good JBOD SAS enclosures.

It seems that you have done your homework well.

 1) Solaris, OpenSolaris, etc??  What's the best for production?

Choose Solaris 10U6 if OS stability and incremental patches are 
important for you.  ZFS boot from mirrored drives in the PowerEdge 
2970 should help make things very reliable, and the OS becomes easier 
to live-upgrade.

 3) any other words of wisdom - we are just starting out with ZFS but
do
 have some Solaris background.

You didn't say if you will continue using FreeBSD.  While FreeBSD is a 
fine OS, my experience is that its client NFS read performance is 
considerably less than Solaris.  With Solaris clients and a Solaris 
server, the NFS read is close to wire speed.  FreeBSD's NFS client 
is not so good for bulk reads, presumably due to its 
read-ahead/caching strategy.

Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us,
http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS over NFS, poor performance with many small files

2009-01-26 Thread Roch
Greg Mason writes:
  We're running into a performance problem with ZFS over NFS. When working 
  with many small files (i.e. unpacking a tar file with source code), a 
  Thor (over NFS) is about 4 times slower than our aging existing storage 
  solution, which isn't exactly speedy to begin with (17 minutes versus 3 
  minutes).
  
  We took a rough stab in the dark, and started to examine whether or not 
  it was the ZIL.
  
  Performing IO tests locally on the Thor shows no real IO problems, but 
  running IO tests over NFS, specifically, with many smaller files we see 
  a significant performance hit.
  
  Just to rule in or out the ZIL as a factor, we disabled it, and ran the 
  test again. It completed in just under a minute, around 3 times faster 
  than our existing storage. This was more like it!
  
  Are there any tunables for the ZIL to try to speed things up? Or would 
  it be best to look into using a high-speed SSD for the log device?
  
  And, yes, I already know that turning off the ZIL is a Really Bad Idea. 
  We do, however, need to provide our users with a certain level of 
  performance, and what we've got with the ZIL on the pool is completely 
  unacceptable.
  
  Thanks for any pointers you may have...
  

I think you found out from the replies that this NFS issue is not
related to ZFS, nor to a ZIL malfunction in any way.

http://blogs.sun.com/roch/entry/nfs_and_zfs_a_fine 

NFS (particularly under lightly threaded load) is much sped up
with any form of SSD/NVRAM storage, and that's independent of
the backing filesystem used (provided the filesystem is safe).

For ZFS the best way to achieve NFS performance for lightly
threaded loads is to have a separate intent log on a low-latency
device, such as in the 7000 line.

-r



  --
  
  Greg Mason
  Systems Administrator
  Michigan State University
  High Performance Computing Center
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS over NFS, poor performance with many small files

2009-01-26 Thread Roch

Nicholas Lee writes:
  Another option to look at is:
  set zfs:zfs_nocacheflush=1
  http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
  
  Best option is to get a fast ZIL log device.
  
  
  Depends on your pool as well. NFS+ZFS means zfs will wait for write
  completes before responding to a sync NFS write ops.  If you have a RAIDZ
  array, writes will be slower than a RAID10 style pool.
  

Nicholas,

RAID-Z requires more complexity in software; however,
the total amount of I/O to disk is less than with RAID-10. So the
net performance effect is often in favor of RAID-10, but not necessarily
so.

-r

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS over NFS, poor performance with many small files

2009-01-26 Thread Roch

Eric D. Mudama writes:

  On Mon, Jan 19 at 23:14, Greg Mason wrote:
  So, what we're looking for is a way to improve performance, without  
  disabling the ZIL, as it's my understanding that disabling the ZIL  
  isn't exactly a safe thing to do.
  
  We're looking for the best way to improve performance, without  
  sacrificing too much of the safety of the data.
  
  The current solution we are considering is disabling the cache  
  flushing (as per a previous response in this thread), and adding one  
  or two SSD log devices, as this is similar to the Sun storage  
  appliances based on the Thor. Thoughts?
  
  In general principles, the evil tuning guide states that the ZIL
  should be able to handle 10 seconds of expected synchronous write
  workload.
  
  To me, this implies that it's improving burst behavior, but
  potentially at the expense of sustained throughput, like would be
  measured in benchmarking type runs.
  
  If you have a big JBOD array with say 8+ mirror vdevs on multiple
  controllers, in theory, each VDEV can commit from 60-80MB/s to disk.
  Unless you are attaching a separate ZIL device that can match the
  aggregate throughput of that pool, wouldn't it just be better to have
  the default behavior of the ZIL contents being inside the pool itself?
  
  The best practices guide states that the max ZIL device size should be
  roughly 50% of main system memory, because that's approximately the
  most data that can be in-flight at any given instant.
  
  For a target throughput of X MB/sec and given that ZFS pushes
  transaction groups every 5 seconds (and have 2 outstanding), we also
  expect the ZIL to not grow beyond X MB/sec * 10 sec. So to service
  100MB/sec of synchronous writes, 1 GBytes of log device should be
  sufficient.
  
  But, no comments are made on the performance requirements of the ZIL
  device(s) relative to the main pool devices.  Clicking around finds
  this entry:
  
  http://blogs.sun.com/perrin/entry/slog_blog_or_blogging_on
  
  ...which appears to indicate cases where a significant number of ZILs
  were required to match the bandwidth of just throwing them in the pool
  itself.
  
  


Big topic. Some write requests are synchronous and some
not; some start as non-synchronous and end up being synced.

For non-synchronous loads, ZFS does not commit data to the
slog. The presence of the slog is transparent and won't
hinder performance.

For synchronous loads, the performance is normally governed
by fewer threads committing more modest amounts of data;
performance here is dominated by latency effects, not disk
throughput, and this is where a slog greatly helps (10X).

Now you're right to point out that some workloads might end
up as synchronous while still managing large quantities of
data. The Storage 7000 line was tweaked to handle some of
those cases. So when committing, say, 10MB in a single
operation, the first MB will go to the SSD but the rest will
actually be sent to the main storage pool. All these I/Os are
issued concurrently. The latency response of a 1 MB write to our
SSD is expected to be similar to the response of regular
disks.


-r



  --eric
  
  
  -- 
  Eric D. Mudama
  edmud...@mail.bounceswoosh.org
  
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS over NFS, poor performance with many small files

2009-01-26 Thread Roch

Eric D. Mudama writes:
  On Tue, Jan 20 at 21:35, Eric D. Mudama wrote:
   On Tue, Jan 20 at  9:04, Richard Elling wrote:
  
   Yes.  And I think there are many more use cases which are not
   yet characterized.  What we do know is that using an SSD for
   the separate ZIL log works very well for a large number of cases.
   It is not clear to me that the efforts to characterize a large
   number of cases is worthwhile, when we can simply throw an SSD
   at the problem and solve it.
-- richard
  
  
   I think the issue is, like a previous poster discovered, there's not a
   lot of available data on exact performance changes of adding ZIL/L2ARC
   devices in a variety of workloads, so people wind up spending money
   and doing lots of trial and error, without clear expectations of
   whether their modifications are working or not.
  
  Sorry for that terrible last sentence, my brain is fried right now.
  
  I was trying to say that most people don't know what they're going to
  get out of an SSD or other ZIL/L2ARC device ahead of time, since it
  varies so much by workload, configuration, etc. and it's an expensive
  problem to solve through trial an error since these
  performance-improving devices are many times more expensive than the
  raw SAS/SATA devices in the main pool.
  

I agree with you on the L2ARC front but not on the SSD for
the ZIL. We clearly expect a 10X gain for lightly threaded
workloads, and that's a big satisfier because not everything
happens with a large amount of concurrency, and some high-value
tasks do not.

On the L2ARC the benefits are less direct because of the
presence of the L1 ARC. The gains, if present, will be of a similar
nature, with an 8-10X gain for workloads that are lightly
threaded and served from the L2ARC vs. disk. Note that it's
possible to configure which (higher business value)
filesystems are allowed to install in the L2ARC.

One quick-and-dirty way to evaluate whether the L2ARC will be
effective in your environment is to consider whether the last
X GB of added memory had a positive impact on your performance
metrics (does nailing down memory reduce performance?).
If so, then on the graph of performance vs. caching you are
still on a positive slope and the L2ARC is likely to help. When
the requests you care most about are served from caches, or
when something else saturates (e.g. total CPU), then it's
time to stop.

-r



  -- 
  Eric D. Mudama
  edmud...@mail.bounceswoosh.org
  
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS over NFS, poor performance with many small files

2009-01-20 Thread Richard Elling
Good observations, Eric, more below...

Eric D. Mudama wrote:
 On Mon, Jan 19 at 23:14, Greg Mason wrote:
 So, what we're looking for is a way to improve performance, without  
 disabling the ZIL, as it's my understanding that disabling the ZIL  
 isn't exactly a safe thing to do.

 We're looking for the best way to improve performance, without  
 sacrificing too much of the safety of the data.

 The current solution we are considering is disabling the cache  
 flushing (as per a previous response in this thread), and adding one  
 or two SSD log devices, as this is similar to the Sun storage  
 appliances based on the Thor. Thoughts?
 
 In general principles, the evil tuning guide states that the ZIL
 should be able to handle 10 seconds of expected synchronous write
 workload.
 
 To me, this implies that it's improving burst behavior, but
 potentially at the expense of sustained throughput, like would be
 measured in benchmarking type runs.

Yes.  Workloads that tend to be latency sensitive also tend
to be bursty. Or, perhaps that is just how it feels to a user.
Similar observations are made in the GUI design business where
user interactions are bursty, but latency sensitive.

 If you have a big JBOD array with say 8+ mirror vdevs on multiple
 controllers, in theory, each VDEV can commit from 60-80MB/s to disk.
 Unless you are attaching a separate ZIL device that can match the
 aggregate throughput of that pool, wouldn't it just be better to have
 the default behavior of the ZIL contents being inside the pool itself?

The problem is that the ZIL writes must be committed to disk
and magnetic disks rotate.  So the time to commit to media is,
on average, disregarding seeks, 1/2 the rotational period.
This ranges from 2 ms (15k rpm) to 5.5 ms (5,400 rpm). If the
workload is something like a tar -x of small files (source code)
then a 4.17 ms (7,200 rpm) disk would limit my extraction to a
maximum of 240 files/s.  If these are 4kByte files, the bandwidth
would peak at about 1 MByte/s.  Upgrading to a 15k rpm disk would
move the peak to about 500 files/s or 2.25 MBytes/s.  Using a
decent SSD would change this to 5000 files/s or 22.5 MBytes/s.
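Spelling the 7,200 rpm case out as a quick check (same assumptions as above:
one synchronous commit per file, seek time ignored):

  (60 s / 7200 rotations) / 2   =  4.17 ms average commit latency
  1 s / 4.17 ms                ~=  240 files/s
  240 files/s * 4 kBytes       ~=  1 MByte/s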

 The best practices guide states that the max ZIL device size should be
 roughly 50% of main system memory, because that's approximately the
 most data that can be in-flight at any given instant.

There is a little bit of discussion about this point, because
it really speaks to the ARC in general.  Look for it to be
clarified soon.  Also note that this is much more of a problem
for small memory machines.

 For a target throughput of X MB/sec and given that ZFS pushes
 transaction groups every 5 seconds (and have 2 outstanding), we also
 expect the ZIL to not grow beyond X MB/sec * 10 sec. So to service
 100MB/sec of synchronous writes, 1 GBytes of log device should be
 sufficient.

It is a little bit more complicated than that because if the size
of the ZIL write is greater than 32 kBytes, then it will be written directly
to the main pool, not the ZIL log.  This is because if you have
lots of large synchronous writes, then the system can become
bandwidth limited rather than latency limited and the way to solve
bandwidth problems is to reduce bandwidth demand.

 But, no comments are made on the performance requirements of the ZIL
 device(s) relative to the main pool devices.  Clicking around finds
 this entry:
 
 http://blogs.sun.com/perrin/entry/slog_blog_or_blogging_on
 
 ...which appears to indicate cases where a significant number of ZILs
 were required to match the bandwidth of just throwing them in the pool
 itself.

Yes.  And I think there are many more use cases which are not
yet characterized.  What we do know is that using an SSD for
the separate ZIL log works very well for a large number of cases.
It is not clear to me that the efforts to characterize a large
number of cases is worthwhile, when we can simply throw an SSD
at the problem and solve it.
  -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS over NFS, poor performance with many small files

2009-01-20 Thread Doug
Any recommendations for an SSD to work with an X4500 server?  Will the SSDs 
used in the 7000 series servers work with X4500s or X4540s?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS over NFS, poor performance with many small files

2009-01-20 Thread Marion Hakanson
d...@yahoo.com said:
 Any recommendations for an SSD to work with an X4500 server?  Will the SSDs
 used in the 7000 series servers work with X4500s or X4540s? 

The Sun System Handbook (sunsolve.sun.com) for the 7210 appliance (an
X4540-based system) lists the logzilla device with this fine print:
  PN#371-4192 Solid State disk drives can only be installed in slots 3 and 11.

Makes me wonder if they would work in our X4500 NFS server.  Our ZFS pool is
already deployed (Solaris-10), but we have four hot spares -- two of which
could be given up in favor of a mirrored ZIL.  An OS upgrade to S10U6 would
give the separate-log functionality, if the drivers, etc. supported the
actual SSD device.  I doubt we'll go out and buy them before finding out
if they'll actually work -- it would be a real shame if they didn't, though.
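For what it's worth, the swap itself would be short work once the S10U6
separate-log support is in place; a sketch with hypothetical pool and device
names:

  # zpool remove tank c0t6d0 c1t6d0             # release two hot spares
  # zpool add tank log mirror c0t6d0 c1t6d0     # re-add those slots as a mirrored slog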

Regards,

Marion


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS over NFS, poor performance with many small files

2009-01-20 Thread Eric D. Mudama
On Tue, Jan 20 at  9:04, Richard Elling wrote:

 Yes.  And I think there are many more use cases which are not
 yet characterized.  What we do know is that using an SSD for
 the separate ZIL log works very well for a large number of cases.
 It is not clear to me that the efforts to characterize a large
 number of cases is worthwhile, when we can simply throw an SSD
 at the problem and solve it.
  -- richard


I think the issue is, like a previous poster discovered, there's not a
lot of available data on exact performance changes of adding ZIL/L2ARC
devices in a variety of workloads, so people wind up spending money
and doing lots of trial and error, without clear expectations of
whether their modifications are working or not.


-- 
Eric D. Mudama
edmud...@mail.bounceswoosh.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS over NFS, poor performance with many small files

2009-01-20 Thread Eric D. Mudama
On Tue, Jan 20 at 21:35, Eric D. Mudama wrote:
 On Tue, Jan 20 at  9:04, Richard Elling wrote:

 Yes.  And I think there are many more use cases which are not
 yet characterized.  What we do know is that using an SSD for
 the separate ZIL log works very well for a large number of cases.
 It is not clear to me that the efforts to characterize a large
 number of cases is worthwhile, when we can simply throw an SSD
 at the problem and solve it.
  -- richard


 I think the issue is, like a previous poster discovered, there's not a
 lot of available data on exact performance changes of adding ZIL/L2ARC
 devices in a variety of workloads, so people wind up spending money
 and doing lots of trial and error, without clear expectations of
 whether their modifications are working or not.

Sorry for that terrible last sentence, my brain is fried right now.

I was trying to say that most people don't know what they're going to
get out of an SSD or other ZIL/L2ARC device ahead of time, since it
varies so much by workload, configuration, etc. and it's an expensive
problem to solve through trial an error since these
performance-improving devices are many times more expensive than the
raw SAS/SATA devices in the main pool.

-- 
Eric D. Mudama
edmud...@mail.bounceswoosh.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS over NFS, poor performance with many small files

2009-01-19 Thread Greg Mason
We're running into a performance problem with ZFS over NFS. When working 
with many small files (i.e. unpacking a tar file with source code), a 
Thor (over NFS) is about 4 times slower than our aging existing storage 
solution, which isn't exactly speedy to begin with (17 minutes versus 3 
minutes).

We took a rough stab in the dark, and started to examine whether or not 
it was the ZIL.

Performing IO tests locally on the Thor shows no real IO problems, but 
running IO tests over NFS, specifically, with many smaller files we see 
a significant performance hit.

Just to rule in or out the ZIL as a factor, we disabled it, and ran the 
test again. It completed in just under a minute, around 3 times faster 
than our existing storage. This was more like it!

Are there any tunables for the ZIL to try to speed things up? Or would 
it be best to look into using a high-speed SSD for the log device?

And, yes, I already know that turning off the ZIL is a Really Bad Idea. 
We do, however, need to provide our users with a certain level of 
performance, and what we've got with the ZIL on the pool is completely 
unacceptable.

Thanks for any pointers you may have...

--

Greg Mason
Systems Administrator
Michigan State University
High Performance Computing Center
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS over NFS, poor performance with many small files

2009-01-19 Thread Nicholas Lee
Another option to look at is:
set zfs:zfs_nocacheflush=1
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
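That tunable goes in /etc/system and takes effect at the next boot; it is only
sane when the cache being flushed is non-volatile, as the guide stresses. A
minimal sketch:

  # echo 'set zfs:zfs_nocacheflush=1' >> /etc/system
  # reboot

It can reportedly also be changed on a live system with mdb -kw; treat that as
an assumption and try it on a non-production box first.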

Best option is to get a fast ZIL log device.


Depends on your pool as well. NFS+ZFS means ZFS will wait for write
completion before responding to sync NFS write ops.  If you have a RAIDZ
array, writes will be slower than a RAID10 style pool.


On Tue, Jan 20, 2009 at 11:08 AM, Greg Mason gma...@msu.edu wrote:

 We're running into a performance problem with ZFS over NFS. When working
 with many small files (i.e. unpacking a tar file with source code), a
 Thor (over NFS) is about 4 times slower than our aging existing storage
 solution, which isn't exactly speedy to begin with (17 minutes versus 3
 minutes).

 We took a rough stab in the dark, and started to examine whether or not
 it was the ZIL.

 Performing IO tests locally on the Thor shows no real IO problems, but
 running IO tests over NFS, specifically, with many smaller files we see
 a significant performance hit.

 Just to rule in or out the ZIL as a factor, we disabled it, and ran the
 test again. It completed in just under a minute, around 3 times faster
 than our existing storage. This was more like it!

 Are there any tunables for the ZIL to try to speed things up? Or would
 it be best to look into using a high-speed SSD for the log device?

 And, yes, I already know that turning off the ZIL is a Really Bad Idea.
 We do, however, need to provide our users with a certain level of
 performance, and what we've got with the ZIL on the pool is completely
 unacceptable.

 Thanks for any pointers you may have...

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS over NFS, poor performance with many small files

2009-01-19 Thread Richard Elling
Greg Mason wrote:
 We're running into a performance problem with ZFS over NFS. When working 
 with many small files (i.e. unpacking a tar file with source code), a 
 Thor (over NFS) is about 4 times slower than our aging existing storage 
 solution, which isn't exactly speedy to begin with (17 minutes versus 3 
 minutes).
 
 We took a rough stab in the dark, and started to examine whether or not 
 it was the ZIL.

It is. I've recently added some clarification to this section in the
Evil Tuning Guide which might help you to arrive at a better solution.
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Disabling_the_ZIL_.28Don.27t.29
Feedback is welcome.
  -- richard

 Performing IO tests locally on the Thor shows no real IO problems, but 
 running IO tests over NFS, specifically, with many smaller files we see 
 a significant performance hit.
 
 Just to rule in or out the ZIL as a factor, we disabled it, and ran the 
 test again. It completed in just under a minute, around 3 times faster 
 than our existing storage. This was more like it!
 
 Are there any tunables for the ZIL to try to speed things up? Or would 
 it be best to look into using a high-speed SSD for the log device?
 
 And, yes, I already know that turning off the ZIL is a Really Bad Idea. 
 We do, however, need to provide our users with a certain level of 
 performance, and what we've got with the ZIL on the pool is completely 
 unacceptable.
 
 Thanks for any pointers you may have...
 
 --
 
 Greg Mason
 Systems Administrator
 Michigan State University
 High Performance Computing Center
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS over NFS, poor performance with many small files

2009-01-19 Thread Greg Mason
So, what we're looking for is a way to improve performance, without  
disabling the ZIL, as it's my understanding that disabling the ZIL  
isn't exactly a safe thing to do.

We're looking for the best way to improve performance, without  
sacrificing too much of the safety of the data.

The current solution we are considering is disabling the cache  
flushing (as per a previous response in this thread), and adding one  
or two SSD log devices, as this is similar to the Sun storage  
appliances based on the Thor. Thoughts?

-Greg

On Jan 19, 2009, at 6:24 PM, Richard Elling wrote:

 We took a rough stab in the dark, and started to examine whether or  
 not it was the ZIL.

 It is. I've recently added some clarification to this section in the
 Evil Tuning Guide which might help you to arrive at a better solution.
 http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Disabling_the_ZIL_.28Don.27t.29
 Feedback is welcome.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS over NFS, poor performance with many small files

2009-01-19 Thread Richard Elling
Greg Mason wrote:
 So, what we're looking for is a way to improve performance, without 
 disabling the ZIL, as it's my understanding that disabling the ZIL isn't 
 exactly a safe thing to do.
 
 We're looking for the best way to improve performance, without 
 sacrificing too much of the safety of the data.
 
 The current solution we are considering is disabling the cache flushing 
 (as per a previous response in this thread), and adding one or two SSD 
 log devices, as this is similar to the Sun storage appliances based on 
 the Thor. Thoughts?

Good idea.  Thor has a CF slot, too, if you can find a high speed
CF card.
  -- richard


 -Greg
 
 On Jan 19, 2009, at 6:24 PM, Richard Elling wrote:

 We took a rough stab in the dark, and started to examine whether or 
 not it was the ZIL.

 It is. I've recently added some clarification to this section in the
 Evil Tuning Guide which might help you to arrive at a better solution.
 http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Disabling_the_ZIL_.28Don.27t.29
  

 Feedback is welcome.
 -- richard
 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS over NFS, poor performance with many small files

2009-01-19 Thread Bob Friesenhahn
On Mon, 19 Jan 2009, Greg Mason wrote:

 The current solution we are considering is disabling the cache
 flushing (as per a previous response in this thread), and adding one
 or two SSD log devices, as this is similar to the Sun storage
 appliances based on the Thor. Thoughts?

You need to add some sort of fast non-volatile cache.

The Sun storage appliances are actually using battery backed DRAM for 
their write caches.  This sort of hardware is quite rare.

Fast SSD log devices are apparently pretty expensive.  Some of the 
ones for sale are actually pretty slow.

Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS over NFS, poor performance with many small files

2009-01-19 Thread Greg Mason

 Good idea.  Thor has a CF slot, too, if you can find a high speed
 CF card.
 -- richard

We're already using the CF slot for the OS. We haven't really found  
any CF cards that would be fast enough anyways :)


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS over NFS, poor performance with many small files

2009-01-19 Thread Eric D. Mudama
On Mon, Jan 19 at 23:14, Greg Mason wrote:
So, what we're looking for is a way to improve performance, without  
disabling the ZIL, as it's my understanding that disabling the ZIL  
isn't exactly a safe thing to do.

We're looking for the best way to improve performance, without  
sacrificing too much of the safety of the data.

The current solution we are considering is disabling the cache  
flushing (as per a previous response in this thread), and adding one  
or two SSD log devices, as this is similar to the Sun storage  
appliances based on the Thor. Thoughts?

In general principles, the evil tuning guide states that the ZIL
should be able to handle 10 seconds of expected synchronous write
workload.

To me, this implies that it's improving burst behavior, but
potentially at the expense of sustained throughput, like would be
measured in benchmarking type runs.

If you have a big JBOD array with say 8+ mirror vdevs on multiple
controllers, in theory, each VDEV can commit from 60-80MB/s to disk.
Unless you are attaching a separate ZIL device that can match the
aggregate throughput of that pool, wouldn't it just be better to have
the default behavior of the ZIL contents being inside the pool itself?

The best practices guide states that the max ZIL device size should be
roughly 50% of main system memory, because that's approximately the
most data that can be in-flight at any given instant.

For a target throughput of X MB/sec and given that ZFS pushes
transaction groups every 5 seconds (and have 2 outstanding), we also
expect the ZIL to not grow beyond X MB/sec * 10 sec. So to service
100MB/sec of synchronous writes, 1 GBytes of log device should be
sufficient.

But, no comments are made on the performance requirements of the ZIL
device(s) relative to the main pool devices.  Clicking around finds
this entry:

http://blogs.sun.com/perrin/entry/slog_blog_or_blogging_on

...which appears to indicate cases where a significant number of ZILs
were required to match the bandwidth of just throwing them in the pool
itself.


--eric


-- 
Eric D. Mudama
edmud...@mail.bounceswoosh.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS or NFS?

2007-09-17 Thread Ian Collins
I have a build 62 system with a zone that NFS-mounts a ZFS filesystem.

From the zone, I keep seeing issues with .nfs files remaining in
otherwise empty directories preventing their deletion.  The files appear
to be immediately replaced when they are deleted.

Is this an NFS or a ZFS issue?

Ian

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS or NFS?

2007-09-17 Thread Darren J Moffat
Ian Collins wrote:
 I have a build 62 system with a zone that NFS mounts an ZFS filesystem. 
 
From the zone, I keep seeing issues with .nfs files remaining in
 otherwise empty directories preventing their deletion.  The files appear
 to be immediately replaced when they are deleted.
 
 Is this an NFS or a ZFS issue?

It is NFS that is doing that.  It happens when a process on the NFS 
client still has the file open.  fuser(1) is your friend here.

-- 
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS or NFS?

2007-09-17 Thread Robert Thurlow
Ian Collins wrote:
 I have a build 62 system with a zone that NFS mounts an ZFS filesystem. 
 
From the zone, I keep seeing issues with .nfs files remaining in
 otherwise empty directories preventing their deletion.  The files appear
 to be immediately replaced when they are deleted.
 
 Is this an NFS or a ZFS issue?

This is the NFS client keeping unlinked but open files around.
You need to find out what process has the files open (perhaps
with fuser -c) and persuade them to close the files before
you can unmount gracefully.  You may also use umount -f if
you don't care what happens to the processes.
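A concrete illustration of that sequence, with a hypothetical mount point:

  # fuser -c /mnt/zfsdata       # list PIDs holding files open on the file system
  # ps -fp 1234                 # identify the process (PID taken from fuser output)
  # umount -f /mnt/zfsdata      # forced unmount, as a last resort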

Rob T
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS or NFS?

2007-09-17 Thread Paul Kraus
On 9/17/07, Darren J Moffat [EMAIL PROTECTED] wrote:

 It is NFS that is doing that.  It happens when a process on the NFS
 client still has the file open.  fuser(1) is your friend here.

... and if fuser doesn't tell you what you need to know, you can use
lsof ( http://freshmeat.net/projects/lsof/ I usually just get it
precompiled from http://www.sunfreeware.com/ ). I have found lsof to
be more reliable than fuser in listing what has a file open.

-- 
Paul Kraus
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS or NFS?

2007-09-17 Thread Richard Elling
Ian Collins wrote:
 I have a build 62 system with a zone that NFS mounts an ZFS filesystem. 
 
From the zone, I keep seeing issues with .nfs files remaining in
 otherwise empty directories preventing their deletion.  The files appear
 to be immediately replaced when they are deleted.
 
 Is this an NFS or a ZFS issue?

That is how NFS deals with files that are unlinked while open.  In a local
file system, unlinked while open files will simply not be deleted until the
close.  For remote file systems, like NFS, you have to remove the file from
the namespace, but not remove the file's content.  The client will do this
by creating .nfs files.  A more detailed explanation is at:
http://nfs.sourceforge.net/

  -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS and NFS Mounting - Missing Permissions

2007-07-23 Thread Scott Adair
Hi

I'm trying to setup a new NFS server, and wish to use Solaris and ZFS. I have a 
ZFS filesystem set up to handle the users home directories and setup sharing

   # zfs list
   NAME             USED  AVAIL  REFER  MOUNTPOINT
   data             896K  9.75G  35.3K  /data
   data/home        751K  9.75G  38.0K  /data/home
   data/home/bob   32.6K  9.75G  32.6K  /data/home/bob
   data/home/joe    647K  9.37M   647K  /data/home/joe
   data/home/paul  32.6K  9.75G  32.6K  /data/home/paul

   # zfs get sharenfs data/home 
   NAME       PROPERTY  VALUE  SOURCE
   data/home  sharenfs  rw     local

And these directories are owned by the user

   # ls -l /data/home
   total 12
   drwxrwxr-x   2 bob  sigma  2 Jul 23 08:47 bob
   drwxrwxr-x   2 joe  sigma  4 Jul 23 11:31 joe
   drwxrwxr-x   2 paul sigma  2 Jul 23 08:47 paul

I have the top level directory shared (/data/home). When I mount this on the 
client pc (ubuntu) I lose all the permissions, and can't see any of the files..

   [EMAIL PROTECTED]:/nfs/home# ls -l
   total 6
   drwxr-xr-x 2 root root 2 2007-07-23 08:47 bob
   drwxr-xr-x 2 root root 2 2007-07-23 08:47 joe 
   drwxr-xr-x 2 root root 2 2007-07-23 08:47 paul

   [EMAIL PROTECTED]:/nfs/home# ls -l joe
   total 0

However, when I mount each directory manually, it works.. 

   [EMAIL PROTECTED]:~# mount torit01sx:/data/home/joe /scott

   [EMAIL PROTECTED]:~# ls -l /scott
   total 613
   -rwxrwxrwx 1 joe sigma 612352 2007-07-23 11:32 file

Any ideas? When I try the same thing with a UFS based filesystem it works as 
expected

   [EMAIL PROTECTED]:/# mount torit01sx:/export/home /scott

   [EMAIL PROTECTED]:/# ls -l scott
   total 1
   drwxr-xr-x 2 joe sigma 512 2007-07-23 12:25 joe

Any help would be greatly appreciated.. Thanks in advance

Scott
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and NFS Mounting - Missing Permissions

2007-07-23 Thread Richard Elling
Scott Adair wrote:
 Hi
 
 I'm trying to setup a new NFS server, and wish to use Solaris and ZFS. I have 
 a ZFS filesystem set up to handle the users home directories and setup sharing
 
# zfs list
    NAME             USED  AVAIL  REFER  MOUNTPOINT
    data             896K  9.75G  35.3K  /data
    data/home        751K  9.75G  38.0K  /data/home
    data/home/bob   32.6K  9.75G  32.6K  /data/home/bob
    data/home/joe    647K  9.37M   647K  /data/home/joe
    data/home/paul  32.6K  9.75G  32.6K  /data/home/paul
 
# zfs get sharenfs data/home 
    NAME       PROPERTY  VALUE  SOURCE
    data/home  sharenfs  rw     local
 
 And these directories are owned by the user
 
# ls -l /data/home
total 12
drwxrwxr-x   2 bob  sigma  2 Jul 23 08:47 bob
drwxrwxr-x   2 joe  sigma  4 Jul 23 11:31 joe
drwxrwxr-x   2 paul sigma  2 Jul 23 08:47 paul
 
 I have the top level directory shared (/data/home). When I mount this on the 
 client pc (ubuntu) I loose all the permissions, and can't see any of the 
 files..

/data/home is a different file system than /data/home/joe.  NFS shares do not
cross file system boundaries.  You'll need to share /data/home/joe, too.
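In practice each child filesystem is its own NFS share and has to be mounted
on the client in its own right; the manual mount of data/home/joe working
shows the share side is already in place (sharenfs is inherited from
data/home). An illustrative set of client-side commands, reusing the host and
paths from the original post:

   # mount torit01sx:/data/home/bob  /nfs/home/bob
   # mount torit01sx:/data/home/joe  /nfs/home/joe
   # mount torit01sx:/data/home/paul /nfs/home/paul

(or put them in the client's fstab or automounter so new homes get picked up).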
  -- richard

[EMAIL PROTECTED]:/nfs/home# ls -l
total 6
drwxr-xr-x 2 root root 2 2007-07-23 08:47 bob
drwxr-xr-x 2 root root 2 2007-07-23 08:47 joe 
drwxr-xr-x 2 root root 2 2007-07-23 08:47 paul
 
[EMAIL PROTECTED]:/nfs/home# ls -l joe
total 0
 
 However, when I mount each directory manually, it works.. 
 
[EMAIL PROTECTED]:~# mount torit01sx:/data/home/joe /scott
 
[EMAIL PROTECTED]:~# ls -l /scott
total 613
-rwxrwxrwx 1 joe sigma 612352 2007-07-23 11:32 file
 
 Any ideas? When I try the same thing with a UFS based filesystem it works as 
 expected
 
[EMAIL PROTECTED]:/# mount torit01sx:/export/home /scott
 
[EMAIL PROTECTED]:/# ls -l scott
total 1
drwxr-xr-x 2 joe sigma 512 2007-07-23 12:25 joe
 
 Any help would be greatly appreciated.. Thanks in advance
 
 Scott
  
  
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS with NFS

2007-06-11 Thread Mohammed Beik

Hi

Does anyone have any notes on how best to configure a ZFS pool for NFS mounts
to a 4-node RAC cluster?
I am particularly interested in config options for zfs/zpool and NFS
options at the kernel level.
The zpool is being presented from an x4500 (Thumper), and NFS is presented to
four nodes (x8400). There will be a high rate of I/O transactions carried
out on these filesystems.
And one last thing: it's a production environment.
Any pointers, gotchas, patches, or helpful NFS kernel parameters would be 
appreciated.

Thanks
Mo
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS with NFS

2007-06-11 Thread Richard Elling

Mohammed Beik wrote:

Hi

Has anyone any notes on how best configure ZFS pool for NFS mount
to a 4-node RAC cluster.
I am particularly interested in config options for zfs/zpool and NFS
options at kernel level.
The zpool is being presented from x4500 (thumper), and NFS presented to
four nodes (x8400). There will be high io transactions being carried
out on these filesystems.
And one last thing Its a production env.
Any pointers, gotcha, patches, helpful NFS kernel parameters would be 
appreciated.


Is Solaris Cluster being used with RAC?
 -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS disables nfs/server on a host

2007-04-19 Thread Mike Lee
Could it be an order problem? NFS trying to start before zfs is mounted? 
Just a guess, of course. I'm not real savvy in either realm.


HTH,
Mike


Ben Miller wrote:


I have an Ultra 10 client running Sol10 U3 that has a zfs pool set up on the 
extra space of the internal ide disk.  There's just the one fs and it is shared 
with the sharenfs property.  When this system reboots nfs/server ends up 
getting disabled and this is the error from the SMF logs:

[ Apr 16 08:41:22 Executing start method (/lib/svc/method/nfs-server start) ]
[ Apr 16 08:41:24 Method start exited with status 0 ]
[ Apr 18 10:59:23 Executing start method (/lib/svc/method/nfs-server start) ]
Assertion failed: pclose(fp) == 0, file ../common/libzfs_mount.c, line 380, 
function zfs_share

If I re-enable nfs/server after the system is up it's fine.  The system was 
recently upgraded to use zfs and this has happened on the last two reboots.  We 
have lots of other systems that share nfs through zfs fine and I didn't see a 
similar problem on the list.  Any ideas?

Ben


This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 



--
http://www.sun.com/solaris  * Michael Lee *
Area System Support Engineer

*Sun Microsystems, Inc.*
Phone x40782 / 866 877 8350
Email [EMAIL PROTECTED]
http://www.sun.com/solaris

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


RE: [zfs-discuss] ZFS disables nfs/server on a host

2007-04-19 Thread Robert van Veelen

 I have seen a previous discussion with the same error. I don't think a
solution was posted though.
The libzfs_mount.c source indicates that the 'share' command returned
non-zero but specified no error. Can you run 'share' manually after a
fresh boot? There may be some insight if it fails, though as you
describe it, share should work without problems.

-Robert

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libzfs
/common/libzfs_mount.c?r=789

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Ben Miller
 Sent: Thursday, April 19, 2007 9:05 AM
 To: zfs-discuss@opensolaris.org
 Subject: [zfs-discuss] ZFS disables nfs/server on a host
 
 I have an Ultra 10 client running Sol10 U3 that has a zfs 
 pool set up on the extra space of the internal ide disk.  
 There's just the one fs and it is shared with the sharenfs 
 property.  When this system reboots nfs/server ends up 
 getting disabled and this is the error from the SMF logs:
 
 [ Apr 16 08:41:22 Executing start method 
 (/lib/svc/method/nfs-server start) ]
 [ Apr 16 08:41:24 Method start exited with status 0 ]
 [ Apr 18 10:59:23 Executing start method 
 (/lib/svc/method/nfs-server start) ]
 Assertion failed: pclose(fp) == 0, file 
 ../common/libzfs_mount.c, line 380, function zfs_share
 
 If I re-enable nfs/server after the system is up it's fine.  
 The system was recently upgraded to use zfs and this has 
 happened on the last two reboots.  We have lots of other 
 systems that share nfs through zfs fine and I didn't see a 
 similar problem on the list.  Any ideas?
 
 Ben
  
  
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs NFS vs array caches, revisited

2007-02-13 Thread Marion Hakanson
[EMAIL PROTECTED] said:
 The only obvious thing would be if the exported ZFS filesystems where
 initially mounted at a point in time when zil_disable was non-null. 

No changes have been made to zil_disable.  It's 0 now, and we've never
changed the setting.  Export/import doesn't appear to change the behavior.


[EMAIL PROTECTED] said:
 You might want to try in turn:
   dtrace -n 'sd_send_scsi_SYNCHRONIZE_CACHE:entry{@[stack(20)]=count()}'
   dtrace -n 'sdioctl:entry{@[stack(20)]=count()}'
   dtrace -n 'zil_flush_vdevs:entry{@[stack(20)]=count()}'
   dtrace -n 'zil_commit_writer:entry{@[stack(20)]=count()}'
 And see if you lose your footing along the way. 

I've included below the complete list of dtrace output.  This system has
two zpools, one that goes fast for NFS and one that goes slow.  You
can see the details of the pools' configs below.  Let me re-state that
at times in the past, the fast pool has gone slow, and I don't know
what made it start going fast again.

To summarize, the first dtrace above gives no output on the fast pool,
and lists 6, 7, 12, or 14 calls for the slow pool.  The second dtrace
above counts 6 or 7 calls on both pools.  The last third dtrace above
gives no output for either pool, but zil_flush_vdevs isn't in the stack
trace for the earlier trace on my machine (SPARC, Sol-10U3).  The last
dtrace doesn't find a matching probe here.

=

# echo zil_disable/D | mdb -k 
zil_disable:
zil_disable:0   
# zpool list
NAME        SIZE    USED   AVAIL    CAP  HEALTH  ALTROOT
bulk_zp1   2.14T    160K   2.14T     0%  ONLINE  -
bulk_zp2   2.14T    346K   2.14T     0%  ONLINE  -
int01      48.2G   1.94G   46.3G     4%  ONLINE  -
# cd
# zpool export bulk_zp1
# zpool export bulk_zp2
# zpool import
  pool: bulk_zp2
id: 803252704584693135
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

bulk_zp2 ONLINE
  raidz1 ONLINE
c6t4849544143484920443630303133323230303330d0s0  ONLINE
c6t4849544143484920443630303133323230303330d0s1  ONLINE
c6t4849544143484920443630303133323230303331d0s0  ONLINE
c6t4849544143484920443630303133323230303331d0s1  ONLINE
c6t4849544143484920443630303133323230303332d0s0  ONLINE
c6t4849544143484920443630303133323230303332d0s1  ONLINE

  pool: bulk_zp1
id: 14914295292657419291
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

bulk_zp1 ONLINE
  raidz1 ONLINE
c6t4849544143484920443630303133323230303230d0s0  ONLINE
c6t4849544143484920443630303133323230303230d0s1  ONLINE
c6t4849544143484920443630303133323230303231d0s0  ONLINE
c6t4849544143484920443630303133323230303231d0s1  ONLINE
c6t4849544143484920443630303133323230303232d0s0  ONLINE
c6t4849544143484920443630303133323230303232d0s1  ONLINE
c6t4849544143484920443630303133323230303232d0s2  ONLINE
# zpool import bulk_zp1
# zpool import bulk_zp2
# zfs list bulk_zp1
NAME   USED  AVAIL  REFER  MOUNTPOINT
bulk_zp1   123K  1.79T  53.6K  /zp1
# zfs list bulk_zp2
NAME   USED  AVAIL  REFER  MOUNTPOINT
bulk_zp2   193K  1.75T  63.9K  /zp2
# dtrace -n 'ssd_send_scsi_SYNCHRONIZE_CACHE:entry{@[stack(20)]=count()}' \
  -n 'sd_send_scsi_SYNCHRONIZE_CACHE:entry{@[stack(20)]=count()}'
dtrace: description 'ssd_send_scsi_SYNCHRONIZE_CACHE:entry' matched 1 probe
dtrace: description 'sd_send_scsi_SYNCHRONIZE_CACHE:entry' matched 1 probe
^C

# : no output from zp1 test.
# dtrace -n 'ssd_send_scsi_SYNCHRONIZE_CACHE:entry{@[stack(20)]=count()}' \
  -n 'sd_send_scsi_SYNCHRONIZE_CACHE:entry{@[stack(20)]=count()}'
dtrace: description 'ssd_send_scsi_SYNCHRONIZE_CACHE:entry' matched 1 probe
dtrace: description 'sd_send_scsi_SYNCHRONIZE_CACHE:entry' matched 1 probe
^C


  ssd`ssdioctl+0x17a8
  zfs`vdev_disk_io_start+0xa0
  zfs`zio_ioctl+0xec
  zfs`vdev_config_sync+0xe0
  zfs`spa_sync+0x2ec
  zfs`txg_sync_thread+0x134
  unix`thread_start+0x4
   12

  ssd`ssdioctl+0x17a8
  zfs`vdev_disk_io_start+0xa0
  zfs`zio_ioctl+0xec
  zfs`vdev_config_sync+0x258
  zfs`spa_sync+0x2ec
  zfs`txg_sync_thread+0x134
  unix`thread_start+0x4
   12
# : above output from zp2 test.
# dtrace -n 'ssdioctl:entry{@[stack(20)]=count()}' -n 'sdioctl:entry{@[stack(20)]=count()}'
dtrace: 

Re: [zfs-discuss] ZFS vs NFS vs array caches, revisited

2007-02-02 Thread Marion Hakanson
[EMAIL PROTECTED] said:
 The reality is that
   ZFS turns on the write cache when it owns the
   whole disk.
 _Independantly_ of that,
   ZFS flushes the write cache when ZFS needs to insure 
   that data reaches stable storage.
 
 The point is that the flushes occur whether or not ZFS turned the caches on
 or not (caches might be turned on by some other means outside the visibility
 of ZFS). 

Thanks for taking the time to clear this up for us (assuming others than
just me had this misunderstanding :-).

Yet today I measured something that leaves me puzzled again.  How can we
explain the following results?

# zpool status -v
  pool: bulk_zp1
 state: ONLINE
 scrub: none requested
config:

        NAME                                                  STATE     READ WRITE CKSUM
        bulk_zp1                                              ONLINE       0     0     0
          raidz1                                              ONLINE       0     0     0
            c6t4849544143484920443630303133323230303230d0s0  ONLINE       0     0     0
            c6t4849544143484920443630303133323230303230d0s1  ONLINE       0     0     0
            c6t4849544143484920443630303133323230303230d0s2  ONLINE       0     0     0
            c6t4849544143484920443630303133323230303230d0s3  ONLINE       0     0     0
            c6t4849544143484920443630303133323230303230d0s4  ONLINE       0     0     0
            c6t4849544143484920443630303133323230303230d0s5  ONLINE       0     0     0
            c6t4849544143484920443630303133323230303230d0s6  ONLINE       0     0     0

errors: No known data errors
# prtvtoc -s /dev/rdsk/c6t4849544143484920443630303133323230303230d0
*                            First        Sector        Last
* Partition  Tag  Flags      Sector       Count         Sector      Mount Directory
       0      4    00               34    613563821     613563854
       1      4    00        613563855    613563821    1227127675
       2      4    00       1227127676    613563821    1840691496
       3      4    00       1840691497    613563821    2454255317
       4      4    00       2454255318    613563821    3067819138
       5      4    00       3067819139    613563821    3681382959
       6      4    00       3681382960    613563821    4294946780
       8     11    00       4294946783        16384    4294963166
# 

And, at a later time:
# zpool status -v bulk_sp1s
  pool: bulk_sp1s
 state: ONLINE
 scrub: none requested
config:

        NAME                                                  STATE     READ WRITE CKSUM
        bulk_sp1s                                             ONLINE       0     0     0
          c6t4849544143484920443630303133323230303230d0s0    ONLINE       0     0     0
          c6t4849544143484920443630303133323230303230d0s1    ONLINE       0     0     0
          c6t4849544143484920443630303133323230303230d0s2    ONLINE       0     0     0
          c6t4849544143484920443630303133323230303230d0s3    ONLINE       0     0     0
          c6t4849544143484920443630303133323230303230d0s4    ONLINE       0     0     0
          c6t4849544143484920443630303133323230303230d0s5    ONLINE       0     0     0
          c6t4849544143484920443630303133323230303230d0s6    ONLINE       0     0     0

errors: No known data errors
# 


The storage is that same single 2TB LUN I used yesterday, except I've
used format to slice it up into 7 equal chunks, and made a raidz
(and later a simple striped) pool across all of them.  My tar over NFS
benchmark on these goes pretty fast.  If ZFS is making the flush-cache call,
it sure works faster than in the whole-LUN case:

ZFS on whole-disk FC-SATA LUN via NFS, yesterday:
real 968.13
user 0.33
sys 0.04
  7.9 KB/sec overall

ZFS on whole-disk FC-SATA LUN via NFS, ssd_max_throttle=32 today:
real 664.78
user 0.33
sys 0.04
  11.4 KB/sec overall

ZFS raidz on 7 slices of FC-SATA LUN via NFS today:
real 12.32
user 0.32
sys 0.03
  620.2 KB/sec overall

ZFS striped on 7 slices of FC-SATA LUN via NFS today:
real 6.51
user 0.32
sys 0.03
  1178.3 KB/sec overall
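
For reference, the ssd_max_throttle=32 run above caps the ssd driver's
per-LUN command queue depth.  A sketch of the usual /etc/system entry,
which takes effect only after a reboot (sd-attached storage would use the
sd:sd_max_throttle variant):

  * /etc/system fragment (sketch only)
  set ssd:ssd_max_throttle = 32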

Not that I'm complaining, mind you.  I appear to have stumbled across
a way to get NFS over ZFS to work at a reasonable speed, without making
changes to the array (nor resorting to giving ZFS SVM soft partitions
instead of real devices).  It's suboptimal, but it's workable
if our Hitachi folks don't turn up a way to tweak the array.
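
A sketch of the pool-creation commands for that layout, assuming format(1M)
has already carved the LUN into equal slices s0-s6 (device name taken from
the listings above):

  # Sketch only.
  DISK=c6t4849544143484920443630303133323230303230d0

  # raidz across the seven slices (the bulk_zp1 layout):
  zpool create bulk_zp1 raidz ${DISK}s0 ${DISK}s1 ${DISK}s2 ${DISK}s3 \
      ${DISK}s4 ${DISK}s5 ${DISK}s6

  # or a plain stripe across the same slices (the bulk_sp1s layout):
  zpool create bulk_sp1s ${DISK}s0 ${DISK}s1 ${DISK}s2 ${DISK}s3 \
      ${DISK}s4 ${DISK}s5 ${DISK}s6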

Guess I should go read the ZFS source code (though my 10U3 surely lags
the OpenSolaris stuff).

Thanks and regards,

Marion


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS vs NFS vs array caches, revisited

2007-02-01 Thread Marion Hakanson
I had followed with interest the "turn off NV cache flushing" thread, in
regard to doing ZFS-backed NFS on our low-end Hitachi array:

  http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg05000.html

In short, if you have non-volatile cache, you can configure the array
to ignore the ZFS cache-flush requests.  This is reported to improve the
really terrible performance of ZFS-backed NFS systems.  Feel free to
correct me if I'm misremembering.

Anyway, I've also read that if ZFS notices it's using slices instead of
whole disks, it will not enable/use the write cache.  So I thought I'd be
clever and configure a ZFS pool on our array with a slice of a LUN instead
of the whole LUN, and fool ZFS into not issuing cache-flushes, rather
than having to change config of the array itself.
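
Later Solaris and OpenSolaris releases also expose a system-wide tunable
that suppresses ZFS's cache-flush requests altogether; it may not exist on
10U3, and it is only safe when every device in every pool sits behind
non-volatile (battery-backed) cache.  A sketch of the /etc/system entry:

  * /etc/system fragment (sketch only; see the caveats above)
  set zfs:zfs_nocacheflush = 1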

Unfortunately, it didn't make a bit of difference in my little NFS benchmark,
namely extracting a small 7.6MB tar file (C++ source code, 500 files/dirs).

I used three test zpools and a UFS filesystem (not all were in play at the
same time):
  pool: bulk_sp1
 state: ONLINE
 scrub: none requested
config:

        NAME                                              STATE     READ WRITE CKSUM
        bulk_sp1                                          ONLINE       0     0     0
          c6t4849544143484920443630303133323230303230d0  ONLINE       0     0     0

errors: No known data errors

  pool: bulk_sp1s
 state: ONLINE
 scrub: none requested
config:

        NAME                                                STATE     READ WRITE CKSUM
        bulk_sp1s                                           ONLINE       0     0     0
          c6t4849544143484920443630303133323230303230d0s0  ONLINE       0     0     0

errors: No known data errors

  pool: int01
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        int01         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c0t0d0s5  ONLINE       0     0     0
            c0t1d0s5  ONLINE       0     0     0

errors: No known data errors

# prtvtoc -s /dev/rdsk/c6t4849544143484920443630303133323230303230d0
*                            First        Sector        Last
* Partition  Tag  Flags      Sector       Count         Sector      Mount Directory
       0      4    00               34   4294879232    4294879265
       1      4    00       4294879266        67517    4294946782
       8     11    00       4294946783        16384    4294963166
# 

Both NFS client and server are Sun T2000's, 16GB RAM, switched gigabit
ethernet, Solaris-10U3 patched as of 12-Jan-2007, doing nothing else
at the time of the tests.

The bulk_sp1* pools were both on the same Hitachi 9520V RAID-5 SATA group
that I ran my bonnie++ tests on yesterday.  The int01 pool is mirrored
on two slice-5's of the server T2000's internal 2.5" SAS 73GB drives.

ZFS on whole-disk FC-SATA LUN via NFS:
real 968.13
user 0.33
sys 0.04
  7.9 KB/sec overall

ZFS on partial slice-0 of FC-SATA LUN via NFS:
real 950.77
user 0.33
sys 0.04
  8.0 KB/sec overall

ZFS on slice-5 mirror of internal SAS drives via NFS:
real 17.48
user 0.32
sys 0.03
  438.8 KB/sec overall

UFS on partial slice-0 of FC-SATA LUN via NFS:
real 6.13
user 0.32
sys 0.03
  1251.4 KB/sec overall


I'm not willing to disable the ZIL.  I think I'd settle for the 400KB/sec
range in this test from NFS on ZFS, if I could get that on our FC-SATA
Hitachi array.  As things are now, ZFS just won't work for us, and I'm
not sure how to make it go faster.
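
One way to confirm that the slowdown tracks synchronous NFS traffic rather
than raw bandwidth is to diff the server-side operation counters around a
run; a rough sketch (file names are arbitrary):

  # On the NFS server; the tar-extract test itself runs on the client.
  nfsstat -s > /tmp/nfs.before
  # ... run the test from the client here ...
  nfsstat -s > /tmp/nfs.after
  diff /tmp/nfs.before /tmp/nfs.after   # compare the v3 write/commit counts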

Thoughts & suggestions are welcome.

Marion


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS over NFS extra slow?

2007-01-02 Thread Brad Plecs
I had a user report extreme slowness on a ZFS filesystem mounted over NFS over 
the weekend. 
After some extensive testing, the extreme slowness appears to only occur when a 
ZFS filesystem is mounted over NFS.  

One example is doing a 'gtar xzvf php-5.2.0.tar.gz'... over NFS onto a ZFS
filesystem.  This takes:

real5m12.423s
user0m0.936s
sys 0m4.760s

Locally on the server (to the same ZFS filesystem) takes: 

real0m4.415s
user0m1.884s
sys 0m3.395s

The same job over NFS to a UFS filesystem takes

real1m22.725s
user0m0.901s
sys 0m4.479s

Same job locally on server to same UFS filesystem: 

real0m10.150s
user0m2.121s
sys 0m4.953s


This is easily reproducible even with single large files, but the
multiple-small-files case seems to illustrate some awful sync latency
between each file.

Any idea why ZFS over NFS is so bad?  I saw the threads that talk about an 
fsync penalty, 
but they don't seem relevant since the local ZFS performance is quite good.
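
A quick way to isolate the per-file cost from gtar itself is a plain
small-file creation loop on the client; a rough sketch (the mount point is
hypothetical):

  # Crawls on a sync-bound NFS server; flies when run locally on the server.
  cd /mnt/zfs
  time sh -c 'i=0; while [ $i -lt 500 ]; do echo x > f.$i; i=`expr $i + 1`; done'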
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS over NFS extra slow?

2007-01-02 Thread Jason J. W. Williams

Hi Brad,

I believe benr experienced the same/similar issue here:
http://www.opensolaris.org/jive/message.jspa?messageID=77347

If it is the same, I believe it's a known ZFS/NFS interaction bug, and
it has to do with small-file creation.

Best Regards,
Jason

On 1/2/07, Brad Plecs [EMAIL PROTECTED] wrote:

I had a user report extreme slowness on a ZFS filesystem mounted over NFS over 
the weekend.
After some extensive testing, the extreme slowness appears to only occur when a 
ZFS filesystem is mounted over NFS.

One example is doing a 'gtar xzvf php-5.2.0.tar.gz'... over NFS onto a ZFS 
filesystem.  this takes:

real5m12.423s
user0m0.936s
sys 0m4.760s

Locally on the server (to the same ZFS filesystem) takes:

real0m4.415s
user0m1.884s
sys 0m3.395s

The same job over NFS to a UFS filesystem takes

real1m22.725s
user0m0.901s
sys 0m4.479s

Same job locally on server to same UFS filesystem:

real0m10.150s
user0m2.121s
sys 0m4.953s


This is easily reproducible even with single large files, but the multiple 
small files
seems to illustrate some awful sync latency between each file.

Any idea why ZFS over NFS is so bad?  I saw the threads that talk about an 
fsync penalty,
but they don't seem relevant since the local ZFS performance is quite good.


This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS over NFS extra slow?

2007-01-02 Thread Ben Rockwood

Brad Plecs wrote:
I had a user report extreme slowness on a ZFS filesystem mounted over NFS over the weekend. 
After some extensive testing, the extreme slowness appears to only occur when a ZFS filesystem is mounted over NFS.  

One example is doing a 'gtar xzvf php-5.2.0.tar.gz'... over NFS onto a ZFS filesystem.  this takes: 


real5m12.423s
user0m0.936s
sys 0m4.760s

Locally on the server (to the same ZFS filesystem) takes: 


real0m4.415s
user0m1.884s
sys 0m3.395s

The same job over NFS to a UFS filesystem takes

real1m22.725s
user0m0.901s
sys 0m4.479s

Same job locally on server to same UFS filesystem: 


real0m10.150s
user0m2.121s
sys 0m4.953s


This is easily reproducible even with single large files, but the multiple small files 
seems to illustrate some awful sync latency between each file.  

Any idea why ZFS over NFS is so bad?  I saw the threads that talk about an fsync penalty, 
but they don't seem relevant since the local ZFS performance is quite good.
  


Known issue, discussed here: 
http://www.opensolaris.org/jive/thread.jspa?threadID=14696&tstart=15


benr.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS over NFS extra slow?

2007-01-02 Thread Dennis Clarke

 Another thing to keep an eye out for is disk caching.  With ZFS,
 whenever the NFS server tells us to make sure something is on disk, we
 actually make sure it's on disk by asking the drive to flush dirty data
 in its write cache out to the media.  Needless to say, this takes a
 while.

 With UFS, it isn't aware of the extra level of caching, and happily
 pretends it's in a world where once the drive ACKs a write, it's on
 stable storage.

 If you use format(1M) and take a look at whether or not the drive's
 write cache is enabled, that should shed some light on this.  If it's
 on, try turning it off and re-run your NFS tests on ZFS vs. UFS.

 Either way, let us know what you find out.

Slightly OT, but you just reminded me of why I like disks that have Sun
firmware on them.  They never have the write cache on; at least, I have never
seen it.  Read cache, yes, but write cache, never.  At least in the Seagate and
Fujitsu Ultra320 SCSI/FCAL disks that have a Sun logo on them.

I have no idea what else that Sun firmware does on a SCSI disk but I'd love
to know :-)
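
For what it's worth, a sketch of how one can check (and toggle) a drive's
write-cache setting from Solaris, via format's expert mode; the exact menu
entries vary with the driver and firmware:

  # Sketch only.
  format -e
  #   (select the disk, then)
  #   format> cache
  #   cache> write_cache
  #   write_cache> display      (or enable / disable)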

Dennis
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss