[zfs-discuss] ZFS and NFS
Hi, My customer says: the application has NFS directories with millions of files in a directory, and this can't be changed. We are having issues with the EMC appliance and RPC timeouts on the NFS lookup. What I am looking at doing is moving one of the major NFS exports to a Sun 25K, using VCS to cluster a ZFS RAIDZ that is then NFS exported. For performance I am looking at disabling the ZIL, since these files have almost identical names. What are Sun's thoughts on this? Thanks for any insight. -- Joe Cicardo Systems Engineer Sun Microsystems, Inc. joe.cica...@sun.com 972-546-3887 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS and NFS
On 11/18/09 12:21, Joe Cicardo wrote: Hi, My customer says: the application has NFS directories with millions of files in a directory, and this can't be changed. We are having issues with the EMC appliance and RPC timeouts on the NFS lookup. What I am looking at doing is moving one of the major NFS exports to a Sun 25K, using VCS to cluster a ZFS RAIDZ that is then NFS exported. For performance I am looking at disabling the ZIL, since these files have almost identical names. I think there's some confusion about the function of the ZIL, because having files with identical names is irrelevant to the ZIL. Perhaps the customer is thinking of the DNLC, which is a cache of name lookups. The ZIL does handle changes to these NFS files, though, as the NFS protocol requires that they be on stable storage after most NFS operations. We don't recommend disabling the ZIL, as this can lead to user data integrity issues. This is not the same as zpool corruption. One way to speed the ZIL up is to use an SSD as a separate log device. You can check how much activity is going through the ZIL by running zilstat: http://www.richardelling.com/Home/scripts-and-programs-1/zilstat Neil. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
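For illustration, a minimal sketch of the two suggestions above -- sampling ZIL traffic with zilstat and then attaching an SSD as a separate log. The pool name (tank), SSD device (c1t9d0), and the zilstat file name and arguments are assumptions for the example, not details from the thread:

  # ./zilstat.ksh 10 6          # sample ZIL activity every 10 seconds, six samples
  # zpool add tank log c1t9d0   # add the SSD as a separate intent log (slog)
  # zpool status tank           # the device should now appear under a "logs" section

If zilstat shows little or no traffic, a slog will not help; if it shows steady synchronous writes, the slog is usually where the NFS latency goes.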
Re: [zfs-discuss] ZFS and NFS
On Wed, 18 Nov 2009, Joe Cicardo wrote: For performance I am looking at disabling the ZIL, since these files have almost identical names. Out of curiosity, what correlation is there between the ZIL and file names? The ZIL is used for synchronous writes (e.g. the NFS write case). After a file has been opened, it would be very surprising if ZFS cared about the file name, since actual files are identified by an inode. Only directory lookups would see these file names. It is pretty normal that when a directory contains millions of files, they use almost identical names. Are the NFS operations which are timing out directory lookups, 'stat', or 'open' calls? If files are also being created at a rapid pace, the reader may be blocked from accessing the directory while it is updated. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
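A rough way to check whether the problem is name-lookup bound, as Bob and Neil suggest, is to look at DNLC hit rates and the server-side NFS operation counts on the Solaris NFS server (standard commands; exact output fields vary by release):

  # vmstat -s | grep 'name lookups'   # overall DNLC hit percentage
  # kstat -n dnlcstats                # detailed DNLC counters (misses, purges, ...)
  # nfsstat -s                        # which NFS operations (lookup, getattr, ...) dominate

A low DNLC hit rate with lookups dominating the nfsstat output would point at directory/name-lookup cost rather than the ZIL.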
Re: [zfs-discuss] zfs and nfs
I'm using Solaris 10 (10/08). This feature is exactly what I want. Thanks for the response. Duh. What I meant previously was that this feature is not available in the Solaris 10 releases. Cindy -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS for NFS back end
Hi everyone, We are looking at ZFS to use as the back end to a pool of Java servers doing image processing and serving this content over the internet. Our current solution is working well, but the cost of scaling and the ability to scale are becoming a problem. Currently: - 20TB NFS servers running FreeBSD - Load balancer in front of them A bit about the workload: - 99.999% large reads, very small write requirement. - Reads average from ~1MB to 60MB. - Peak read bandwidth we see is ~180MB/s, with an average around 20MB/s during peak hours. Proposed hardware: - Dell PowerEdge 2970s, 16GB RAM, quad-core AMD. - LSI 1068-based SAS cards * 2 per server - 4 MD1000s with 1TB ES2s * 15 - Configured as 2 * 7-disk RAIDZ2 with 1 hot spare per chassis - Intel 10 gig-e to the switching infrastructure Questions: 1) Solaris, OpenSolaris, etc.? What's the best for production? 2) Anything wrong with the hardware we selected? 3) Any other words of wisdom - we are just starting out with ZFS but do have some Solaris background. Thanks! John ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
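As a concrete sketch of the per-chassis layout John describes (2 * 7-disk raidz2 plus a hot spare), assuming one MD1000's fifteen drives show up as c2t0d0 through c2t14d0 -- the device names and pool name are hypothetical:

  # zpool create tank \
      raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 \
      raidz2 c2t7d0 c2t8d0 c2t9d0 c2t10d0 c2t11d0 c2t12d0 c2t13d0 \
      spare c2t14d0
  # zpool status tank

Additional chassis would be added to the same pool with 'zpool add tank raidz2 ...' plus another spare.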
Re: [zfs-discuss] ZFS for NFS back end
On Mon, 9 Feb 2009, John Welter wrote: A bit about the workload: - 99.999% large reads, very small write requirement. - Reads average from ~1MB to 60MB. - Peak read bandwidth we see is ~180MB/s, with average around 20MB/s during peak hours. This is something that ZFS is particularly good at. Proposed hardware: - Dell PowerEdge 2970's, 16GB RAM, quad cores of AMD. - LSI 1068 based SAS cards * 2 per server - 4 MD1000 with 1TB ES2's * 15 - Configured as 2 * 7 disk RaidZ2 with 1 HS per chassis - Intel 10 gig-e to the switching infrastructure The only concern might be with the MD1000. Make sure that you can obtain it as a JBOD SAS configuration without the advertised PERC RAID controller. The PERC RAID controller is likely to get in the way when using ZFS. There has been mention here about unpleasant behavior when hot-swapping a failed drive in a Dell drive array with their RAID controller (does not come back automatically). Typically such simplified hardware is offered as expansion enclosures. Sun, IBM, and Adaptec, also offer good JBOD SAS enclosures. It seems that you have done your homework well. 1) Solaris, OpenSolaris, etc?? What's the best for production? Choose Solaris 10U6 if OS stability and incremental patches are important for you. ZFS boot from mirrored drives in the PowerEdge 2970 should help make things very reliable, and the OS becomes easier to live-upgrade. 3) any other words of wisdom - we are just starting out with ZFS but do have some Solaris background. You didn't say if you will continue using FreeBSD. While FreeBSD is a fine OS, my experience is that its client NFS read performance is considerably less than Solaris. With Solaris clients and a Solaris server, the NFS read is close to wire speed. FreeBSD's NFS client is not so good for bulk reads, presumably due to its read-ahead/caching strategy. Bob == Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS for NFS back end
Sorry, I wasn't clear that the clients that hit this NFS back end are all CentOS 5.2. FreeBSD is only used for the current NFS servers (a legacy deal), but that would go away with the new Solaris/ZFS back end. Dell will sell their boxes with SAS 5/E controllers, which are just an LSI 1068 board - these work with the MD1000 as a JBOD (we are doing some testing as we speak and it seems to work). The rest of the infrastructure is Dell, so we are trying to stick with them... the devil we know ;^) Homework was easy with excellent resources like this list... just lurked a while and picked up a lot from the traffic. Thanks again. John ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS over NFS, poor performance with many small files
Greg Mason writes: We're running into a performance problem with ZFS over NFS. When working with many small files (i.e. unpacking a tar file with source code), a Thor (over NFS) is about 4 times slower than our aging existing storage solution, which isn't exactly speedy to begin with (17 minutes versus 3 minutes). We took a rough stab in the dark, and started to examine whether or not it was the ZIL. Performing IO tests locally on the Thor shows no real IO problems, but running IO tests over NFS, specifically, with many smaller files we see a significant performance hit. Just to rule in or out the ZIL as a factor, we disabled it, and ran the test again. It completed in just under a minute, around 3 times faster than our existing storage. This was more like it! Are there any tunables for the ZIL to try to speed things up? Or would it be best to look into using a high-speed SSD for the log device? And, yes, I already know that turning off the ZIL is a Really Bad Idea. We do, however, need to provide our users with a certain level of performance, and what we've got with the ZIL on the pool is completely unacceptable. Thanks for any pointers you may have... I think you found out from the replies that this NFS issue is not related to ZFS nor to a ZIL malfunction in any way. http://blogs.sun.com/roch/entry/nfs_and_zfs_a_fine NFS (particularly a lightly threaded load) is much sped up with any form of SSD|NVRAM storage, and that's independent of the backing filesystem used (provided the filesystem is safe). For ZFS the best way to achieve NFS performance for lightly threaded loads is to have a separate intent log on a low-latency device such as in the 7000 line. -r -- Greg Mason Systems Administrator Michigan State University High Performance Computing Center ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS over NFS, poor performance with many small files
Nicholas Lee writes: Another option to look at is: set zfs:zfs_nocacheflush=1 http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide The best option is to get a fast ZIL log device. Depends on your pool as well. NFS+ZFS means ZFS will wait for writes to complete before responding to sync NFS write ops. If you have a RAIDZ array, writes will be slower than with a RAID10-style pool. Nicholas, Raid-Z requires more complexity in software, but the total amount of I/O to disk is less than raid-10. So the net performance effect is often in favor of Raid-10, but not necessarily so. -r ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS over NFS, poor performance with many small files
Eric D. Mudama writes: On Mon, Jan 19 at 23:14, Greg Mason wrote: So, what we're looking for is a way to improve performance, without disabling the ZIL, as it's my understanding that disabling the ZIL isn't exactly a safe thing to do. We're looking for the best way to improve performance, without sacrificing too much of the safety of the data. The current solution we are considering is disabling the cache flushing (as per a previous response in this thread), and adding one or two SSD log devices, as this is similar to the Sun storage appliances based on the Thor. Thoughts? In general principles, the evil tuning guide states that the ZIL should be able to handle 10 seconds of expected synchronous write workload. To me, this implies that it's improving burst behavior, but potentially at the expense of sustained throughput, like would be measured in benchmarking type runs. If you have a big JBOD array with say 8+ mirror vdevs on multiple controllers, in theory, each VDEV can commit from 60-80MB/s to disk. Unless you are attaching a separate ZIL device that can match the aggregate throughput of that pool, wouldn't it just be better to have the default behavior of the ZIL contents being inside the pool itself? The best practices guide states that the max ZIL device size should be roughly 50% of main system memory, because that's approximately the most data that can be in-flight at any given instant. For a target throughput of X MB/sec and given that ZFS pushes transaction groups every 5 seconds (and have 2 outstanding), we also expect the ZIL to not grow beyond X MB/sec * 10 sec. So to service 100MB/sec of synchronous writes, 1 GBytes of log device should be sufficient. But, no comments are made on the performance requirements of the ZIL device(s) relative to the main pool devices. Clicking around finds this entry: http://blogs.sun.com/perrin/entry/slog_blog_or_blogging_on ...which appears to indicate cases where a significant number of ZILs were required to match the bandwidth of just throwing them in the pool itself. Big topic. Some write requests are synchronous and some are not, and some start as non-synchronous and end up being synced. For non-synchronous loads, ZFS does not commit data to the slog. The presence of the slog is transparent and won't hinder performance. For synchronous loads, the performance is normally governed by fewer threads committing more modest amounts of data; performance here is dominated by latency effects, not disk throughput, and this is where a slog greatly helps (10X). Now you're right to point out that some workloads might end up as synchronous while still managing a large quantity of data. The Storage 7000 line was tweaked to handle some of those cases. So when committing more than, say, 10MB in a single operation, the first MB will go to the SSD but the rest will actually be sent to the main storage pool, with all these I/Os issued concurrently. The latency response of a 1 MB write to our SSD is expected to be similar to the response of regular disks. -r --eric -- Eric D. Mudama edmud...@mail.bounceswoosh.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS over NFS, poor performance with many small files
Eric D. Mudama writes: On Tue, Jan 20 at 21:35, Eric D. Mudama wrote: On Tue, Jan 20 at 9:04, Richard Elling wrote: Yes. And I think there are many more use cases which are not yet characterized. What we do know is that using an SSD for the separate ZIL log works very well for a large number of cases. It is not clear to me that the efforts to characterize a large number of cases is worthwhile, when we can simply throw an SSD at the problem and solve it. -- richard I think the issue is, like a previous poster discovered, there's not a lot of available data on exact performance changes of adding ZIL/L2ARC devices in a variety of workloads, so people wind up spending money and doing lots of trial and error, without clear expectations of whether their modifications are working or not. Sorry for that terrible last sentence, my brain is fried right now. I was trying to say that most people don't know what they're going to get out of an SSD or other ZIL/L2ARC device ahead of time, since it varies so much by workload, configuration, etc., and it's an expensive problem to solve through trial and error since these performance-improving devices are many times more expensive than the raw SAS/SATA devices in the main pool. I agree with you on the L2ARC front, but not on the SSD for the ZIL. We clearly expect a 10X gain for lightly threaded workloads, and that's a big satisfier because not everything happens with a large amount of concurrency, and some high value tasks do not. On the L2ARC the benefits are less direct because of the L1 ARC presence. The gains, if present, will be of a similar nature, with an 8-10X gain for workloads that are lightly threaded and served from L2ARC vs disk. Note that it's possible to configure which (higher business value) filesystems are allowed to install in the L2ARC. One dirty way to evaluate if the L2ARC will be effective in your environment is to consider whether the last X GB of added memory had a positive impact on your performance metrics (does nailing down memory reduce performance?). If so, then on the graph of performance vs caching you are still on a positive slope and the L2ARC is likely to help. When the requests you care most about are served from caches, or when something else saturates (e.g. total CPU), then it's time to stop. -r -- Eric D. Mudama edmud...@mail.bounceswoosh.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS over NFS, poor performance with many small files
Good observations, Eric, more below... Eric D. Mudama wrote: On Mon, Jan 19 at 23:14, Greg Mason wrote: So, what we're looking for is a way to improve performance, without disabling the ZIL, as it's my understanding that disabling the ZIL isn't exactly a safe thing to do. We're looking for the best way to improve performance, without sacrificing too much of the safety of the data. The current solution we are considering is disabling the cache flushing (as per a previous response in this thread), and adding one or two SSD log devices, as this is similar to the Sun storage appliances based on the Thor. Thoughts? In general principles, the evil tuning guide states that the ZIL should be able to handle 10 seconds of expected synchronous write workload. To me, this implies that it's improving burst behavior, but potentially at the expense of sustained throughput, like would be measured in benchmarking type runs. Yes. Workloads that tend to be latency sensitive also tend to be bursty. Or, perhaps that is just how it feels to a user. Similar observations are made in the GUI design business where user interactions are bursty, but latency sensitive. If you have a big JBOD array with say 8+ mirror vdevs on multiple controllers, in theory, each VDEV can commit from 60-80MB/s to disk. Unless you are attaching a separate ZIL device that can match the aggregate throughput of that pool, wouldn't it just be better to have the default behavior of the ZIL contents being inside the pool itself? The problem is that the ZIL writes must be committed to disk and magnetic disks rotate. So the time to commit to media is, on average, disregarding seeks, 1/2 the rotational period. This ranges from 2 ms (15k rpm) to 5.5 ms (5,400 rpm). If the workload is something like a tar -x of small files (source code) then a 4.17 ms (7,200 rpm) disk would limit my extraction to a maximum of 240 files/s. If these are 4kByte files, the bandwidth would peak at about 1 MByte/s. Upgrading to a 15k rpm disk would move the peak to about 500 files/s or 2.25 MBytes/s. Using a decent SSD would change this to 5000 files/s or 22.5 MBytes/s. The best practices guide states that the max ZIL device size should be roughly 50% of main system memory, because that's approximately the most data that can be in-flight at any given instant. There is a little bit of discussion about this point, because it really speaks to the ARC in general. Look for it to be clarified soon. Also note that this is much more of a problem for small memory machines. For a target throughput of X MB/sec and given that ZFS pushes transaction groups every 5 seconds (and have 2 outstanding), we also expect the ZIL to not grow beyond X MB/sec * 10 sec. So to service 100MB/sec of synchronous writes, 1 GBytes of log device should be sufficient. It is a little bit more complicated than that because if the size of the ZIL write is greater than 32 kBytes, then it will be written directly to the main pool, not the ZIL log. This is because if you have lots of large synchronous writes, then the system can become bandwidth limited rather than latency limited and the way to solve bandwidth problems is to reduce bandwidth demand. But, no comments are made on the performance requirements of the ZIL device(s) relative to the main pool devices. Clicking around finds this entry: http://blogs.sun.com/perrin/entry/slog_blog_or_blogging_on ...which appears to indicate cases where a significant number of ZILs were required to match the bandwidth of just throwing them in the pool itself. Yes.
And I think there are many more use cases which are not yet characterized. What we do know is that using an SSD for the separate ZIL log works very well for a large number of cases. It is not clear to me that the efforts to characterize a large number of cases is worthwhile, when we can simply throw an SSD at the problem and solve it. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
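The rotational-latency arithmetic above, written out (plain arithmetic only; it assumes one ZIL commit per extracted file and ignores seeks, exactly as Richard describes):

  # half a rotation at 7,200 rpm:  (60 / 7200) / 2  = 4.17 ms  ->  ~240 commits/s
  # half a rotation at 15,000 rpm: (60 / 15000) / 2 = 2.0 ms   ->  ~500 commits/s
  # echo 'scale=6; 1/((60/7200)/2)' | bc    # ~240 files/s
  # echo 'scale=6; 1/((60/15000)/2)' | bc   # ~500 files/s

At roughly 4 kBytes per source file that works out to only a couple of MBytes/s, which is why a low-latency slog (or NVRAM) changes the picture so dramatically.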
Re: [zfs-discuss] ZFS over NFS, poor performance with many small files
Any recommendations for an SSD to work with an X4500 server? Will the SSDs used in the 7000 series servers work with X4500s or X4540s? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS over NFS, poor performance with many small files
d...@yahoo.com said: Any recommendations for an SSD to work with an X4500 server? Will the SSDs used in the 7000 series servers work with X4500s or X4540s? The Sun System Handbook (sunsolve.sun.com) for the 7210 appliance (an X4540-based system) lists the logzilla device with this fine print: PN#371-4192 Solid State disk drives can only be installed in slots 3 and 11. Makes me wonder if they would work in our X4500 NFS server. Our ZFS pool is already deployed (Solaris-10), but we have four hot spares -- two of which could be given up in favor of a mirrored ZIL. An OS upgrade to S10U6 would give the separate-log functionality, if the drivers, etc. supported the actual SSD device. I doubt we'll go out and buy them before finding out if they'll actually work -- it would be a real shame if they didn't, though. Regards, Marion ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
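If the SSDs do turn out to attach cleanly, the spare-for-slog swap Marion describes would look roughly like this -- the device names are hypothetical, and it assumes the pool has been upgraded to a version with separate-log support (Solaris 10 10/08, i.e. S10U6, or later):

  # zpool remove tank c5t3d0 c5t11d0           # release two of the four hot spares
  # zpool add tank log mirror c5t3d0 c5t11d0   # re-add those slots (now SSDs) as a mirrored slog
  # zpool status tank                          # the mirror shows up under a 'logs' section

Note that on that vintage of ZFS a log device, once added, could not easily be removed again, so it is worth trying this on a scratch pool first.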
Re: [zfs-discuss] ZFS over NFS, poor performance with many small files
On Tue, Jan 20 at 9:04, Richard Elling wrote: Yes. And I think there are many more use cases which are not yet characterized. What we do know is that using an SSD for the separate ZIL log works very well for a large number of cases. It is not clear to me that the efforts to characterize a large number of cases is worthwhile, when we can simply throw an SSD at the problem and solve it. -- richard I think the issue is, like a previous poster discovered, there's not a lot of available data on exact performance changes of adding ZIL/L2ARC devices in a variety of workloads, so people wind up spending money and doing lots of trial and error, without clear expectations of whether their modifications are working or not. -- Eric D. Mudama edmud...@mail.bounceswoosh.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS over NFS, poor performance with many small files
On Tue, Jan 20 at 21:35, Eric D. Mudama wrote: On Tue, Jan 20 at 9:04, Richard Elling wrote: Yes. And I think there are many more use cases which are not yet characterized. What we do know is that using an SSD for the separate ZIL log works very well for a large number of cases. It is not clear to me that the efforts to characterize a large number of cases is worthwhile, when we can simply throw an SSD at the problem and solve it. -- richard I think the issue is, like a previous poster discovered, there's not a lot of available data on exact performance changes of adding ZIL/L2ARC devices in a variety of workloads, so people wind up spending money and doing lots of trial and error, without clear expectations of whether their modifications are working or not. Sorry for that terrible last sentence, my brain is fried right now. I was trying to say that most people don't know what they're going to get out of an SSD or other ZIL/L2ARC device ahead of time, since it varies so much by workload, configuration, etc., and it's an expensive problem to solve through trial and error since these performance-improving devices are many times more expensive than the raw SAS/SATA devices in the main pool. -- Eric D. Mudama edmud...@mail.bounceswoosh.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS over NFS, poor performance with many small files
We're running into a performance problem with ZFS over NFS. When working with many small files (i.e. unpacking a tar file with source code), a Thor (over NFS) is about 4 times slower than our aging existing storage solution, which isn't exactly speedy to begin with (17 minutes versus 3 minutes). We took a rough stab in the dark, and started to examine whether or not it was the ZIL. Performing IO tests locally on the Thor shows no real IO problems, but running IO tests over NFS, specifically, with many smaller files we see a significant performance hit. Just to rule in or out the ZIL as a factor, we disabled it, and ran the test again. It completed in just under a minute, around 3 times faster than our existing storage. This was more like it! Are there any tunables for the ZIL to try to speed things up? Or would it be best to look into using a high-speed SSD for the log device? And, yes, I already know that turning off the ZIL is a Really Bad Idea. We do, however, need to provide our users with a certain level of performance, and what we've got with the ZIL on the pool is completely unacceptable. Thanks for any pointers you may have... -- Greg Mason Systems Administrator Michigan State University High Performance Computing Center ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS over NFS, poor performance with many small files
Another option to look at is: set zfs:zfs_nocacheflush=1 http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide The best option is to get a fast ZIL log device. Depends on your pool as well. NFS+ZFS means ZFS will wait for writes to complete before responding to sync NFS write ops. If you have a RAIDZ array, writes will be slower than with a RAID10-style pool. On Tue, Jan 20, 2009 at 11:08 AM, Greg Mason gma...@msu.edu wrote: We're running into a performance problem with ZFS over NFS. When working with many small files (i.e. unpacking a tar file with source code), a Thor (over NFS) is about 4 times slower than our aging existing storage solution, which isn't exactly speedy to begin with (17 minutes versus 3 minutes). We took a rough stab in the dark, and started to examine whether or not it was the ZIL. Performing IO tests locally on the Thor shows no real IO problems, but running IO tests over NFS, specifically, with many smaller files we see a significant performance hit. Just to rule in or out the ZIL as a factor, we disabled it, and ran the test again. It completed in just under a minute, around 3 times faster than our existing storage. This was more like it! Are there any tunables for the ZIL to try to speed things up? Or would it be best to look into using a high-speed SSD for the log device? And, yes, I already know that turning off the ZIL is a Really Bad Idea. We do, however, need to provide our users with a certain level of performance, and what we've got with the ZIL on the pool is completely unacceptable. Thanks for any pointers you may have... ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
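For reference, the persistent form of that tunable goes in /etc/system, and on Solaris 10 8/07 and later it can usually be flipped live with mdb for a quick experiment. Treat this as a test-only toggle: it is only safe when every device in the pool has non-volatile (battery-backed) cache.

  # grep nocacheflush /etc/system
  set zfs:zfs_nocacheflush = 1
  # echo zfs_nocacheflush/W0t1 | mdb -kw   # disable cache-flush requests now
  # echo zfs_nocacheflush/W0t0 | mdb -kw   # restore the default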
Re: [zfs-discuss] ZFS over NFS, poor performance with many small files
Greg Mason wrote: We're running into a performance problem with ZFS over NFS. When working with many small files (i.e. unpacking a tar file with source code), a Thor (over NFS) is about 4 times slower than our aging existing storage solution, which isn't exactly speedy to begin with (17 minutes versus 3 minutes). We took a rough stab in the dark, and started to examine whether or not it was the ZIL. It is. I've recently added some clarification to this section in the Evil Tuning Guide which might help you to arrive at a better solution. http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Disabling_the_ZIL_.28Don.27t.29 Feedback is welcome. -- richard Performing IO tests locally on the Thor shows no real IO problems, but running IO tests over NFS, specifically, with many smaller files we see a significant performance hit. Just to rule in or out the ZIL as a factor, we disabled it, and ran the test again. It completed in just under a minute, around 3 times faster than our existing storage. This was more like it! Are there any tunables for the ZIL to try to speed things up? Or would it be best to look into using a high-speed SSD for the log device? And, yes, I already know that turning off the ZIL is a Really Bad Idea. We do, however, need to provide our users with a certain level of performance, and what we've got with the ZIL on the pool is completely unacceptable. Thanks for any pointers you may have... -- Greg Mason Systems Administrator Michigan State University High Performance Computing Center ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS over NFS, poor performance with many small files
So, what we're looking for is a way to improve performance, without disabling the ZIL, as it's my understanding that disabling the ZIL isn't exactly a safe thing to do. We're looking for the best way to improve performance, without sacrificing too much of the safety of the data. The current solution we are considering is disabling the cache flushing (as per a previous response in this thread), and adding one or two SSD log devices, as this is similar to the Sun storage appliances based on the Thor. Thoughts? -Greg On Jan 19, 2009, at 6:24 PM, Richard Elling wrote: We took a rough stab in the dark, and started to examine whether or not it was the ZIL. It is. I've recently added some clarification to this section in the Evil Tuning Guide which might help you to arrive at a better solution. http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Disabling_the_ZIL_.28Don.27t.29 Feedback is welcome. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS over NFS, poor performance with many small files
Greg Mason wrote: So, what we're looking for is a way to improve performance, without disabling the ZIL, as it's my understanding that disabling the ZIL isn't exactly a safe thing to do. We're looking for the best way to improve performance, without sacrificing too much of the safety of the data. The current solution we are considering is disabling the cache flushing (as per a previous response in this thread), and adding one or two SSD log devices, as this is similar to the Sun storage appliances based on the Thor. Thoughts? Good idea. Thor has a CF slot, too, if you can find a high speed CF card. -- richard -Greg On Jan 19, 2009, at 6:24 PM, Richard Elling wrote: We took a rough stab in the dark, and started to examine whether or not it was the ZIL. It is. I've recently added some clarification to this section in the Evil Tuning Guide which might help you to arrive at a better solution. http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Disabling_the_ZIL_.28Don.27t.29 Feedback is welcome. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS over NFS, poor performance with many small files
On Mon, 19 Jan 2009, Greg Mason wrote: The current solution we are considering is disabling the cache flushing (as per a previous response in this thread), and adding one or two SSD log devices, as this is similar to the Sun storage appliances based on the Thor. Thoughts? You need to add some sort of fast non-volatile cache. The Sun storage appliances are actually using battery backed DRAM for their write caches. This sort of hardware is quite rare. Fast SSD log devices are apparently pretty expensive. Some of the ones for sale are actually pretty slow. Bob == Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS over NFS, poor performance with many small files
Good idea. Thor has a CF slot, too, if you can find a high speed CF card. -- richard We're already using the CF slot for the OS. We haven't really found any CF cards that would be fast enough anyways :) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS over NFS, poor performance with many small files
On Mon, Jan 19 at 23:14, Greg Mason wrote: So, what we're looking for is a way to improve performance, without disabling the ZIL, as it's my understanding that disabling the ZIL isn't exactly a safe thing to do. We're looking for the best way to improve performance, without sacrificing too much of the safety of the data. The current solution we are considering is disabling the cache flushing (as per a previous response in this thread), and adding one or two SSD log devices, as this is similar to the Sun storage appliances based on the Thor. Thoughts? In general principles, the evil tuning guide states that the ZIL should be able to handle 10 seconds of expected synchronous write workload. To me, this implies that it's improving burst behavior, but potentially at the expense of sustained throughput, like would be measured in benchmarking type runs. If you have a big JBOD array with say 8+ mirror vdevs on multiple controllers, in theory, each VDEV can commit from 60-80MB/s to disk. Unless you are attaching a separate ZIL device that can match the aggregate throughput of that pool, wouldn't it just be better to have the default behavior of the ZIL contents being inside the pool itself? The best practices guide states that the max ZIL device size should be roughly 50% of main system memory, because that's approximately the most data that can be in-flight at any given instant. For a target throughput of X MB/sec and given that ZFS pushes transaction groups every 5 seconds (and have 2 outstanding), we also expect the ZIL to not grow beyond X MB/sec * 10 sec. So to service 100MB/sec of synchronous writes, 1 GBytes of log device should be sufficient. But, no comments are made on the performance requirements of the ZIL device(s) relative to the main pool devices. Clicking around finds this entry: http://blogs.sun.com/perrin/entry/slog_blog_or_blogging_on ...which appears to indicate cases where a significant number of ZILs were required to match the bandwidth of just throwing them in the pool itself. --eric -- Eric D. Mudama edmud...@mail.bounceswoosh.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
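Plugging numbers into the two rules of thumb Eric quotes (illustrative arithmetic only -- the 200 MB/s and 16 GB figures are hypothetical):

  # 10 s of expected synchronous writes:  200 MB/s * 10 s  = 2 GB of slog
  # ceiling of ~50% of system memory:     16 GB RAM * 0.5  = 8 GB in flight, at most
  # => for such a server, a slog in the 2-8 GB range covers both rules

In practice the limiting factor is usually the latency of the log device rather than its capacity, which is why small but fast SSD/NVRAM devices work well.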
[zfs-discuss] ZFS or NFS?
I have a build 62 system with a zone that NFS mounts a ZFS filesystem. From the zone, I keep seeing issues with .nfs files remaining in otherwise empty directories, preventing their deletion. The files appear to be immediately replaced when they are deleted. Is this an NFS or a ZFS issue? Ian ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS or NFS?
Ian Collins wrote: I have a build 62 system with a zone that NFS mounts an ZFS filesystem. From the zone, I keep seeing issues with .nfs files remaining in otherwise empty directories preventing their deletion. The files appear to be immediately replaced when they are deleted. Is this an NFS or a ZFS issue? It is NFS that is doing that. It happens when a process on the NFS client still has the file open. fuser(1) is your friend here. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
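A small sketch of chasing down the process behind a .nfs file on the client, assuming the filesystem is mounted at /data/home/joe (a path borrowed from elsewhere in this digest; adjust to the real mount point):

  # fuser -cu /data/home/joe     # PIDs (and login names) with files open on that filesystem
  # ptree <pid>                  # see what the process is and who started it
  # pfiles <pid>                 # list its open files to confirm the unlinked-but-open file

Once the process closes (or is made to close) the file, the .nfs placeholder disappears and the directory can be removed.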
Re: [zfs-discuss] ZFS or NFS?
Ian Collins wrote: I have a build 62 system with a zone that NFS mounts an ZFS filesystem. From the zone, I keep seeing issues with .nfs files remaining in otherwise empty directories preventing their deletion. The files appear to be immediately replaced when they are deleted. Is this an NFS or a ZFS issue? This is the NFS client keeping unlinked but open files around. You need to find out what process has the files open (perhaps with fuser -c) and persuade them to close the files before you can unmount gracefully. You may also use umount -f if you don't care what happens to the processes. Rob T ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS or NFS?
On 9/17/07, Darren J Moffat [EMAIL PROTECTED] wrote: It is NFS that is doing that. It happens when a process on the NFS client still has the file open. fuser(1) is your friend here. ... and if fuser doesn't tell you what you need to know, you can use lsof ( http://freshmeat.net/projects/lsof/ I usually just get it precompiled from http://www.sunfreeware.com/ ). I have found lsof to be more reliable that fuser in listing what has a file open. -- Paul Kraus ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS or NFS?
Ian Collins wrote: I have a build 62 system with a zone that NFS mounts an ZFS filesystem. From the zone, I keep seeing issues with .nfs files remaining in otherwise empty directories preventing their deletion. The files appear to be immediately replaced when they are deleted. Is this an NFS or a ZFS issue? That is how NFS deals with files that are unlinked while open. In a local file system, unlinked while open files will simply not be deleted until the close. For remote file systems, like NFS, you have to remove the file from the namespace, but not remove the file's content. The client will do this by creating .nfs files. A more detailed explanation is at: http://nfs.sourceforge.net/ -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS and NFS Mounting - Missing Permissions
Hi, I'm trying to set up a new NFS server, and wish to use Solaris and ZFS. I have a ZFS filesystem set up to handle the users' home directories, and set up sharing # zfs list NAME USED AVAIL REFER MOUNTPOINT data 896K 9.75G 35.3K /data data/home 751K 9.75G 38.0K /data/home data/home/bob 32.6K 9.75G 32.6K /data/home/bob data/home/joe 647K 9.37M 647K /data/home/joe data/home/paul 32.6K 9.75G 32.6K /data/home/paul # zfs get sharenfs data/home NAME PROPERTY VALUE SOURCE data/home sharenfs rw local And these directories are owned by the user # ls -l /data/home total 12 drwxrwxr-x 2 bob sigma 2 Jul 23 08:47 bob drwxrwxr-x 2 joe sigma 4 Jul 23 11:31 joe drwxrwxr-x 2 paul sigma 2 Jul 23 08:47 paul I have the top level directory shared (/data/home). When I mount this on the client PC (Ubuntu) I lose all the permissions, and can't see any of the files.. [EMAIL PROTECTED]:/nfs/home# ls -l total 6 drwxr-xr-x 2 root root 2 2007-07-23 08:47 bob drwxr-xr-x 2 root root 2 2007-07-23 08:47 joe drwxr-xr-x 2 root root 2 2007-07-23 08:47 paul [EMAIL PROTECTED]:/nfs/home# ls -l joe total 0 However, when I mount each directory manually, it works.. [EMAIL PROTECTED]:~# mount torit01sx:/data/home/joe /scott [EMAIL PROTECTED]:~# ls -l /scott total 613 -rwxrwxrwx 1 joe sigma 612352 2007-07-23 11:32 file Any ideas? When I try the same thing with a UFS-based filesystem it works as expected [EMAIL PROTECTED]:/# mount torit01sx:/export/home /scott [EMAIL PROTECTED]:/# ls -l scott total 1 drwxr-xr-x 2 joe sigma 512 2007-07-23 12:25 joe Any help would be greatly appreciated.. Thanks in advance Scott This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS and NFS Mounting - Missing Permissions
Scott Adair wrote: Hi I'm trying to setup a new NFS server, and wish to use Solaris and ZFS. I have a ZFS filesystem set up to handle the users home directories and setup sharing # zfs list NAME USED AVAIL REFER MOUNTPOINT data 896K 9.75G 35.3K /data data/home 751K 9.75G 38.0K /data/home data/home/bob 32.6K 9.75G 32.6K /data/home/bob data/home/joe 647K 9.37M 647K /data/home/joe data/home/paul32.6K 9.75G 32.6K /data/home/paul # zfs get sharenfs data/home NAME PROPERTY VALUE SOURCE data/homesharenfs rw local And these directories are owned by the user # ls -l /data/home total 12 drwxrwxr-x 2 bob sigma 2 Jul 23 08:47 bob drwxrwxr-x 2 joe sigma 4 Jul 23 11:31 joe drwxrwxr-x 2 paul sigma 2 Jul 23 08:47 paul I have the top level directory shared (/data/home). When I mount this on the client pc (ubuntu) I loose all the permissions, and can't see any of the files.. /data/home is a different file system than /data/home/joe. NFS shares do not cross file system boundaries. You'll need to share /data/home/joe, too. -- richard [EMAIL PROTECTED]:/nfs/home# ls -l total 6 drwxr-xr-x 2 root root 2 2007-07-23 08:47 bob drwxr-xr-x 2 root root 2 2007-07-23 08:47 joe drwxr-xr-x 2 root root 2 2007-07-23 08:47 paul [EMAIL PROTECTED]:/nfs/home# ls -l joe total 0 However, when I mount each directory manually, it works.. [EMAIL PROTECTED]:~# mount torit01sx:/data/home/joe /scott [EMAIL PROTECTED]:~# ls -l /scott total 613 -rwxrwxrwx 1 joe sigma 612352 2007-07-23 11:32 file Any ideas? When I try the same thing with a UFS based filesystem it works as expected [EMAIL PROTECTED]:/# mount torit01sx:/export/home /scott [EMAIL PROTECTED]:/# ls -l scott total 1 drwxr-xr-x 2 joe sigma 512 2007-07-23 12:25 joe Any help would be greatly appreciated.. Thanks in advance Scott This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
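A way to see what Richard describes on the server, and the corresponding client-side mounts (server and client paths are the ones from Scott's post; an automounter map is the usual way to avoid mounting every user by hand):

  # zfs get -r sharenfs data/home     # the rw value is inherited by each child filesystem
  # share | grep /data/home           # each child shows up as its own NFS export

  ubuntu# mount torit01sx:/data/home/joe /nfs/home/joe
  ubuntu# mount torit01sx:/data/home/bob /nfs/home/bob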
[zfs-discuss] ZFS with NFS
Hi, Has anyone any notes on how best to configure a ZFS pool for NFS mounts to a 4-node RAC cluster? I am particularly interested in config options for zfs/zpool and NFS options at the kernel level. The zpool is being presented from an x4500 (Thumper), and NFS presented to four nodes (x8400). There will be a high I/O transaction rate on these filesystems. And one last thing: it's a production env. Any pointers, gotchas, patches, or helpful NFS kernel parameters would be appreciated. Thanks Mo ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS with NFS
Mohammed Beik wrote: Hi Has anyone any notes on how best configure ZFS pool for NFS mount to a 4-node RAC cluster. I am particularly interested in config options for zfs/zpool and NFS options at kernel level. The zpool is being presented from x4500 (thumper), and NFS presented to four nodes (x8400). There will be high io transactions being carried out on these filesystems. And one last thing Its a production env. Any pointers, gotcha, patches, helpful NFS kernel parameters would be appreciated. Is Solaris Cluster being used with RAC? -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS disables nfs/server on a host
Could it be an order problem? NFS trying to start before zfs is mounted? Just a guess, of course. I'm not real savvy in either realm. HTH, Mike Ben Miller wrote: I have an Ultra 10 client running Sol10 U3 that has a zfs pool set up on the extra space of the internal ide disk. There's just the one fs and it is shared with the sharenfs property. When this system reboots nfs/server ends up getting disabled and this is the error from the SMF logs: [ Apr 16 08:41:22 Executing start method (/lib/svc/method/nfs-server start) ] [ Apr 16 08:41:24 Method start exited with status 0 ] [ Apr 18 10:59:23 Executing start method (/lib/svc/method/nfs-server start) ] Assertion failed: pclose(fp) == 0, file ../common/libzfs_mount.c, line 380, function zfs_share If I re-enable nfs/server after the system is up it's fine. The system was recently upgraded to use zfs and this has happened on the last two reboots. We have lots of other systems that share nfs through zfs fine and I didn't see a similar problem on the list. Any ideas? Ben This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- http://www.sun.com/solaris * Michael Lee * Area System Support Engineer *Sun Microsystems, Inc.* Phone x40782 / 866 877 8350 Email [EMAIL PROTECTED] http://www.sun.com/solaris ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
RE: [zfs-discuss] ZFS disables nfs/server on a host
I have seen a previous discussion with the same error. I don't think a solution was posted though. The libzfs_mount.c source indicates that the 'share' command returned non zero but specified no error. Can you run 'share' manually after a fresh boot? There may be some insight if it fails, though as you describe it, share should work without problems. -Robert http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libzfs /common/libzfs_mount.c?r=789 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ben Miller Sent: Thursday, April 19, 2007 9:05 AM To: zfs-discuss@opensolaris.org Subject: [zfs-discuss] ZFS disables nfs/server on a host I have an Ultra 10 client running Sol10 U3 that has a zfs pool set up on the extra space of the internal ide disk. There's just the one fs and it is shared with the sharenfs property. When this system reboots nfs/server ends up getting disabled and this is the error from the SMF logs: [ Apr 16 08:41:22 Executing start method (/lib/svc/method/nfs-server start) ] [ Apr 16 08:41:24 Method start exited with status 0 ] [ Apr 18 10:59:23 Executing start method (/lib/svc/method/nfs-server start) ] Assertion failed: pclose(fp) == 0, file ../common/libzfs_mount.c, line 380, function zfs_share If I re-enable nfs/server after the system is up it's fine. The system was recently upgraded to use zfs and this has happened on the last two reboots. We have lots of other systems that share nfs through zfs fine and I didn't see a similar problem on the list. Any ideas? Ben This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
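The usual SMF triage after a boot where the share step failed, using the service name from the log excerpt above (if the service is in maintenance, 'svcadm clear' restarts it; if it is merely disabled, 'svcadm enable' is enough):

  # svcs -xv svc:/network/nfs/server:default     # why it is disabled or in maintenance
  # zfs share -a                                 # retry sharing all sharenfs filesystems by hand
  # share                                        # confirm the ZFS filesystem is now exported
  # svcadm clear svc:/network/nfs/server:default
  # svcadm enable svc:/network/nfs/server:default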
Re: [zfs-discuss] ZFS vs NFS vs array caches, revisited
[EMAIL PROTECTED] said: The only obvious thing would be if the exported ZFS filesystems were initially mounted at a point in time when zil_disable was non-null. No changes have been made to zil_disable. It's 0 now, and we've never changed the setting. Export/import doesn't appear to change the behavior. [EMAIL PROTECTED] said: You might want to try in turn: dtrace -n 'sd_send_scsi_SYNCHRONIZE_CACHE:entry{@[stack(20)]=count()}' dtrace -n 'sdioctl:entry{@[stack(20)]=count()}' dtrace -n 'zil_flush_vdevs:entry{@[stack(20)]=count()}' dtrace -n 'zil_commit_writer:entry{@[stack(20)]=count()}' And see if you lose your footing along the way. I've included below the complete list of dtrace output. This system has two zpools, one that goes fast for NFS and one that goes slow. You can see the details of the pools' configs below. Let me re-state that at times in the past, the fast pool has gone slow, and I don't know what made it start going fast again. To summarize, the first dtrace above gives no output on the fast pool, and lists 6, 7, 12, or 14 calls for the slow pool. The second dtrace above counts 6 or 7 calls on both pools. The third dtrace above gives no output for either pool, but zil_flush_vdevs isn't in the stack trace for the earlier trace on my machine (SPARC, Sol-10U3). The last dtrace doesn't find a matching probe here. = # echo zil_disable/D | mdb -k zil_disable: zil_disable: 0 # zpool list NAME SIZE USED AVAIL CAP HEALTH ALTROOT bulk_zp1 2.14T 160K 2.14T 0% ONLINE - bulk_zp2 2.14T 346K 2.14T 0% ONLINE - int01 48.2G 1.94G 46.3G 4% ONLINE - # cd # zpool export bulk_zp1 # zpool export bulk_zp2 # zpool import pool: bulk_zp2 id: 803252704584693135 state: ONLINE action: The pool can be imported using its name or numeric identifier. config: bulk_zp2 ONLINE raidz1 ONLINE c6t4849544143484920443630303133323230303330d0s0 ONLINE c6t4849544143484920443630303133323230303330d0s1 ONLINE c6t4849544143484920443630303133323230303331d0s0 ONLINE c6t4849544143484920443630303133323230303331d0s1 ONLINE c6t4849544143484920443630303133323230303332d0s0 ONLINE c6t4849544143484920443630303133323230303332d0s1 ONLINE pool: bulk_zp1 id: 14914295292657419291 state: ONLINE action: The pool can be imported using its name or numeric identifier. config: bulk_zp1 ONLINE raidz1 ONLINE c6t4849544143484920443630303133323230303230d0s0 ONLINE c6t4849544143484920443630303133323230303230d0s1 ONLINE c6t4849544143484920443630303133323230303231d0s0 ONLINE c6t4849544143484920443630303133323230303231d0s1 ONLINE c6t4849544143484920443630303133323230303232d0s0 ONLINE c6t4849544143484920443630303133323230303232d0s1 ONLINE c6t4849544143484920443630303133323230303232d0s2 ONLINE # zpool import bulk_zp1 # zpool import bulk_zp2 # zfs list bulk_zp1 NAME USED AVAIL REFER MOUNTPOINT bulk_zp1 123K 1.79T 53.6K /zp1 # zfs list bulk_zp2 NAME USED AVAIL REFER MOUNTPOINT bulk_zp2 193K 1.75T 63.9K /zp2 # dtrace -n 'ssd_send_scsi_SYNCHRONIZE_CACHE:entry{@[stack(20)]=count()}' \ -n 'sd_send_scsi_SYNCHRONIZE_CACHE:entry{@[stack(20)]=count()}' dtrace: description 'ssd_send_scsi_SYNCHRONIZE_CACHE:entry' matched 1 probe dtrace: description 'sd_send_scsi_SYNCHRONIZE_CACHE:entry' matched 1 probe ^C # : no output from zp1 test.
# dtrace -n 'ssd_send_scsi_SYNCHRONIZE_CACHE:entry{@[stack(20)]=count()}' \ -n 'sd_send_scsi_SYNCHRONIZE_CACHE:entry{@[stack(20)]=count()}' dtrace: description 'ssd_send_scsi_SYNCHRONIZE_CACHE:entry' matched 1 probe dtrace: description 'sd_send_scsi_SYNCHRONIZE_CACHE:entry' matched 1 probe ^C ssd`ssdioctl+0x17a8 zfs`vdev_disk_io_start+0xa0 zfs`zio_ioctl+0xec zfs`vdev_config_sync+0xe0 zfs`spa_sync+0x2ec zfs`txg_sync_thread+0x134 unix`thread_start+0x4 12 ssd`ssdioctl+0x17a8 zfs`vdev_disk_io_start+0xa0 zfs`zio_ioctl+0xec zfs`vdev_config_sync+0x258 zfs`spa_sync+0x2ec zfs`txg_sync_thread+0x134 unix`thread_start+0x4 12 # : above output from zp2 test. # dtrace -n 'ssdioctl:entry{@[stack(20)]=count()}' -n 'sdioctl:entry{@[stack(20)]=count()}' dtrace:
Re: [zfs-discuss] ZFS vs NFS vs array caches, revisited
[EMAIL PROTECTED] said: The reality is that ZFS turns on the write cache when it owns the whole disk. _Independently_ of that, ZFS flushes the write cache when ZFS needs to ensure that data reaches stable storage. The point is that the flushes occur whether or not ZFS turned the caches on (caches might be turned on by some other means outside the visibility of ZFS). Thanks for taking the time to clear this up for us (assuming others than just me had this misunderstanding :-). Yet today I measured something that leaves me puzzled again. How can we explain the following results? # zpool status -v pool: bulk_zp1 state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM bulk_zp1 ONLINE 0 0 0 raidz1 ONLINE 0 0 0 c6t4849544143484920443630303133323230303230d0s0 ONLINE 0 0 0 c6t4849544143484920443630303133323230303230d0s1 ONLINE 0 0 0 c6t4849544143484920443630303133323230303230d0s2 ONLINE 0 0 0 c6t4849544143484920443630303133323230303230d0s3 ONLINE 0 0 0 c6t4849544143484920443630303133323230303230d0s4 ONLINE 0 0 0 c6t4849544143484920443630303133323230303230d0s5 ONLINE 0 0 0 c6t4849544143484920443630303133323230303230d0s6 ONLINE 0 0 0 errors: No known data errors # prtvtoc -s /dev/rdsk/c6t4849544143484920443630303133323230303230d0 * Partition Tag Flags First Sector Sector Count Last Sector Mount Directory 0 4 00 34 613563821 613563854 1 4 00 613563855 613563821 1227127675 2 4 00 1227127676 613563821 1840691496 3 4 00 1840691497 613563821 2454255317 4 4 00 2454255318 613563821 3067819138 5 4 00 3067819139 613563821 3681382959 6 4 00 3681382960 613563821 4294946780 8 11 00 4294946783 16384 4294963166 # And, at a later time: # zpool status -v bulk_sp1s pool: bulk_sp1s state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM bulk_sp1s ONLINE 0 0 0 c6t4849544143484920443630303133323230303230d0s0 ONLINE 0 0 0 c6t4849544143484920443630303133323230303230d0s1 ONLINE 0 0 0 c6t4849544143484920443630303133323230303230d0s2 ONLINE 0 0 0 c6t4849544143484920443630303133323230303230d0s3 ONLINE 0 0 0 c6t4849544143484920443630303133323230303230d0s4 ONLINE 0 0 0 c6t4849544143484920443630303133323230303230d0s5 ONLINE 0 0 0 c6t4849544143484920443630303133323230303230d0s6 ONLINE 0 0 0 errors: No known data errors # The storage is that same single 2TB LUN I used yesterday, except I've used format to slice it up into 7 equal chunks, and made a raidz (and later a simple striped) pool across all of them. My tar over NFS benchmark on these goes pretty fast. If ZFS is making the flush-cache call, it sure works faster than in the whole-LUN case: ZFS on whole-disk FC-SATA LUN via NFS, yesterday: real 968.13 user 0.33 sys 0.04 7.9 KB/sec overall ZFS on whole-disk FC-SATA LUN via NFS, ssd_max_throttle=32 today: real 664.78 user 0.33 sys 0.04 11.4 KB/sec overall ZFS raidz on 7 slices of FC-SATA LUN via NFS today: real 12.32 user 0.32 sys 0.03 620.2 KB/sec overall ZFS striped on 7 slices of FC-SATA LUN via NFS today: real 6.51 user 0.32 sys 0.03 1178.3 KB/sec overall Not that I'm complaining, mind you. I appear to have stumbled across a way to get NFS over ZFS to work at a reasonable speed, without making changes to the array (nor resorting to giving ZFS SVM soft partitions instead of real devices). Suboptimal, mind you, but it's workable if our Hitachi folks don't turn up a way to tweak the array. Guess I should go read the ZFS source code (though my 10U3 surely lags the OpenSolaris stuff).
Thanks and regards, Marion ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
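One way to check (or change) the on-disk write cache state this discussion revolves around is format's expert mode -- this applies to sd/ssd targets that honor the caching mode page; array LUNs may ignore or fake it:

  # format -e
  (select the LUN)
  format> cache
  cache> write_cache
  write_cache> display
  write_cache> enable      # or disable

That makes it possible to verify whether giving ZFS a slice, rather than the whole LUN, really left the LUN's write cache untouched.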
[zfs-discuss] ZFS vs NFS vs array caches, revisited
I had followed with interest the "turn off NV cache flushing" thread, in regard to doing ZFS-backed NFS on our low-end Hitachi array:

  http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg05000.html

In short, if you have non-volatile cache, you can configure the array to ignore the ZFS cache-flush requests.  This is reported to improve the really terrible performance of ZFS-backed NFS systems.  Feel free to correct me if I'm misremembering.

Anyway, I've also read that if ZFS notices it's using slices instead of whole disks, it will not enable/use the write cache.  So I thought I'd be clever and configure a ZFS pool on our array with a slice of a LUN instead of the whole LUN, and fool ZFS into not issuing cache flushes, rather than having to change the configuration of the array itself.  Unfortunately, it didn't make a bit of difference in my little NFS benchmark, namely extracting a small 7.6MB tar file (C++ source code, 500 files/dirs).

I used three test zpools and a UFS filesystem (not all were in play at the same time):

  pool: bulk_sp1
 state: ONLINE
 scrub: none requested
config:

        NAME                                              STATE     READ WRITE CKSUM
        bulk_sp1                                          ONLINE       0     0     0
          c6t4849544143484920443630303133323230303230d0  ONLINE       0     0     0

errors: No known data errors

  pool: bulk_sp1s
 state: ONLINE
 scrub: none requested
config:

        NAME                                                STATE     READ WRITE CKSUM
        bulk_sp1s                                           ONLINE       0     0     0
          c6t4849544143484920443630303133323230303230d0s0  ONLINE       0     0     0

errors: No known data errors

  pool: int01
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        int01         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c0t0d0s5  ONLINE       0     0     0
            c0t1d0s5  ONLINE       0     0     0

errors: No known data errors

# prtvtoc -s /dev/rdsk/c6t4849544143484920443630303133323230303230d0
*                             First       Sector        Last
* Partition  Tag  Flags      Sector        Count      Sector  Mount Directory
       0      4    00             34  4294879232  4294879265
       1      4    00     4294879266       67517  4294946782
       8     11    00     4294946783       16384  4294963166
#

Both NFS client and server are Sun T2000's, 16GB RAM, switched gigabit ethernet, Solaris 10 U3 patched as of 12-Jan-2007, doing nothing else at the time of the tests.  The bulk_sp1* pools were both on the same Hitachi 9520V RAID-5 SATA group that I ran my bonnie++ tests on yesterday.  The int01 pool is mirrored on two slice-5's of the server T2000's internal 2.5" SAS 73GB drives.

  ZFS on whole-disk FC-SATA LUN via NFS:
        real      968.13
        user        0.33
        sys         0.04
        7.9 KB/sec overall

  ZFS on partial slice-0 of FC-SATA LUN via NFS:
        real      950.77
        user        0.33
        sys         0.04
        8.0 KB/sec overall

  ZFS on slice-5 mirror of internal SAS drives via NFS:
        real       17.48
        user        0.32
        sys         0.03
        438.8 KB/sec overall

  UFS on partial slice-0 of FC-SATA LUN via NFS:
        real        6.13
        user        0.32
        sys         0.03
        1251.4 KB/sec overall

I'm not willing to disable the ZIL.  I think I'd settle for the 400 KB/sec range in this test from NFS on ZFS, if I could get that on our FC-SATA Hitachi array.  As things are now, ZFS just won't work for us, and I'm not sure how to make it go faster.

Thoughts/suggestions are welcome.

Marion
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
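A related host-side approach (not necessarily the one discussed in the thread linked above) is to tell ZFS not to issue the flush at all, instead of reconfiguring the array.  This is only a hedged sketch: the zfs_nocacheflush tunable is not present in every Solaris 10 update, so check that the symbol exists on your release first, and suppressing flushes is only safe when every device under ZFS sits behind genuinely non-volatile (battery-backed) cache.

    # /etc/system entry -- sketch only.  With volatile write caches this trades
    # performance for data integrity on power loss.  Requires a reboot.
    set zfs:zfs_nocacheflush = 1

    # Check whether the tunable exists and what it is currently set to:
    echo "zfs_nocacheflush/D" | mdb -k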
[zfs-discuss] ZFS over NFS extra slow?
I had a user report extreme slowness on a ZFS filesystem mounted over NFS over the weekend.  After some extensive testing, the extreme slowness appears to only occur when a ZFS filesystem is mounted over NFS.

One example is doing a 'gtar xzvf php-5.2.0.tar.gz' ... over NFS onto a ZFS filesystem.  This takes:

  real    5m12.423s
  user    0m0.936s
  sys     0m4.760s

Locally on the server (to the same ZFS filesystem) it takes:

  real    0m4.415s
  user    0m1.884s
  sys     0m3.395s

The same job over NFS to a UFS filesystem takes:

  real    1m22.725s
  user    0m0.901s
  sys     0m4.479s

Same job locally on the server to the same UFS filesystem:

  real    0m10.150s
  user    0m2.121s
  sys     0m4.953s

This is easily reproducible even with single large files, but the multiple small files seem to illustrate some awful sync latency between each file.  Any idea why ZFS over NFS is so bad?  I saw the threads that talk about an fsync penalty, but they don't seem relevant since the local ZFS performance is quite good.

This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
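For reference, the comparison above boils down to the commands below; the mount points and tarball location are made up for illustration, not Brad's actual paths.  Watching the NFS server's per-operation counters while the extraction runs gives a rough idea of how much synchronous traffic (commit, create, setattr) the small files generate.

    # On the client: over NFS onto the ZFS-backed export, then the UFS-backed one
    cd /net/server/export/zfs && time gtar xzvf /var/tmp/php-5.2.0.tar.gz
    cd /net/server/export/ufs && time gtar xzvf /var/tmp/php-5.2.0.tar.gz

    # On the server, before a run: zero the NFS counters (needs root)
    nfsstat -z
    # ... run one extraction from the client ...
    # Afterwards, see how many NFSv3 commit/create/setattr calls arrived
    nfsstat -s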
Re: [zfs-discuss] ZFS over NFS extra slow?
Hi Brad,

I believe benr experienced the same/similar issue here:

  http://www.opensolaris.org/jive/message.jspa?messageID=77347

If it is the same, I believe it's a known ZFS/NFS interaction bug, and has to do with small file creation.

Best Regards,
Jason

On 1/2/07, Brad Plecs [EMAIL PROTECTED] wrote:
> I had a user report extreme slowness on a ZFS filesystem mounted over NFS
> over the weekend.  After some extensive testing, the extreme slowness
> appears to only occur when a ZFS filesystem is mounted over NFS.
>
> One example is doing a 'gtar xzvf php-5.2.0.tar.gz' ... over NFS onto a ZFS
> filesystem.  This takes:
>
>   real    5m12.423s
>   user    0m0.936s
>   sys     0m4.760s
>
> Locally on the server (to the same ZFS filesystem) it takes:
>
>   real    0m4.415s
>   user    0m1.884s
>   sys     0m3.395s
>
> The same job over NFS to a UFS filesystem takes:
>
>   real    1m22.725s
>   user    0m0.901s
>   sys     0m4.479s
>
> Same job locally on the server to the same UFS filesystem:
>
>   real    0m10.150s
>   user    0m2.121s
>   sys     0m4.953s
>
> This is easily reproducible even with single large files, but the multiple
> small files seem to illustrate some awful sync latency between each file.
> Any idea why ZFS over NFS is so bad?  I saw the threads that talk about an
> fsync penalty, but they don't seem relevant since the local ZFS performance
> is quite good.
>
> This message posted from opensolaris.org

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS over NFS extra slow?
Brad Plecs wrote:
> I had a user report extreme slowness on a ZFS filesystem mounted over NFS
> over the weekend.  After some extensive testing, the extreme slowness
> appears to only occur when a ZFS filesystem is mounted over NFS.
>
> One example is doing a 'gtar xzvf php-5.2.0.tar.gz' ... over NFS onto a ZFS
> filesystem.  This takes:
>
>   real    5m12.423s
>   user    0m0.936s
>   sys     0m4.760s
>
> Locally on the server (to the same ZFS filesystem) it takes:
>
>   real    0m4.415s
>   user    0m1.884s
>   sys     0m3.395s
>
> The same job over NFS to a UFS filesystem takes:
>
>   real    1m22.725s
>   user    0m0.901s
>   sys     0m4.479s
>
> Same job locally on the server to the same UFS filesystem:
>
>   real    0m10.150s
>   user    0m2.121s
>   sys     0m4.953s
>
> This is easily reproducible even with single large files, but the multiple
> small files seem to illustrate some awful sync latency between each file.
> Any idea why ZFS over NFS is so bad?  I saw the threads that talk about an
> fsync penalty, but they don't seem relevant since the local ZFS performance
> is quite good.

Known issue, discussed here:

  http://www.opensolaris.org/jive/thread.jspa?threadID=14696&tstart=15

benr.
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
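If it is the same small-file-creation issue, the per-file synchronous commits should be visible on the server while the extraction runs.  A rough DTrace sketch, assuming the fbt provider exposes zil_commit on this kernel build:

    # Count ZIL commits per process name while the NFS extraction runs;
    # press Ctrl-C to stop and print the aggregation.
    dtrace -n 'fbt::zil_commit:entry { @[execname] = count(); }'

A count that roughly tracks the number of files being created is consistent with the NFS server forcing each small file to stable storage as it is written.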
Re: [zfs-discuss] ZFS over NFS extra slow?
> Another thing to keep an eye out for is disk caching.  With ZFS, whenever
> the NFS server tells us to make sure something is on disk, we actually make
> sure it's on disk by asking the drive to flush dirty data in its write
> cache out to the media.  Needless to say, this takes a while.  UFS, on the
> other hand, isn't aware of that extra level of caching, and happily
> pretends that once the drive ACKs a write, it's on stable storage.
>
> If you use format(1M) and take a look at whether or not the drive's write
> cache is enabled, that should shed some light on this.  If it's on, try
> turning it off and re-run your NFS tests on ZFS vs. UFS.  Either way, let
> us know what you find out.

Slightly OT, but you just reminded me of why I like disks that have Sun firmware on them.  They never have the write cache on; at least I have never seen it.  Read cache yes, but write cache never.  At least in the Seagate and Fujitsu Ultra320 SCSI/FC-AL disks that have a Sun logo on them.  I have no idea what else that Sun firmware does on a SCSI disk, but I'd love to know :-)

Dennis
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
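For anyone wanting to follow the format(1M) suggestion above, the write cache setting lives under format's expert mode.  A sketch of the menu path as it appears on Solaris 10; the session is interactive, so the steps after the first line are shown as comments:

    format -e
    # format> (select the disk in question)
    # format> cache
    # cache> write_cache
    # write_cache> display      # shows whether the write cache is enabled
    # write_cache> disable      # or enable; then re-run the ZFS vs. UFS NFS test
    # write_cache> quit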