Re: [zfs-discuss] Niagara and ZFS compression?
On our Niagara T2000 (32x1000 MHz, 8 GB RAM, 4x 68 GB disk drives) we set up three drives as raidz with compression. All of our performance issues are gone. Remember, we receive ~150 million lines of ASCII and 2 million files per day. We have had zero performance issues since we 1) upgraded to Solaris u03, and 2) added two more drives and set up a 3-disk raidz; while doing this we also turned on compression for the heck of it. No performance differences we can measure. We are satisfied now with both the Niagara and ZFS. The web site is faster also: http://canary.sfbay. Try looking at a report where we must process 100,000 lines of data:

http://itsm-mpk-2.sfbay/canary/cgi-bin/canary.cgi?group=report&rpt_source=local&local_rpt=process_count&hour=11&r1=3&r2=1

This report shows every process running on every Sun Ray server in the world (approx 1 million pids). You can scroll down this report and find out which of the 1500 unique executables (oracle, vi, vim, emacs, mozilla, firefox, thunderbird, opera, etc.) is causing the most load across the world. Basically our performance issues have been solved. Thanks, ZFS and Niagara teams.

sean

Matthew Ahrens wrote:

On Sun, Aug 20, 2006 at 06:16:23PM -0500, Mike Gerdts wrote:

On 8/20/06, James Dickens [EMAIL PROTECTED] wrote:

On 8/20/06, trevor pretty [EMAIL PROTECTED] wrote:

Team,
During a ZFS presentation I had a question from Vernon which I could not answer and did not find with a quick look through the archives.
Q: What's the effect (if any) of only having one Floating Point Processor on Niagara when you turn on ZFS compression?

Not an expert, but most if not all compression is integer based, and I don't think floating point is supported inside the kernel anyway, so it has to be integer based.

That's correct, we don't do any floating-point math in ZFS, either compression or checksumming. So Niagara's floating point performance will have no effect on ZFS performance.
Not too long ago Roch said "compression runs in the context of a single thread per pool", which makes me worry much more about the speed of a single core doing all of the compression for a pool.

This was not the design; we're working on fixing this bug so that many threads will be used to do the compression.

--matt

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

--
Sean Meighan
Mgr ITSM Engineering
Sun Microsystems, Inc.
US Phone x32329 / +1 408 850-9537
Mobile 303-520-2024
Fax 408 850-9537
Email [EMAIL PROTECTED]

NOTICE: This email message is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.
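The point above, that kernel-side compression is integer-only, can be illustrated with a toy run-length encoder. This is a hypothetical sketch, not ZFS's actual LZJB code: the idea is simply that everything is byte and counter arithmetic, with no floating point anywhere.

```python
def rle_compress(data: bytes) -> bytes:
    """Toy run-length encoder: emits (count, byte) pairs, integer math only."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        # Extend the run while bytes repeat (count capped at one byte).
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes([run, data[i]])
        i += run
    return bytes(out)

def rle_decompress(data: bytes) -> bytes:
    """Inverse of rle_compress: expand each (count, byte) pair."""
    out = bytearray()
    for i in range(0, len(data), 2):
        out += data[i + 1:i + 2] * data[i]
    return bytes(out)

blob = b"A" * 1000 + b"B" * 500
packed = rle_compress(blob)
assert rle_decompress(packed) == blob
assert len(packed) < len(blob)  # 1500 repetitive bytes shrink to a few pairs
```

Real compressors are far more sophisticated, but the integer-only character is the same, which is why Niagara's single FPU is irrelevant here.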
[zfs-discuss] Canary is now running latest code and has a 3 disk raidz ZFS volume
Hi George; life is better for us now. We upgraded to s10s_u3wos_01 last Friday on itsm-mpk-2.sfbay, the production Canary server (http://canary.sfbay). What do we look like now?

# zpool upgrade
This system is currently running ZFS version 2.
All pools are formatted using this version.

We added two more lower-performance disk drives last Friday. We went from two drives that were mirrored to four drives. Now we look like this on our T2000:

(1) 68 GB drive running unmirrored for the system
(3) 68 GB drives set up as raidz

# zpool status
  pool: canary
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        canary      ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0

errors: No known data errors

Our 100%-busy disk drive from previous weeks is now three drives. iostat now shows that no single drive is reaching 100%. Here is an "iostat -xn 1 99":

                    extended device statistics
    r/s    w/s     kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    4.0    0.0    136.0     0.0  0.0  0.0    0.0    5.3   0   2 c1t0d0
    0.0    0.0      0.0     0.0  0.0  0.0    0.0    0.0   0   0 c0t0d0
    0.0  288.9      0.0   939.3  0.0  7.0    0.0   24.1   1  74 c1t1d0
    0.0  300.9      0.0   940.8  0.0  6.2    0.0   20.7   1  72 c1t2d0
    0.0  323.9      0.0   927.8  0.0  5.3    0.0   16.5   1  63 c1t3d0
    0.0    0.0      0.0     0.0  0.0  0.0    0.0    0.0   0   0 itsm-mpk-2:vold(pid334)
                    extended device statistics
    r/s    w/s     kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0      0.0     0.0  0.0  0.0    0.0    0.0   0   0 c1t0d0
    0.0    0.0      0.0     0.0  0.0  0.0    0.0    0.0   0   0 c0t0d0
    0.0   70.9      0.0   118.8  0.0  0.5    0.0    7.6   0  28 c1t1d0
    0.0   74.9      0.0   124.3  0.0  0.5    0.0    6.1   0  26 c1t2d0
    0.0   75.8      0.0   120.3  0.0  0.5    0.0    7.2   0  27 c1t3d0
    0.0    0.0      0.0     0.0  0.0  0.0    0.0    0.0   0   0 itsm-mpk-2:vold(pid

Here is our old box:

# more /etc/release
                       Solaris 10 6/06 s10s_u2wos_06 SPARC
           Copyright 2006 Sun Microsystems, Inc. All Rights Reserved.
                        Use is subject to license terms.
                            Assembled 30 March 2006

# pkginfo -l SUNWzfsr
   PKGINST:  SUNWzfsr
      NAME:  ZFS (Root)
  CATEGORY:  system
      ARCH:  sparc
   VERSION:  11.10.0,REV=2006.03.22.02.15
   BASEDIR:  /
    VENDOR:  Sun Microsystems, Inc.
      DESC:  ZFS root components
    PSTAMP:  on10-patch20060322021857
  INSTDATE:  Apr 04 2006 13:52
   HOTLINE:  Please contact your local service provider
    STATUS:  completely installed
     FILES:  18 installed pathnames
              5 shared pathnames
              7 directories
              4 executables
           1811 blocks used (approx)

Here is the current version:

# more /etc/release
                      Solaris 10 11/06 s10s_u3wos_01 SPARC
           Copyright 2006 Sun Microsystems, Inc. All Rights Reserved.
                        Use is subject to license terms.
                             Assembled 27 June 2006

# pkginfo -l SUNWzfsr
   PKGINST:  SUNWzfsr
      NAME:  ZFS (Root)
  CATEGORY:  system
      ARCH:  sparc
   VERSION:  11.10.0,REV=2006.05.18.02.15
   BASEDIR:  /
    VENDOR:  Sun Microsystems, Inc.
      DESC:  ZFS root components
    PSTAMP:  on10-patch20060315140831
  INSTDATE:  Jul 27 2006 12:10
   HOTLINE:  Please contact your local service provider
    STATUS:  completely installed
     FILES:  18 installed pathnames
              5 shared pathnames
              7 directories
              4 executables
           1831 blocks used (approx)

In my opinion the 2 1/2" disk drives in the Niagara box were not designed to receive one million files per day. These two extra drives (thanks Denis!) have given us acceptable performance. I still want a thumper *smile*. It is pretty amazing that we have 800 servers, 30,000 users, and 140 million lines of ASCII per day all fitting in a 2U T2000 box!

thanks
sean

George Wilson wrote:

Sean,
Sorry for the delay getting back to you. You can do a 'zpool upgrade' to see what version of the on-disk format your pool is currently running. The latest version is 3. You can then issue a 'zpool upgrade pool' to upgrade. Keep in mind that the upgrade is a one-way ticket and can't be rolled backwards.
ZFS can be upgraded by just applying patches. So if you were running Solaris 10 06/06 (a.k.a. u2), you could apply the patches that will come out when u3 ships. Then issue the 'zpool upgrade' command to get the functionality you need.
Does this help? Can you send me the output of 'zpool upgrade' on your system?
Thanks,
George

Sean Meighan wrote:

Hi George; we are trying to build our server today.
We should have the four disk drives mounted by this afternoon. Separate question: we were on an old ZFS version; how could we have upgraded to a new version? Do we really have to re-install Solaris to upgrade ZFS?

thanks
sean

George Wilson wrote:

Sean,
The gate for s10u3_03 closed yesterday and I think the DVD image will be available early next week. I'll keep you posted. If you want to try this out before then, what I can provide you are the binaries to run on top of s10u3_02.
Thanks,
George

Sean Meighan wrote:

George; is there a link to s1
Re: [zfs-discuss] How to best layout our filesystems
Hi Torrey; we are the cobbler's kids. We borrowed this T2000 from Niagara engineering after we did some performance tests for them. I am trying to get a thumper to run this data set; this could take up to 3-4 months. Today we are watching 750 Sun Ray servers and 30,000 employees. Let's see:

1) Solaris 10
2) ZFS version 6
3) T2000 32x1000 with the poorer-performing drives that come with the Niagara

We need a short-term solution. Niagara engineering has given us two more of the internal drives so we can max out the Niagara with 4 internal drives. This is the hardware we need to use this week. When we get a new box and more drives, we will reconfigure. Our graphs have 5000 data points per month, 140 data points per day; we can stand to lose data. My suggestion was one drive as the system volume and the remaining three drives as one big ZFS volume, probably raidz.

thanks
sean

Torrey McMahon wrote:

Given the amount of I/O, wouldn't it make sense to get more drives involved, or something that has cache on the front end, or both? If you're really pushing the amount of I/O you're alluding to - hard to tell without all the details - then you're probably going to hit a limitation on the drive IOPS. (Even with the cache on.)

Karen Chau wrote:

Our application Canary has approx 750 clients uploading to the server every 10 mins; that's approx 108,000 gzip tarballs per day writing to the /upload directory. The parser untars the tarball, which consists of 8 ASCII files, into the /archives directory. /app is our application and tools (apache, tomcat, etc.) directory. We also have batch jobs that run throughout the day; I would say we read 2 to 3 times more than we write.
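The suggested layout (one system drive plus three 68 GB drives in a single-parity raidz) gives usable capacity of roughly (n-1) x drive size, since about one drive's worth of space goes to parity. A quick back-of-the-envelope check; this is illustrative only, as real-world metadata and allocation overhead shave a bit more off:

```python
def raidz_usable_gb(drives: int, size_gb: float) -> float:
    """Approximate usable capacity of a single-parity raidz vdev:
    one drive's worth of space is consumed by parity."""
    if drives < 2:
        raise ValueError("raidz needs at least 2 drives")
    return (drives - 1) * size_gb

# Three 68 GB drives -> roughly 136 GB usable, 68 GB of parity.
print(raidz_usable_gb(3, 68))  # 136
```

So the three-drive raidz trades one drive of capacity for the ability to survive a single-disk failure, versus ~204 GB with no redundancy at all.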
[zfs-discuss] half duplex read/write operations to disk sometimes?
1t0d0
  4.0 727.0   168.0 1071.5 21.0 16.0 28.7 21.9 100 100 c1t0d0
297.0  79.0 18518.2  247.5 19.9 16.0 52.9 42.5 100 100 c1t0d0
 25.0 613.0  1431.0 2088.0 22.4 16.0 35.0 25.1 100 100 c1t0d0
114.0 428.0  6756.1 1703.9 21.2 16.0 39.0 29.5 100 100 c1t0d0
207.9 217.9 12167.2  614.7 20.7 16.0 48.6 37.5 100 100 c1t0d0
363.2  11.0 21245.1   92.5  8.4 16.0 22.6 42.7  93 100 c1t0d0
351.0   7.0 21443.2   98.0  3.3 14.6  9.1 40.7  54 100 c1t0d0
332.7   5.0 20660.7   77.9  3.4 14.2 10.0 42.2  45 100 c1t0d0
354.3   3.0 22784.8   58.6 10.7 15.5 29.9 43.5  85 100 c1t0d0
349.0   4.0 21999.3   58.5 13.2 16.0 37.3 45.3  98 100 c1t0d0
353.0   3.0 22510.4   58.5  8.3 15.5 23.2 43.5  82 100 c1t0d0
344.9   0.0 21540.7    0.0  7.1 15.0 20.6 43.4  63 100 c1t0d0
386.0   0.0 22447.0    0.0  9.4 15.0 24.3 38.9  73 100 c1t0d0
373.1   0.0 20763.3    0.0 14.4 15.7 38.5 42.1  89 100 c1t0d0
364.9   0.0 23145.3    0.0  5.0 14.7 13.8 40.3  54 100 c1t0d0
363.8   0.0 22783.5    0.0  6.0 14.7 16.5 40.4  60 100 c1t0d0
357.3   0.0 22591.9    0.0 10.6 15.7 29.8 43.9  87 100 c1t0d0
369.6   0.0 22441.0    0.0 13.1 15.9 35.5 43.0  94 100 c1t0d0
344.4   1.0 22314.4    8.0  3.3 14.2  9.6 41.1  47 100 c1t0d0
344.0   0.0 22015.9    0.0  6.1 14.4 17.7 41.7  60 100 c1t0d0
372.0   0.0 22818.4    0.0 10.8 15.7 28.9 42.2  82 100 c1t0d0
376.0   2.0 21307.7   12.0 17.0 16.0 44.9 42.3 100 100 c1t0d0
372.1   0.0 23749.3    0.0  6.2 15.0 16.7 40.3  68 100 c1t0d0
347.9   0.0 22507.0    0.0  5.2 14.8 14.8 42.7  59 100 c1t0d0
357.0   0.0 22572.0    0.0  9.6 15.6 26.9 43.7  82 100 c1t0d0
365.0   0.0 21989.4    0.0 14.4 15.9 39.4 43.6  93 100 c1t0d0
355.0   0.0 22558.1    0.0  4.9 14.6 13.7 41.2  51 100 c1t0d0

The website for the canary is located at http://canary.sfbay

thanks
sean
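The "half duplex" pattern in the trace above is easier to see if each sample is labeled by which direction dominates: the drive alternates between write-heavy bursts and long read-only stretches rather than mixing the two. A small, hypothetical helper for eyeballing rows like these (it assumes whitespace-separated iostat -xn columns, r/s first and w/s second):

```python
def classify(line: str) -> str:
    """Label one iostat -xn data row as read-heavy, write-heavy, or mixed,
    based on its r/s and w/s columns."""
    fields = line.split()
    r_per_s, w_per_s = float(fields[0]), float(fields[1])
    if r_per_s > 10 * w_per_s:
        return "read-heavy"
    if w_per_s > 10 * r_per_s:
        return "write-heavy"
    return "mixed"

sample = [
    "4.0 727.0 168.0 1071.5 21.0 16.0 28.7 21.9 100 100 c1t0d0",
    "363.2 11.0 21245.1 92.5 8.4 16.0 22.6 42.7 93 100 c1t0d0",
    "207.9 217.9 12167.2 614.7 20.7 16.0 48.6 37.5 100 100 c1t0d0",
]
print([classify(row) for row in sample])
# ['write-heavy', 'read-heavy', 'mixed']
```

Run over the whole trace, this shows only a handful of genuinely mixed samples, which is what prompted the half-duplex question.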
Re: [zfs-discuss] long time to schedule commands
I made sure PATH is clean; I also fully qualified the paths. Time varies from 0.5 seconds to 15 seconds. If I just do a "timex pwd", it always seems to be fast. We are using csh.

itsm-mpk-2% env
HOME=/app/canary
PATH=/usr/bin:/usr/local/bin:/usr/sbin
LOGNAME=canary
HZ=100
TERM=xterm
TZ=US/Pacific
SHELL=/bin/csh
MAIL=/var/mail/canary
DISPLAY=sr1-ubrm-20:55.0
PWD=/app/canary/data/incoming
USER=canary
JAVA_HOME=/usr/jdk/instances/jdk1.5.0
TOOLS=/app/tools
LD_LIBRARY_PATH=
VIM=/app/tools/vim70/share/vim/vim70
CVSROOT=:pserver:[EMAIL PROTECTED]:/export/cvs/cvsroot
EDITOR=vi
ENV=/app/canary/.kshrc
LD_LIBRARY_PATH==
PATH=/usr/bin:/usr/local/bin:/usr/ccs/bin:/usr/sbin:/usr/openwin/bin=

itsm-mpk-2% /bin/timex /bin/truss -fdD -o truss.out pwd
/upload/canary/incoming

real 13.57
user 0.01
sys 0.03

itsm-mpk-2% tail truss.out
26078: 0.0541 0.0001 close(3) = 0
26078: 0.0553 0.0012 munmap(0xFF3A, 8192) = 0
26078: 0.0556 0.0003 mmap(0x0001, 24576, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFF3A
26078: 0.0560 0.0004 getcontext(0xFFBFF7F8)
26078: 0.0562 0.0002 getrlimit(RLIMIT_STACK, 0xFFBFF7D8) = 0
26078: 0.0563 0.0001 getpid() = 26078 [25982]
26078: 0.0565 0.0002 setustack(0xFF3A2088)
26078: 0.0568 0.0003 getcwd("/upload/canary/incoming", 1025) = 0
26078: 0.0573 0.0005 write(1, " / u p l o a d / c a n a".., 24) = 24
26078: 0.0576 0.0003 _exit(0)

Michael Schuster - Sun Microsystems wrote:

Sean Meighan wrote:

I am not sure if this is a ZFS, Niagara, or something-else issue. Does someone know why commands have the latency shown below?
*1) do an ls of a directory. 6.9 seconds total, truss only shows .07 seconds.*
[...]

This may be an issue with your $PATH. Do you see the same behaviour if you use absolute paths for the commands?
HTH
Michael
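The truss output above shows the process itself finishing in ~0.06 s while timex reports 13.57 s, so the time is lost before or around the exec, not inside the command. One way to separate shell/$PATH lookup from the command's own runtime is to resolve the binary first and invoke it by absolute path while measuring wall time. A minimal, hypothetical cross-check (a tiny stand-in for timex; the command chosen is illustrative):

```python
import shutil
import subprocess
import time

def timed_run(cmd):
    """Run a command, returning (elapsed wall seconds, stdout) -- a tiny timex."""
    start = time.monotonic()
    result = subprocess.run(cmd, capture_output=True, text=True)
    return time.monotonic() - start, result.stdout

# Resolve the binary the way the shell would, then invoke it by absolute
# path, so the measurement excludes any $PATH search cost.
sh = shutil.which("sh")
elapsed, out = timed_run([sh, "-c", "pwd"])
print(f"{elapsed:.3f}s -> {out.strip()}")
```

If the absolute-path invocation is consistently fast while the bare command name is slow, the $PATH theory holds; if both are slow, the delay is in process startup or the filesystem.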
[zfs-discuss] two simple questions
1) We installed ZFS onto our Solaris 10 T2000 3 months ago. I have been told our ZFS code is downrev. What is the recommended way to upgrade ZFS on a production system (we want minimum downtime)? Can it safely be done without affecting our 3.5 million files?

2) We did not turn on compression, as most of our 3+ million files are already gzipped. What is the performance penalty of having compression on (both read and write numbers)? Is there any advantage to compressing already-gzipped files? Should compression be the default when installing ZFS? Nearly all our files are ASCII.

Here is some info on our machine:

itsm-mpk-2% showrev
Hostname: itsm-mpk-2
Hostid: 83d8d784
Release: 5.10
Kernel architecture: sun4v
Application architecture: sparc
Hardware provider: Sun_Microsystems
Domain:
Kernel version: SunOS 5.10 Generic_118833-08

T2000 32x1000 MHz, 16 GB RAM.

# zpool status
  pool: canary
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        canary        ONLINE       0     0     0
          c1t0d0s3    ONLINE       0     0     0

errors: No known data errors

# zpool iostat 1
            capacity     operations    bandwidth
pool      used  avail   read  write   read  write
------  -----  -----  -----  -----  -----  -----
canary  42.0G  12.0G    169    223  8.92M  1.39M
canary  42.0G  12.0G      0    732      0  3.05M
canary  42.0G  12.0G      0    573      0  2.47M
canary  42.0G  12.0G      0    515      0  2.22M
canary  42.0G  12.0G      0    680      0  3.11M
canary  42.0G  12.0G      0    620      0  2.80M
canary  42.0G  12.0G      0    687      0  2.85M
canary  42.0G  12.0G      0    568      0  2.40M
canary  42.0G  12.0G      0    688      0  2.91M
canary  42.0G  12.0G      0    634      0  2.75M
canary  42.0G  12.0G      0    625      0  2.61M
canary  42.0G  12.0G      0    700      0  2.96M
canary  42.0G  12.0G      0    733      0  3.19M
canary  42.0G  12.0G      0    639      0  2.76M
canary  42.0G  12.0G      1    573   127K  2.89M
canary  42.0G  12.0G      0    652      0  2.48M
canary  42.0G  12.0G      0    713  63.4K  3.55M
canary  42.0G  12.0G    117    355  7.83M   782K
canary  42.0G  12.0G     43    616  2.97M  1.11M
canary  42.0G  12.0G    128    424  8.60M  1.57M
canary  42.0G  12.0G    288    151  18.9M   795K
canary  42.0G  12.0G    364      0  23.9M      0
canary  42.0G  12.0G    387      0  25.6M      0

thanks
sean
Re: [zfs-discuss] 15 minute fdsync problem and ZFS: Solved
The vi we were doing was on a 2-line file. If you just vi a new file, add one line, and exit, it would take 15 minutes in fdsync. On the recommendation of a workaround, we set zfs:zil_disable=1; after the reboot the fdsync is now 0.1 seconds. Now I have no idea if it was this setting or the fact that we went through a reboot. Whatever the root cause, we are now back to a well-behaved file system.

thanks
sean

Roch wrote:

15 minutes to do a fdsync is way outside the slowdown usually seen. The footprint for 6413510 is that when a huge amount of data is being written non-synchronously and a fsync comes in for the same filesystem, then all the non-synchronous data is also forced out synchronously. So is there a lot of data being written during the vi?

vi will write the whole file (in 4K chunks) and fsync it (based on a single experiment). So for a large-file vi, on quit, we have lots of data to sync in and of itself. But because of 6413510 we potentially have to sync lots of other data written by other applications.

Now take a Niagara with lots of available CPUs and lots of free memory (32GB maybe?) running some 'tar x' in parallel. A huge chunk of the 32GB can end up as dirty. I say too much, because of lack of throttling:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6429205
6429205 each zpool needs to monitor its throughput and throttle heavy writers

Then vi :q; fsyncs; and all of the pending data must sync. So we have extra data to sync because of:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6413510
6413510 zfs: writing to ZFS filesystem slows down fsync() on other files in the same FS

Furthermore, we can be slowed by this:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6440499
6440499 zil should avoid txg_wait_synced() and use dmu_sync() to issue parallel IOs...

Note: 6440499 is now fixed in the gate.

And finally, all this data goes to a single disk. Worse, a slice of a disk. Since it's just a slice, ZFS can't enable the write cache. Then if there is no tag queue (is there?) we will handle everything one I/O at a time. If it's a SATA drive we have other issues... I think we've hit it all here.

So can this lead to a 15 min fsync? I can't swear; actually I won't be convinced myself before I convince you, but we do have things to chew on already.

Do I recall that this is about a 1 GB file in vi? Quitting (:wq) out of a 1 GB vi session on a 50 MB/sec disk will take 20 sec when everything hums and there is no other traffic involved. With no write cache / no tag queue, maybe 10X more.

-r
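The write-then-fsync pattern Roch describes for vi is easy to reproduce in miniature. The sketch below writes a file in 4K chunks and fsyncs once at the end; it is a hypothetical stand-in for vi's save-on-quit path on a POSIX system, and on a filesystem exhibiting 6413510 that single fsync is the call that could drag everyone else's dirty data out with it:

```python
import os
import tempfile

def write_and_fsync(path: str, data: bytes, chunk: int = 4096) -> None:
    """Write data in 4K chunks, then fsync -- roughly vi's save-on-quit pattern."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for off in range(0, len(data), chunk):
            os.write(fd, data[off:off + chunk])
        os.fsync(fd)  # the call that blocked for 15 minutes in the report
    finally:
        os.close(fd)

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "two-line-file")
    write_and_fsync(target, b"line one\nline two\n")
    print(os.path.getsize(target))  # 18
```

Disabling the ZIL makes that fsync return without waiting for the data to be on stable storage, which explains the 0.1 s result, but it also sacrifices the synchronous-write guarantee, so it is a diagnostic workaround rather than a fix.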