Re: [zfs-discuss] Niagara and ZFS compression?

2006-08-20 Thread Sean Meighan




On our Niagara T2000 (32 x 1000 MHz, 8 GB RAM, 4 x 68 GB disk drives)
we set up three drives as raidz with compression. All of our performance
issues are gone. Remember, we receive ~150 million lines of ASCII and 2
million files per day. We have had zero performance issues since we
1) upgraded to Solaris 10 U3
2) added two more drives and set up a 3-disk raidz

While doing this we also turned on compression for the heck of it. No
performance difference we can measure.
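
For anyone wanting to replicate this, turning compression on is a single
filesystem property; a minimal sketch, assuming the pool name "canary"
used on this box:

# zfs set compression=on canary

Only blocks written after the property is set get compressed; data that
was already on disk is left as it was.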

We are satisfied now with both the Niagara and ZFS.

The web site is faster also:
http://canary.sfbay.

Try looking at a report where we must process 100,000 lines of data:
http://itsm-mpk-2.sfbay/canary/cgi-bin/canary.cgi?group=report&rpt_source=local&local_rpt=process_count&hour=11&r1=3&r2=1

This report shows every process running on every Sun Ray server in the
world (approx 1 million pids). You can scroll down this report and find
out which of the 1500 unique executables (oracle, vi, vim, emacs,
mozilla, firefox, thunderbird, opera, etc.) is causing the most load
across the world.

Basically our performance issues have been solved. Thanks ZFS and
Niagara teams.

sean


Matthew Ahrens wrote:

  On Sun, Aug 20, 2006 at 06:16:23PM -0500, Mike Gerdts wrote:

    On 8/20/06, James Dickens [EMAIL PROTECTED] wrote:

      On 8/20/06, trevor pretty [EMAIL PROTECTED] wrote:

        Team

        During a ZFS presentation I had a question from Vernon which I
        could not answer and did not find with a quick look through the
        archives.

        Q: What's the effect (if any) of only having one Floating Point
        Processor on Niagara when you turn on ZFS compression?

      Not an expert, but most if not all compression is integer based,
      and I don't think floating point is supported inside the kernel
      anyway, so it has to be integer based.

  That's correct, we don't do any floating-point math in ZFS, either in
  compression or checksumming. So Niagara's floating-point performance
  will have no effect on ZFS performance.

    Not too long ago Roch said "compression runs in the context of a
    single thread per pool", which makes me worry much more about the
    speed of a single core doing all of the compression for a pool.

  This was not the design; we're working on fixing this bug so that many
  threads will be used to do the compression.

  --matt


[zfs-discuss] Canary is now running latest code and has a 3 disk raidz ZFS volume

2006-07-30 Thread Sean Meighan




Hi George; life is better for us now.

We upgraded to s10s_u3wos_01 last Friday on itsm-mpk-2.sfbay, the
production Canary server (http://canary.sfbay). What do we look like now?


# zpool upgrade
This system is currently running ZFS version 2.

All pools are formatted using this version.
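
George's note below describes how to move a pool to a newer on-disk
format once the software supports it; a sketch of those commands, using
the pool name "canary" from the status output below (and keeping in mind
that the on-disk upgrade is one-way):

# zpool upgrade            reports the on-disk format versions in use
# zpool upgrade canary     upgrades the named pool to the latest version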

We added two more lower-performance disk drives last Friday. We went
from two mirrored drives to four drives. Now we look like this on our
T2000:
(1) 68 GB drive running unmirrored for the system
(3) 68 GB drives set up as raidz
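
For reference, a pool laid out like this would be created with something
along these lines (a sketch using the device names from the status
output below; the exact command we ran isn't in this mail):

# zpool create canary raidz c1t1d0 c1t2d0 c1t3d0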

# zpool status
  pool: canary
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        canary      ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0

errors: No known data errors

Our 100%-busy disk drive from previous weeks is now three drives. iostat
now shows that no single drive is reaching 100%. Here is an "iostat -xn
1 99":

 extended device statistics
 r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
 4.0 0.0 136.0 0.0 0.0 0.0 0.0 5.3 0 2 c1t0d0
 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t0d0
 0.0 288.9 0.0 939.3 0.0 7.0 0.0 24.1 1 74 c1t1d0
 0.0 300.9 0.0 940.8 0.0 6.2 0.0 20.7 1 72 c1t2d0
 0.0 323.9 0.0 927.8 0.0 5.3 0.0 16.5 1 63 c1t3d0
 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
itsm-mpk-2:vold(pid334)
 extended device statistics
 r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c1t0d0
 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t0d0
 0.0 70.9 0.0 118.8 0.0 0.5 0.0 7.6 0 28 c1t1d0
 0.0 74.9 0.0 124.3 0.0 0.5 0.0 6.1 0 26 c1t2d0
 0.0 75.8 0.0 120.3 0.0 0.5 0.0 7.2 0 27 c1t3d0
 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
itsm-mpk-2:vold(pid


Here is our old box:

# more /etc/release
                       Solaris 10 6/06 s10s_u2wos_06 SPARC
           Copyright 2006 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                             Assembled 30 March 2006

# pkginfo -l SUNWzfsr
   PKGINST:  SUNWzfsr
      NAME:  ZFS (Root)
  CATEGORY:  system
      ARCH:  sparc
   VERSION:  11.10.0,REV=2006.03.22.02.15
   BASEDIR:  /
    VENDOR:  Sun Microsystems, Inc.
      DESC:  ZFS root components
    PSTAMP:  on10-patch20060322021857
  INSTDATE:  Apr 04 2006 13:52
   HOTLINE:  Please contact your local service provider
    STATUS:  completely installed
     FILES:    18 installed pathnames
                5 shared pathnames
                7 directories
                4 executables
             1811 blocks used (approx)

Here is the current version:

# more /etc/release
                       Solaris 10 11/06 s10s_u3wos_01 SPARC
           Copyright 2006 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                             Assembled 27 June 2006

# pkginfo -l SUNWzfsr
   PKGINST:  SUNWzfsr
      NAME:  ZFS (Root)
  CATEGORY:  system
      ARCH:  sparc
   VERSION:  11.10.0,REV=2006.05.18.02.15
   BASEDIR:  /
    VENDOR:  Sun Microsystems, Inc.
      DESC:  ZFS root components
    PSTAMP:  on10-patch20060315140831
  INSTDATE:  Jul 27 2006 12:10
   HOTLINE:  Please contact your local service provider
    STATUS:  completely installed
     FILES:    18 installed pathnames
                5 shared pathnames
                7 directories
                4 executables
             1831 blocks used (approx)


In my opinion the 2.5" disk drives in the Niagara box were not designed
to receive one million files per day. These two extra drives (thanks
Denis!) have given us acceptable performance. I still want a Thumper
*smile*. It is pretty amazing that we have 800 servers, 30,000 users,
and 140 million lines of ASCII per day all fitting in a 2U T2000 box!

thanks

sean


George Wilson wrote:

  Sean,

  Sorry for the delay getting back to you.

  You can do a 'zpool upgrade' to see what version of the on-disk format
  your pool is currently running. The latest version is 3. You can then
  issue a 'zpool upgrade <pool>' to upgrade. Keep in mind that the
  upgrade is a one-way ticket and can't be rolled back.

  ZFS can be upgraded by just applying patches. So if you were running
  Solaris 10 06/06 (a.k.a. U2) you could apply the patches that will
  come out when U3 ships. Then issue the 'zpool upgrade' command to get
  the functionality you need.

  Does this help? Can you send me the output of 'zpool upgrade' on your
  system?

  Thanks,
  George

  Sean Meighan wrote:

    Hi George; we are trying to build our server today. We should have
    the four disk drives mounted by this afternoon.

    Separate question: we were on an old ZFS version; how could we have
    upgraded to a new version? Do we really have to re-install Solaris
    to upgrade ZFS?

    thanks
    sean

    George Wilson wrote:

      Sean,

      The gate for s10u3_03 closed yesterday and I think the DVD image
      will be available early next week. I'll keep you posted. If you
      want to try this out before then, what I can provide you are the
      binaries to run on top of s10u3_02.

      Thanks,
      George

      Sean Meighan wrote:

        George; is there a link to s1

Re: [zfs-discuss] How to best layout our filesystems

2006-07-25 Thread Sean Meighan




Hi Torrey; we are the cobbler's kids. We borrowed this T2000 from
Niagara engineering after we did some performance tests for them. I am
trying to get a Thumper to run this data set. This could take up to 3-4
months. Today we are watching 750 Sun Ray servers and 30,000 employees.
Let's see:
1) Solaris 10
2) ZFS version 6
3) T2000 32x1000 with the poorer-performing drives that come with the
Niagara

We need a short-term solution. Niagara engineering has given us two
more of the internal drives so we can max out the Niagara with 4
internal drives. This is the hardware we need to use this week. When
we get a new box and more drives we will reconfigure.

Our graphs have 5000 data points per month, 140 data points per day. We
can stand to lose data.

My suggestion was one drive as the system volume and the remaining
three drives as one big ZFS volume, probably raidz.

thanks
sean


Torrey McMahon wrote:

  Given the amount of I/O, wouldn't it make sense to get more drives
  involved, or something that has cache on the front end, or both? If
  you're really pushing the amount of I/O you're alluding to - hard to
  tell without all the details - then you're probably going to hit a
  limitation on the drive IOPS. (Even with the cache on.)

  Karen Chau wrote:

    Our application Canary has approx 750 clients uploading to the
    server every 10 mins, that's approx 108,000 gzip tarballs per day
    writing to the /upload directory. The parser untars the tarball,
    which consists of 8 ascii files, into the /archives directory.
    /app is our application and tools (apache, tomcat, etc) directory.
    We also have batch jobs that run throughout the day; I would say we
    read 2 to 3 times more than we write.



[zfs-discuss] half duplex read/write operations to disk sometimes?

2006-07-15 Thread Sean Meighan
1t0d0
 4.0 727.0 168.0 1071.5 21.0 16.0 28.7 21.9 100 100 c1t0d0
 297.0 79.0 18518.2 247.5 19.9 16.0 52.9 42.5 100 100 c1t0d0
 25.0 613.0 1431.0 2088.0 22.4 16.0 35.0 25.1 100 100 c1t0d0
 114.0 428.0 6756.1 1703.9 21.2 16.0 39.0 29.5 100 100 c1t0d0
 207.9 217.9 12167.2 614.7 20.7 16.0 48.6 37.5 100 100 c1t0d0
 363.2 11.0 21245.1 92.5 8.4 16.0 22.6 42.7 93 100 c1t0d0
 351.0 7.0 21443.2 98.0 3.3 14.6 9.1 40.7 54 100 c1t0d0
 332.7 5.0 20660.7 77.9 3.4 14.2 10.0 42.2 45 100 c1t0d0
 354.3 3.0 22784.8 58.6 10.7 15.5 29.9 43.5 85 100 c1t0d0
 349.0 4.0 21999.3 58.5 13.2 16.0 37.3 45.3 98 100 c1t0d0
 353.0 3.0 22510.4 58.5 8.3 15.5 23.2 43.5 82 100 c1t0d0
 344.9 0.0 21540.7 0.0 7.1 15.0 20.6 43.4 63 100 c1t0d0
 386.0 0.0 22447.0 0.0 9.4 15.0 24.3 38.9 73 100 c1t0d0
 373.1 0.0 20763.3 0.0 14.4 15.7 38.5 42.1 89 100 c1t0d0
 364.9 0.0 23145.3 0.0 5.0 14.7 13.8 40.3 54 100 c1t0d0
 363.8 0.0 22783.5 0.0 6.0 14.7 16.5 40.4 60 100 c1t0d0
 357.3 0.0 22591.9 0.0 10.6 15.7 29.8 43.9 87 100 c1t0d0
 369.6 0.0 22441.0 0.0 13.1 15.9 35.5 43.0 94 100 c1t0d0
 344.4 1.0 22314.4 8.0 3.3 14.2 9.6 41.1 47 100 c1t0d0
 344.0 0.0 22015.9 0.0 6.1 14.4 17.7 41.7 60 100 c1t0d0
 372.0 0.0 22818.4 0.0 10.8 15.7 28.9 42.2 82 100 c1t0d0
 376.0 2.0 21307.7 12.0 17.0 16.0 44.9 42.3 100 100 c1t0d0
 372.1 0.0 23749.3 0.0 6.2 15.0 16.7 40.3 68 100 c1t0d0
 347.9 0.0 22507.0 0.0 5.2 14.8 14.8 42.7 59 100 c1t0d0
 357.0 0.0 22572.0 0.0 9.6 15.6 26.9 43.7 82 100 c1t0d0
 365.0 0.0 21989.4 0.0 14.4 15.9 39.4 43.6 93 100 c1t0d0
 355.0 0.0 22558.1 0.0 4.9 14.6 13.7 41.2 51 100 c1t0d0


The website for the Canary is located at http://canary.sfbay

thanks
sean




Re: [zfs-discuss] long time to schedule commands

2006-07-11 Thread Sean Meighan




I made sure the path is clean; I also fully qualified the paths. The
time varies from 0.5 seconds to 15 seconds. If I just do a "timex pwd",
it always seems to be fast. We are using csh.
itsm-mpk-2% env
HOME=/app/canary
PATH=/usr/bin:/usr/local/bin:/usr/sbin
LOGNAME=canary
HZ=100
TERM=xterm
TZ=US/Pacific
SHELL=/bin/csh
MAIL=/var/mail/canary
DISPLAY=sr1-ubrm-20:55.0
PWD=/app/canary/data/incoming
USER=canary
JAVA_HOME=/usr/jdk/instances/jdk1.5.0
TOOLS=/app/tools
LD_LIBRARY_PATH=
VIM=/app/tools/vim70/share/vim/vim70
CVSROOT=:pserver:[EMAIL PROTECTED]:/export/cvs/cvsroot
EDITOR=vi
ENV=/app/canary/.kshrc
LD_LIBRARY_PATH=
PATH=/usr/bin:/usr/local/bin:/usr/ccs/bin:/usr/sbin:/usr/openwin/bin
itsm-mpk-2% /bin/timex /bin/truss -fdD -o truss.out pwd
/upload/canary/incoming

real 13.57
user 0.01
sys 0.03
itsm-mpk-2% tail truss.out
26078:  0.0541  0.0001 close(3)                                = 0
26078:  0.0553  0.0012 munmap(0xFF3A, 8192)                    = 0
26078:  0.0556  0.0003 mmap(0x0001, 24576, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFF3A
26078:  0.0560  0.0004 getcontext(0xFFBFF7F8)
26078:  0.0562  0.0002 getrlimit(RLIMIT_STACK, 0xFFBFF7D8)     = 0
26078:  0.0563  0.0001 getpid()                                = 26078 [25982]
26078:  0.0565  0.0002 setustack(0xFF3A2088)
26078:  0.0568  0.0003 getcwd("/upload/canary/incoming", 1025) = 0
26078:  0.0573  0.0005 write(1, " / u p l o a d / c a n a".., 24) = 24
26078:  0.0576  0.0003 _exit(0)


Michael Schuster - Sun Microsystems wrote:

  Sean Meighan wrote:

    I am not sure if this is a ZFS, Niagara or something else issue?
    Does someone know why commands have the latency shown below?

    *1) do an ls of a directory. 6.9 seconds total, truss only shows
    .07 seconds.*

  [...]

  this may be an issue with your $PATH. Do you see the same behaviour
  if you use absolute paths for the commands?

  HTH
  Michael
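
One quick way to check that is to time the same command with and
without a $PATH lookup (a sketch; the directory is just the $PWD from
the env output above):

itsm-mpk-2% /bin/timex ls /app/canary/data/incoming
itsm-mpk-2% /bin/timex /usr/bin/ls /app/canary/data/incoming

If only the first form is slow, the time is going into the path search
(for example a slow or hung directory in $PATH); if both are slow, the
delay is elsewhere.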
  




[zfs-discuss] two simple questions

2006-06-29 Thread Sean Meighan
1) We installed ZFS onto our Solaris 10 T2000 3 months ago. I have been
told our ZFS code is downrev. What is the recommended way to upgrade ZFS
on a production system (we want minimum downtime)? Can it safely be
done without affecting our 3.5 million files?


2) We did not turn on compression, as most of our 3+ million files are
already gzipped. What is the performance penalty of having compression
on (both read and write numbers)? Is there an advantage to compressing
already-gzipped files? Should compression be the default when installing
ZFS? Nearly all our files are ASCII.
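
On question 2, one way to see whether compression buys anything on a
given dataset (a sketch, assuming compression had been enabled on the
"canary" pool shown below) is the compressratio property:

# zfs get compressratio canary

Data that is already gzipped generally will not compress further, so a
ratio near 1.00x would be expected for it.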


Here is some info on our machine:

itsm-mpk-2% showrev
Hostname: itsm-mpk-2
Hostid: 83d8d784
Release: 5.10
Kernel architecture: sun4v
Application architecture: sparc
Hardware provider: Sun_Microsystems
Domain:
Kernel version: SunOS 5.10 Generic_118833-08

T2000 32x1000 MHz, 16 GB RAM.

# zpool status
 pool: canary
state: ONLINE
scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        canary      ONLINE       0     0     0
          c1t0d0s3  ONLINE       0     0     0

errors: No known data errors
# zpool iostat 1
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
canary      42.0G  12.0G    169    223  8.92M  1.39M
canary      42.0G  12.0G      0    732      0  3.05M
canary      42.0G  12.0G      0    573      0  2.47M
canary      42.0G  12.0G      0    515      0  2.22M
canary      42.0G  12.0G      0    680      0  3.11M
canary      42.0G  12.0G      0    620      0  2.80M
canary      42.0G  12.0G      0    687      0  2.85M
canary      42.0G  12.0G      0    568      0  2.40M
canary      42.0G  12.0G      0    688      0  2.91M
canary      42.0G  12.0G      0    634      0  2.75M
canary      42.0G  12.0G      0    625      0  2.61M
canary      42.0G  12.0G      0    700      0  2.96M
canary      42.0G  12.0G      0    733      0  3.19M
canary      42.0G  12.0G      0    639      0  2.76M
canary      42.0G  12.0G      1    573   127K  2.89M
canary      42.0G  12.0G      0    652      0  2.48M
canary      42.0G  12.0G      0    713  63.4K  3.55M
canary      42.0G  12.0G    117    355  7.83M   782K
canary      42.0G  12.0G     43    616  2.97M  1.11M
canary      42.0G  12.0G    128    424  8.60M  1.57M
canary      42.0G  12.0G    288    151  18.9M   795K
canary      42.0G  12.0G    364      0  23.9M      0
canary      42.0G  12.0G    387      0  25.6M      0


thanks
sean


Re: [zfs-discuss] 15 minute fdsync problem and ZFS: Solved

2006-06-21 Thread Sean Meighan




The vi we were doing was on a two-line file. If you just vi a new file,
add one line and exit, it would take 15 minutes in fdsync. On the
recommendation of a workaround we set

set zfs:zil_disable=1

After the reboot the fdsync is now < 0.1 seconds. I have no idea if it
was this setting or the fact that we went through a reboot. Whatever the
root cause, we are now back to a well-behaved file system.
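
For the record, the way this tunable is applied (a sketch; it assumes
the 'set' line above went into /etc/system, which is what the syntax
suggests, and it only takes effect after a reboot):

# echo 'set zfs:zil_disable=1' >> /etc/system
# init 6      reboot so the setting is picked up

Keep in mind that disabling the ZIL trades away synchronous-write
guarantees: recently fsync'd data can be lost if the box crashes, even
though the pool itself stays consistent.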

thanks
sean



Roch wrote:

  15 minutes to do a fdsync is way outside the slowdown usually seen.
  The footprint for 6413510 is that when a huge amount of data is being
  written non-synchronously and a fsync comes in for the same
  filesystem, then all the non-synchronous data is also forced out
  synchronously. So is there a lot of data being written during the vi?

  vi will write the whole file (in 4K chunks) and fsync it
  (based on a single experiment).

  So for a large-file vi, on quit, we have lots of data to sync in and
  of itself. But because of 6413510 we potentially have to sync lots of
  other data written by other applications.

  Now take a Niagara with lots of available CPUs and lots of free
  memory (32GB maybe?) running some 'tar x' in parallel. A huge chunk
  of the 32GB can end up as dirty.

  I say too much so because of lack of throttling:

      http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6429205
      6429205 each zpool needs to monitor its throughput and throttle
      heavy writers

  Then vi :q; fsyncs; and all of the pending data must sync. So we have
  extra data to sync because of:

      http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6413510
      6413510 zfs: writing to ZFS filesystem slows down fsync() on other
      files in the same FS

  Furthermore, we can be slowed by this:

      http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6440499
      6440499 zil should avoid txg_wait_synced() and use dmu_sync() to
      issue parallel IOs...

  Note: 6440499 is now fixed in the gate.

  And finally all this data goes to a single disk. Worse, a slice of a
  disk. Since it's just a slice, ZFS can't enable the write cache. Then
  if there is no tag queue (is there?) we will handle everything one
  I/O at a time. If it's a SATA drive we have other issues...

  I think we've hit it all here. So can this lead to a 15 min fsync? I
  can't swear; actually I won't be convinced myself before I convince
  you, but we do have things to chew on already.

  Do I recall that this is about a 1 GB file in vi? Quitting (:wq) out
  of a 1 GB vi session on a 50 MB/sec disk will take 20 sec when
  everything hums and there is no other traffic involved. With no write
  cache / no tag queue, maybe 10X more.

  -r

  

