Thanks Marion!
 This is what I was thinking too. Here is prstat -Z from my global zone:
 ZONEID    NPROC  SWAP   RSS MEMORY      TIME  CPU ZONE
      7      855 8697M 6877M    28%  16:03:15 2.6% cloudserver
      8      292 2978M 1766M   7.2%   0:20:09 0.4% www.sonicle.com
      0       92 1129M  477M   1.9% 110:18:26 0.1% global
      1       23   41M   31M   0.1%  11:55:15 0.0% asterisk
      5       26  524M  460M   1.9%  11:23:06 0.0% pkgserver
      3      434 2212M 1231M   5.0%  39:04:23 0.0% encoserver
      2      138 1402M  774M   3.2%   4:04:16 0.0% demo.sonicle.com
 cloudserver and www.sonicle.com are the two zones in question.
 I'm not sure what the SWAP column is telling me: is cloudserver really using
all that swap?
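
 As a rough cross-check of these figures, here is a minimal Python sketch that
totals the SWAP and RSS columns across the zones. The rows are copied from the
prstat -Z output in this message; the helper names are only illustrative, and
it assumes every size carries the M suffix as shown. It does not say what
prstat counts under SWAP, only what the columns add up to.

```python
# Rows copied from the prstat -Z output in this message: (zone, SWAP, RSS).
ROWS = [
    ("cloudserver", "8697M", "6877M"),
    ("www.sonicle.com", "2978M", "1766M"),
    ("global", "1129M", "477M"),
    ("asterisk", "41M", "31M"),
    ("pkgserver", "524M", "460M"),
    ("encoserver", "2212M", "1231M"),
    ("demo.sonicle.com", "1402M", "774M"),
]


def to_m(size):
    """Strip the M suffix from a size such as '8697M' and return the number."""
    if not size.endswith("M"):
        raise ValueError(f"unexpected unit in {size!r}")
    return int(size[:-1])


def totals(rows):
    """Return (sum of SWAP, sum of RSS) over all zones, in M."""
    swap = sum(to_m(swap_col) for _, swap_col, _ in rows)
    rss = sum(to_m(rss_col) for _, _, rss_col in rows)
    return swap, rss


swap_total, rss_total = totals(ROWS)
print(f"SWAP total: {swap_total}M")
print(f"RSS total:  {rss_total}M")
```

 The RSS total comes to 11616M, roughly 47% of 24GB, which is in line with the
MEMORY percentages adding up to about 47%. The SWAP column totals 16983M.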
 Inside the zones there is no sign of swap usage; swap -l says:
 sonicle@cloudserver:~$ swap -l
 swapfile             dev    swaplo   blocks     free
 /dev/swap             -          8 25149432 24050136
 but maybe that isn't the real picture...
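
 For what it's worth, swap -l reports its blocks and free columns in 512-byte
blocks, so the line above can be converted to more familiar units. A small
Python sketch, with the two numbers copied from the output above and
illustrative names:

```python
BLOCK_SIZE = 512  # swap -l reports sizes in 512-byte blocks

# Values copied from the swap -l output in this message.
TOTAL_BLOCKS = 25149432
FREE_BLOCKS = 24050136


def blocks_to_mib(blocks):
    """Convert a count of 512-byte blocks to MiB."""
    return blocks * BLOCK_SIZE / (1024 * 1024)


used_blocks = TOTAL_BLOCKS - FREE_BLOCKS

print(f"total: {blocks_to_mib(TOTAL_BLOCKS):.0f} MiB")
print(f"free:  {blocks_to_mib(FREE_BLOCKS):.0f} MiB")
print(f"used:  {blocks_to_mib(used_blocks):.0f} MiB")
```

 By this reckoning the device is about 12 GiB with roughly 537 MiB in use, far
less than the 8697M shown in the SWAP column for cloudserver.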
 
----------------------------------------------------------------------------------
 From:  Marion Hakanson
  disc...@lists.illumos.org
 Date: 25 November 2015 20:06:54 CET
 Subject: Re: [discuss] deadlocks
 Hi Gabriele,
 The behavior you describe could be caused by saturation of any of
 the resources on the system, not only CPU/load.  If all of RAM were
 used up, for example, a new SSH session would pause until processes
 were swapped/paged out enough to give room for a new SSH to run
 in memory.  Similar results could happen if network or disk resources
 were saturated.
 Have a look at the USE method:
 http://www.brendangregg.com/usemethod.html
 On illumos/Solaris-based systems, you can use these commands to start with:
 prstat 1  (for CPU)
 vmstat 1  (for memory)
 iostat -xn 1 (for disk)
 dladm show-link -s -i 1  (for network)
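
 Since the slowdown only shows up once or twice a week, it may help to capture
a few samples of each command in one go when it happens. The following is only
a sketch in Python: the function names, the count of 5 samples and the timeout
are assumptions added here, not part of the commands listed above. dladm is
given no count, so it is simply cut off by the timeout.

```python
import shutil
import subprocess


def use_commands(interval=1, count=5):
    """Map each resource to one of the commands listed above, as an argv list."""
    i, c = str(interval), str(count)
    return {
        "cpu": ["prstat", i, c],
        "memory": ["vmstat", i, c],
        "disk": ["iostat", "-xn", i, c],
        # No count is passed to dladm here; it is stopped by the timeout.
        "network": ["dladm", "show-link", "-s", "-i", i],
    }


def collect(commands, timeout=10):
    """Run each command and return {resource: captured output or a short note}."""
    results = {}
    for resource, argv in commands.items():
        if shutil.which(argv[0]) is None:
            results[resource] = f"{argv[0]}: not found"
            continue
        try:
            done = subprocess.run(
                argv, capture_output=True, text=True, timeout=timeout
            )
            results[resource] = done.stdout
        except subprocess.TimeoutExpired as exc:
            # Partial output may arrive as bytes or str depending on the version.
            out = exc.stdout or ""
            if isinstance(out, bytes):
                out = out.decode(errors="replace")
            results[resource] = out
    return results
```

 Calling collect(use_commands()) from the global zone during an episode, and
again when the system is healthy, gives two snapshots to compare resource by
resource.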
 Also check /var/adm/messages for errors being logged at or near the
 time of the issue.
 Regards,
 Marion
 Date: Wed, 25 Nov 2015 19:39:38 +0100
 From: Gabriele Bulfon
 To:
 Subject: [discuss] deadlocks
 Hi,
 I'm looking for help finding the cause of strange slowdowns on a long-running
XStream/illumos server.
 This server runs 5-6 zones on an 8-core Intel machine with 24GB of RAM, with a
separate boot rpool on a SATA mirror and data on a SAS raidz pool.
 Two of these zones run essentially the same software: apache, tomcat, cyrus,
postfix, amavis, postgres.
 Apache front-ends HTTP to tomcat, which runs our collab webapps; these work all
day against postfix SMTP, cyrus IMAP and the postgres DB.
 The 1st zone is our own dev machine, with 4-5 users actually working on the
whole stack.
 The 2nd zone is our customers' machine, with around 1000 users on the whole
stack, split into about 10 cyrus domains with 10 separate instances of both the
webapps and the databases.
 Recently, from time to time (1-2 times a week), everything starts to slow down.
 Stopping tomcat/apache in one zone or the other brings everything back:
sometimes it's ours, sometimes it's the cloud one.
 OK, at first sight one would say: your web app has problems.
 But... then why do I have a hard time connecting via ssh to the zones when
this happens? Login takes minutes, getting from password to shell takes many
more minutes, yet prstat/top don't show any high CPU usage in the global zone
or inside the zones.
 Then I stop one tomcat (sometimes one, sometimes the other), and everything
frees up.
 During these episodes there are around 1000 imap processes on one machine and
around 100 on the other.
 Then the count drops abruptly, obviously because the web app closes its
connections.
 So my question is... how can I dig into this problem?
 I would think that if the problem were in the webapp, inside java/tomcat, I
should not experience problems with ssh.
 Could there be limits on sockets? Any other ideas?
 Thanks....
 Gabriele


