RE: CLOSE_WAIT and what to do about it

2009-04-14 Thread Peter Crowther
 From: André Warnier [mailto:a...@ice-sa.com]
  public void close()
  throws SomeException
  {
  putEndRequest();
  flush();
  socket = null;
  }
 flush() being another function which reads the socket until there's
 nothing left to read, and throws away the result.
 socket is a property of the object created by this class, obtained
 somewhere else from a java.net.Socket object.
 Looking at that code above, it is obvious that socket is open, until
 it is set to null, without previously doing a socket.close().
 I don't know Java enough to know if this alone could cause that socket
 to be lingering until the GC, but I kind of suspect so.

Nice piece of detective work, André!  Yes, that code's broken - the socket is 
no longer referenced but was never closed, so it will stay open until a GC 
tidies it up.

$deity only knows what the original developer was thinking when they wrote that.
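
For illustration, here is a minimal sketch of what a corrected close() could look like. The class name ConnectionWrapper, the bodies of putEndRequest() and flush(), and the use of IOException instead of the original SomeException are all assumptions, since the vendor's source is not available; only the order of the steps and the missing socket.close() come from the snippet quoted above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Hypothetical stand-in for the vendor's wrapper around java.net.Socket. */
class ConnectionWrapper {
    private Socket socket;

    ConnectionWrapper(Socket socket) {
        this.socket = socket;
    }

    /** Placeholder: the real end-of-request message is unknown. */
    private void putEndRequest() throws IOException {
        // would write the protocol's "end" marker to socket.getOutputStream()
    }

    /** Placeholder for the original flush(): discard whatever is readable right now. */
    private void flush() throws IOException {
        InputStream in = socket.getInputStream();
        byte[] buf = new byte[1024];
        while (in.available() > 0 && in.read(buf) != -1) {
            // throw the data away
        }
    }

    /** Same steps as the original, but the socket is closed before the reference is dropped. */
    public void close() throws IOException {
        if (socket == null) {
            return; // already closed, nothing to do
        }
        try {
            putEndRequest();
            flush();
        } finally {
            try {
                socket.close(); // releases the descriptor instead of waiting for a GC
            } finally {
                socket = null;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Socket client = new Socket(loopback, server.getLocalPort());
            new ConnectionWrapper(client).close();
            System.out.println("closed: " + client.isClosed());
        }
    }
}
```

The try/finally is a design choice rather than something the original code implies: it makes sure the socket is closed even if sending the end request or draining the input fails.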

- Peter

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



RE: CLOSE_WAIT and what to do about it

2009-04-12 Thread Caldarale, Charles R
 From: André Warnier [mailto:a...@ice-sa.com]
 Subject: Re: CLOSE_WAIT and what to do about it
 
 If these sockets disappear during a GC, then it must mean that they are
 still being referenced by some abandoned objects sitting on the Heap,
 which have not yet been reclaimed by the GC.
 Which probably means that the objects in question have gone out of
 scope, before the socket they used was properly close()'d.

Your analysis looks reasonable to me.  There are some analysis tools that will 
examine a live heap (or dump thereof) and find the reachable and unreachable 
objects; jhat is a free one that comes with JDK 6:
http://java.sun.com/javase/6/webnotes/trouble/TSG-VM/html/tooldescr.html#gblfj

 - Chuck


THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY 
MATERIAL and is thus for use only by the intended recipient. If you received 
this in error, please contact the sender and delete the e-mail and its 
attachments from all computers.





Re: CLOSE_WAIT and what to do about it

2009-04-12 Thread André Warnier

Caldarale, Charles R wrote:

From: André Warnier [mailto:a...@ice-sa.com]
Subject: Re: CLOSE_WAIT and what to do about it

Relatedly, does there exist any way to force a given 
JVM process to do a full GC interactively, but from a
Linux command-line ?


Found a command line tool that will do what you want:
http://code.google.com/p/jmxsh/

I've used it to trigger a GC in Tomcat via the following steps.

1) Start Tomcat with the following options:
 -Dcom.sun.management.jmxremote.port=port
 -Dcom.sun.management.jmxremote.authenticate=false
 -Dcom.sun.management.jmxremote.ssl=false
   (You can, of course, set the authentication and SSL options as needed.)

2) Start jmxsh from the directory its jar is in with this:
  java -jar jmxsh*.jar

3) Enter the following commands (but not the bracketed bits):
  jmx_connect -h localhost -p port
  [blank line to enter browse mode]
  5  [selects java.lang]
  1  [selects the Memory mbean]
  5  [performs a GC]

The doc for jmxsh indicates the above steps should be scriptable, but I haven't 
tried that.

It is likely that you could use jmx_connect with a different kind of service 
and avoid opening up an RMI port; if I figure that out, I'll let you know.



Hi.
Thanks a million for providing the above info.
That jmxsh program is really useful.
I don't really know what I'm doing here, but I can at least more or less 
figure out what happens.


To recall, my original issue is that I have some Java applications 
(among which a Tomcat webapp and a couple of stand-alone Java 
daemon-like programs) which apparently leave an ever-increasing number 
of sockets lingering in a CLOSE_WAIT state.
And I was wondering if it was possible, as one test, to force the JVM 
running these applications to perform a GC, right now, from the outside.

Well, it is.

Following is a trace of a session with jmxsh, with one of these 
applications.



Initial socket situation :

r...@arthur:/home/star/xml# netstat -pan | grep CLOSE
tcp6   0  0 :::127.0.0.1:48267  :::127.0.0.1:11002  CLOSE_WAIT 7618/java
tcp6  12  0 :::127.0.0.1:36936  :::127.0.0.1:11002  CLOSE_WAIT 7816/java
tcp6  12  0 :::127.0.0.1:50322  :::127.0.0.1:11002  CLOSE_WAIT 7816/java


r...@arthur:/home/star/xml# ps -ef | grep 7618
root  7618 1  1 14:32 pts/3  00:00:15 ./java -server 
-Dcom.sun.management.jmxremote.port=11201 
-Dcom.sun.management.jmxremote.ssl=false 
-Dcom.sun.management.jmxremote.authenticate=false -Xms64M -Xmx64M 
-Dpgm=STARWeb -jar /home//web4/java/xyz.jar -c 
/home/star/web4/config -p 11101


The above is the process which I am going to stress, in the sense of 
communicating with it, which has the result of having it itself open a 
TCP connection with another server listening on port 11002, then closing 
this socket (in principle), and this multiple times.


(As you see, the program was started with the jmxremote options 
allowing later communication with jmxsh.)


Now some interactions with the application pid=7618 ...
Situation later on :

r...@arthur:/home/star/xml# netstat -pan | grep CLOSE
tcp6   0  0 :::127.0.0.1:55798  :::127.0.0.1:11002  CLOSE_WAIT 7618/java
tcp6   0  0 :::127.0.0.1:57029  :::127.0.0.1:11002  CLOSE_WAIT 7618/java
tcp6   0  0 :::127.0.0.1:48267  :::127.0.0.1:11002  CLOSE_WAIT 7618/java
tcp6   0  0 :::127.0.0.1:56781  :::127.0.0.1:11002  CLOSE_WAIT 7618/java
tcp6  12  0 :::127.0.0.1:36936  :::127.0.0.1:11002  CLOSE_WAIT 7816/java
tcp6  12  0 :::127.0.0.1:58341  :::127.0.0.1:11002  CLOSE_WAIT 7816/java
tcp6   0  0 :::127.0.0.1:32972  :::127.0.0.1:11002  CLOSE_WAIT 7618/java
tcp6  12  0 :::127.0.0.1:50322  :::127.0.0.1:11002  CLOSE_WAIT 7816/java


So this application indeed left a number of sockets in the CLOSE_WAIT state.
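
Counting these by eye gets tedious once there are hundreds of them. A small sketch of how the tally per process could be automated is shown here; the class name CloseWaitCounter is made up, and the column layout (proto, recv-q, send-q, local, foreign, state, pid/program) is assumed from the unwrapped netstat -pan lines above.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical helper: tallies CLOSE_WAIT sockets per "pid/program" from netstat -pan output. */
class CloseWaitCounter {

    static Map<String, Integer> countByProcess(List<String> netstatLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : netstatLines) {
            String[] f = line.trim().split("\\s+");
            // Assumed columns: proto recv-q send-q local foreign state pid/program
            if (f.length >= 7 && f[0].startsWith("tcp") && f[5].equals("CLOSE_WAIT")) {
                counts.merge(f[6], 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // Reads netstat output from standard input and prints "count<TAB>pid/program".
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        List<String> lines = in.lines().collect(Collectors.toList());
        countByProcess(lines).forEach((proc, n) -> System.out.println(n + "\t" + proc));
    }
}
```

Once compiled, it could be fed with something like "netstat -pan | java CloseWaitCounter" before and after a forced GC to compare the counts.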

Now triggering a GC with jmxsh :

a...@arthur:~$ java -jar bin/jmxsh-R4.jar
jmxsh v1.0, Tue Jan 22 17:23:12 GMT+01:00 2008

Type 'help' for help.  Give the option '-?' to any command
for usage help.

Starting up in shell mode.
% jmx_connect -h localhost -p 11201
Connected to service:jmx:rmi:///jndi/rmi://localhost:11201/jmxrmi.

%
Entering browse mode.


 Available Domains:

   1. java.util.logging
   2. JMImplementation
   3. java.lang

  SERVER: service:jmx:rmi:///jndi/rmi://localhost:11201/jmxrmi


Select a domain: 3


 Available MBeans:

   1. java.lang:type=Compilation
   2. java.lang:type=MemoryManager,name=CodeCacheManager
   3. java.lang:type=GarbageCollector,name=Copy
   4. java.lang:type=MemoryPool,name=Eden Space
   5. java.lang:type=Runtime
   6. java.lang:type=ClassLoading
   7. java.lang:type=MemoryPool,name=Survivor Space
   8. java.lang:type

Re: CLOSE_WAIT and what to do about it

2009-04-12 Thread André Warnier

Caldarale, Charles R wrote:

From: André Warnier [mailto:a...@ice-sa.com]
Subject: Re: CLOSE_WAIT and what to do about it

If these sockets disappear during a GC, then it must mean that they are
still being referenced by some abandoned objects sitting on the Heap,
which have not yet been reclaimed by the GC.
Which probably means that the objects in question have gone out of
scope, before the socket they used was properly close()'d.


Your analysis looks reasonable to me.  There are some analysis tools that will 
examine a live heap (or dump thereof) and find the reachable and unreachable 
objects; jhat is a free one that comes with JDK 6:
http://java.sun.com/javase/6/webnotes/trouble/TSG-VM/html/tooldescr.html#gblfj


All right, I have done that too.
I generated a Heap dump using
jmap -heap:format=b pid

That gave me file heap.bin of some 4.5 MB.
I then used the jhat program to open it.

jhat launches itself by default as a webserver on port 7000, which you 
can access using a normal browser.


That's where my problem starts however, because being a mere Java 
fiddler I don't really know what I am looking at, or what to look for. 
I did a lot of guesswork anyway, and relying on my knowledge of the 
application more than on the links, I came upon the name of a class that 
looks like it is responsible for opening/closing the sockets that remain 
in CLOSE_WAIT.

I found the following function in the class :
public void close()
throws SomeException
{
putEndRequest();
flush();
socket = null;
}
flush() being another function which reads the socket until there's 
nothing left to read, and throws away the result.
socket is a property of the object created by this class, obtained 
somewhere else from a java.net.Socket object.
Looking at that code above, it is obvious that socket is open, until 
it is set to null, without previously doing a socket.close().
I don't know Java enough to know if this alone could cause that socket 
to be lingering until the GC, but I kind of suspect so.

How does a Java expert look at that ?






RE: CLOSE_WAIT and what to do about it

2009-04-12 Thread Caldarale, Charles R
 From: André Warnier [mailto:a...@ice-sa.com]
 Subject: Re: CLOSE_WAIT and what to do about it
 
 Looking at that code above, it is obvious that socket is open, until
 it is set to null, without previously doing a socket.close().
 I don't know Java enough to know if this alone could cause that socket
 to be lingering until the GC, but I kind of suspect so.

For not being that familiar with Java, you've done an admirable job of tracking 
this down.  What you've found certainly looks like the cause of the problem; 
the class you encountered appears to be a wrapper for a plain java.net.Socket, 
and whoever wrote it simply missed putting in a socket.close() call.  Perhaps 
this was originally developed on an older JVM with more frequent 
non-generational garbage collection, so the problem wasn't noticed then.

 - Chuck





Re: CLOSE_WAIT and what to do about it

2009-04-12 Thread André Warnier

Caldarale, Charles R wrote:

From: André Warnier [mailto:a...@ice-sa.com]
Subject: Re: CLOSE_WAIT and what to do about it

Looking at that code above, it is obvious that socket is open, until
it is set to null, without previously doing a socket.close().
I don't know Java enough to know if this alone could cause that socket
to be lingering until the GC, but I kind of suspect so.


For not being that familiar with Java, you've done an admirable job of tracking 
this down.  What you've found certainly looks like the cause of the problem; 
the class you encountered appears to be a wrapper for a plain java.net.Socket, 
and whoever wrote it simply missed putting in a socket.close() call.  Perhaps 
this was originally developed on an older JVM with more frequent 
non-generational garbage collection, so the problem wasn't noticed then.


I was standing on the shoulders of giants.
Thanks for the help.




RE: CLOSE_WAIT and what to do about it

2009-04-10 Thread Caldarale, Charles R
 From: André Warnier [mailto:a...@ice-sa.com]
 Subject: Re: CLOSE_WAIT and what to do about it
 
 Relatedly, does there exist any way to force a given 
 JVM process to do a full GC interactively, but from a
 Linux command-line ?

Found a command line tool that will do what you want:
http://code.google.com/p/jmxsh/

I've used it to trigger a GC in Tomcat via the following steps.

1) Start Tomcat with the following options:
 -Dcom.sun.management.jmxremote.port=port
 -Dcom.sun.management.jmxremote.authenticate=false
 -Dcom.sun.management.jmxremote.ssl=false
   (You can, of course, set the authentication and SSL options as needed.)

2) Start jmxsh from the directory its jar is in with this:
  java -jar jmxsh*.jar

3) Enter the following commands (but not the bracketed bits):
  jmx_connect -h localhost -p port
  [blank line to enter browse mode]
  5  [selects java.lang]
  1  [selects the Memory mbean]
  5  [performs a GC]

The doc for jmxsh indicates the above steps should be scriptable, but I haven't 
tried that.

It is likely that you could use jmx_connect with a different kind of service 
and avoid opening up an RMI port; if I figure that out, I'll let you know.
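
As a rough sketch of how the same GC request could be scripted without the interactive menus, a few lines of Java using the standard JMX remote API should do. The class name RemoteGc and the default host/port are made up for illustration; the service URL follows the form jmxsh prints when it connects (service:jmx:rmi:///jndi/rmi://host:port/jmxrmi), and the target JVM is assumed to have been started with the jmxremote options from step 1.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

/** Hypothetical command-line helper: asks a JVM to run a GC through its Memory MBean. */
class RemoteGc {

    /** Invokes the "gc" operation of java.lang:type=Memory on the given connection. */
    static void triggerGc(MBeanServerConnection connection) throws Exception {
        ObjectName memory = new ObjectName(ManagementFactory.MEMORY_MXBEAN_NAME);
        connection.invoke(memory, "gc", null, null);
    }

    public static void main(String[] args) throws Exception {
        // Host and port defaults are placeholders; pass the real ones as arguments.
        String host = args.length > 0 ? args[0] : "localhost";
        String port = args.length > 1 ? args[1] : "11201";
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            triggerGc(connector.getMBeanServerConnection());
            System.out.println("Requested GC via " + url);
        } finally {
            connector.close();
        }
    }
}
```

This talks to the same Memory MBean that the browse-mode steps select, so it still needs the RMI port to be open; it only removes the interactive part.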

 - Chuck





Re: CLOSE_WAIT and what to do about it

2009-04-09 Thread Taylan Develioglu
Skimmed quickly through your post there while working, so forgive me if
this is irrelevant.

CLOSE_WAIT is a state where the remote end has closed the connection at the
TCP/IP level, but the local application (in this case Java) has not yet
closed its socket descriptor.

As a coincidence we just fixed this very same issue in our application,
which uses the httpclient library.

There is a known issue with the httpclient library where sockets are not
closed after the connection ends (issue or feature, you be the judge);
we worked around this by explicitly calling a close ourselves.

If httpclient is used that could be the culprit.

See
http://www.nabble.com/tcp-connections-left-with-CLOSE_WAIT-td13757202.html
for a better description

Rgds,

Taylan

André Warnier wrote:


 Hi.
 As a follow-up to another thread originally entitled "apache/tomcat
 communication issues (502 response)", I'd like to pursue the
 CLOSE-WAIT subject.

 Sorry if this post is a bit long, I want to make sure that I do
 provide all the necessary information.

 Like the original poster, I am seeing on my systems a fair number of
 sockets apparently stuck for a long time in the CLOSE_WAIT state.
 (Sometimes several hundreds of them).
 They seem to predominantly concern Tomcat and other java processes,
 but as Alan pointed out previously and I confirm, my perspective is
 slanted, because we use a lot of common java programs and webapps on
 our servers, and the ones mostly affected talk to each other and come
 from the same vendor.
 Unfortunately also, I do not have the sources of these
 programs/webapps available, and will not get them, and I can't do
 without these programs.

 It has been previously established that a socket in a
 long-time-lingering CLOSE-WAIT status, is due to one or the other side
 of a TCP connection not properly closing its side of the connection
 when it is done with it.
 I also surmise (without having a definite proof of this), that this is
 essentially bad, as it ties up some resources that could be
 otherwise freed.
 I have also been told or discovered that, our servers being Linux
 Debian servers, programs such as ps, netstat and lsof can help
 in determining precisely how many such lingering sockets there are,
 and who the culprit processes are (to some extent).

 In our case, we know which are the programs involved, because we know
 which ones open a listening socket and on what fixed port, and we also
 know which are the other processes talking to them.
 But, as mentioned previously, we do not have the source of these
 programs and will not get them, but cannot practically do without them
 for now. But we do have full root control of the Linux servers where
 these programs are running.

 So my question is : considering the situation above, is there
 something I can do locally to free these lingering CLOSE_WAIT sockets,
 and under which conditions ?
 (I must admit that I am a bit lost among the myriad options of lsof)

 For example, suppose I start with a netstat -pan command and I see
 the display below (sorry for the line-wrapping).
 I see a number of sockets in the CLOSE_WAIT state, and for those I
 have a process-id, which I can associate to a particular process.
 For example, I see this line :
 tcp6  12  0 :::127.0.0.1:41764  :::127.0.0.1:11002  CLOSE_WAIT 29649/java
 which tells me that there is a local process 29649/java, with a
 local socket port 41764 in the CLOSE_WAIT state, related to another
 socket #11002 on the same host.
 On the other hand, I see this line :
 tcp   0  0 127.0.0.1:11002 127.0.0.1:41764 FIN_WAIT2  -
 which shows a local socket on port 11002, related to this other
 local socket port #41764, with no process-id/program displayed.
 What does that tell me ?

 I also know that the process-id 29649 corresponds to a local java
 process, of the daemon variety, multi-threaded.  That program talks
 to another known server program, written in C, of which instances are
 started on an ad-hoc basis by inetd, and which listens on port 11002
 (in fact it is inetd who does, and it passes this socket on to the
 process it forks, I understand that).

 (The link with Tomcat is that I also see frequently the same
 situation, where the process owning the CLOSE_WAIT socket is Tomcat,
 more specifically one webapp running inside it.  It's just that in
 this particular snapshot it isn't.)

 What it looks like to me in this case, is that at some point one of
 the threads of process #29649 opened a client socket (port 41764) to
 the local inetd port #11002; that inetd then started the underlying
 server process (the C program); that the underlying C program then at
 some point exited; but that process #29649 never closed its side of
 the connection with port #11002.
 Can I somehow detect this condition, and force the offending thread
 of process #29649 to close that socket (or just force this thread to
 exit) ?

 I realise this may be a complex question, and that the answers may be
 

RE: CLOSE_WAIT and what to do about it

2009-04-08 Thread Peter Crowther
 From: André Warnier [mailto:a...@ice-sa.com]
 It has been previously established that a socket in a
 long-time-lingering CLOSE-WAIT status, is due to one or the other side
 of a TCP connection not properly closing its side of the
 connection when
 it is done with it.
 I also surmise (without having a definite proof of this), that this is
 essentially bad, as it ties up some resources that could be
 otherwise freed.

At the very least it'll tie up a kernel data structure for the socket itself.  
I don't know modern Linux kernels well enough to know how buffers are 
allocated, but I suspect you won't be wasting much memory on buffers as they'll 
be allocated on-demand.  You're probably talking tens to low hundreds of bytes 
for each one of these.

You will also be consuming resources in whichever program is not closing the 
sockets correctly.

 So my question is : considering the situation above, is there
 something
 I can do locally to free these lingering CLOSE_WAIT sockets, and under
 which conditions ?

 For example, I see this line :
 tcp6  12  0 :::127.0.0.1:41764  :::127.0.0.1:11002  CLOSE_WAIT 29649/java
 which tells me that there is a local process 29649/java, with a local
 socket port 41764 in the CLOSE_WAIT state, related to another socket
 #11002 on the same host.
 On the other hand, I see this line :
 tcp   0  0 127.0.0.1:11002 127.0.0.1:41764 FIN_WAIT2  -
 which shows a local socket on port 11002, related to this
 other local
 socket port #41764, with no process-id/program displayed.
 What does that tell me ?

The process that was on port 11002 closed its end of the socket and sent a FIN. 
 Process 29649 hasn't closed its end of the socket yet.

 I also know that the process-id 29649 corresponds to a local java
 process, of the daemon variety, multi-threaded.  That program
 talks to
 another known server program, written in C, of which instances are
 started on an ad-hoc basis by inetd, and which listens on port 11002
 (in fact it is inetd who does, and it passes this socket on to the
 process it forks, I understand that).

The local Java process may have a resource leak.  It appears not to have closed 
the socket it was using to communicate with the server.

A possible reason for the lack of a PID on port 11002 is that the socket was 
handed across from inetd to the C daemon - not sure about this.

 What it looks like to me in this case, is that at some point one of
 the threads of process #29649 opened a client socket (port 41764) to
 the local inetd port #11002; that inetd then started the underlying
 server process (the C program); that the underlying C program then at
 some point exited; but that process #29649 never closed its side of
 the connection with port #11002.

Agree.

 Can I somehow detect this condition, and force the
 offending thread of
 process #29649 to close that socket (or just force this
 thread to exit) ?

Threads are flows of control.  Threads do not reference objects other than from 
their stack and any thread-local storage - and there are plenty of other places 
that can hold onto objects!  The socket may well be referenced from an object 
on the heap (not the stack) that's ultimately referenced by a static variable 
in a class, for example, in which case zapping a thread may well do nothing.

You need to find out what, if anything, is holding onto the socket.

If you have some way of forcing that Java process to collect garbage, you 
should do so.  It's possible for sockets that haven't been close()d to hang 
around, unreferenced but not yet garbage collected.  A full GC would collect 
any of these, finalizing them as it does and hence closing the socket.  If a 
full GC doesn't close the socket, some other object is still referencing it.

If a full GC doesn't clear the problem, you may need to go in with some 
memory-tracing tool and find out what's holding onto the socket.  It's a long, 
long time since I had to do this in Java, so I have no idea of the appropriate 
tools - my brain's telling me Son of Strike, which is for the .Net CLR and 
*definitely* wrong!

Does that help?  Or is it clear as mud?

- Peter




Re: CLOSE_WAIT and what to do about it

2009-04-08 Thread André Warnier

Peter Crowther wrote:
[...]


Does that help?  Or is it clear as mud?


For no-java-expert-me, it is indeed of the hazy category.
But it helps a lot, in the sense of adding a +3 in the column "get 
back to the vendor and ask them to fix their code".

;-)
Thanks.





Re: CLOSE_WAIT and what to do about it

2009-04-08 Thread André Warnier

Peter Crowther wrote:
[...]


If you have some way of forcing that Java process to collect garbage, you 
should do so.  It's possible for sockets that haven't been close()d to hang 
around, unreferenced but not yet garbage collected.  A full GC would collect 
any of these, finalizing them as it does and hence closing the socket.  If a 
full GC doesn't close the socket, some other object is still referencing it.

Hopping on that idea, and still considering the "try something from the 
outside, without modifying the code" kind of view :


This process is started as a daemon, with a java command-line.
Is it possible to add some arguments to that command-line to induce 
the JVM to do a GC more often ?
(I don't think that in this case it would have a very negative impact on 
performance.)
It currently starts without any -D switches at all to the command-line, 
basically :

path/to/java/java -jar theapp.jar

The same question for the related Tomcat webapp (which I suspect of 
having the same issue).  But in that case I do have to be a bit more 
careful regarding the performance impact, although this webapp is pretty 
much all that is running in this Tomcat.
And that Tomcat (on some of our systems) starts under jsvc, and I don't 
really know where to set the parameters for that one under Linux.



Relatedly, does there exist any way to force a given JVM process to do a 
full GC interactively, but from a Linux command-line ?
I have full access to these systems, but usually only in SSH console 
mode, and I don't know if there is any kind of graphical GUI installed 
or accessible on them.
Basically, I'd like to see if triggering a GC reduces this number of 
lingering sockets.







RE: CLOSE_WAIT and what to do about it

2009-04-08 Thread Peter Crowther
 From: André Warnier [mailto:a...@ice-sa.com]
 This process is started as a daemon, with a java command-line.
 Is it possible to add some arguments to that command-line to induce
 the JVM to do a GC more often ?

http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html - I don't 
think so, although the RMI option under "Explicit Garbage Collection" might 
work.

 The same question for the related Tomcat webapp (which I suspect of
 having the same issue).  But in that case I do have to be a bit more
 careful regarding the performance impact, although this
 webapp is pretty much all that is running in this Tomcat.

That one's easy.  Add another webapp with one page.  When the page is 
requested, call System.gc().  Job done!

 Relatedly, does there exist any way to force a given JVM
 process to do a
 full GC interactively, but from a Linux command-line ?

I'm not aware of one, but I'm not an expert.  I await the experts' comments 
with interest!

- Peter




RE: CLOSE_WAIT and what to do about it

2009-04-08 Thread Caldarale, Charles R
 From: André Warnier [mailto:a...@ice-sa.com]
 Subject: Re: CLOSE_WAIT and what to do about it
 
 Relatedly, does there exist any way to force a given JVM process to do
 a full GC interactively, but from a Linux command-line ?

I haven't found one yet, but there are numerous command-line monitoring 
utilities included with the JDK that display all sorts of GC information, using 
the same connection mechanism as JConsole.  Since JConsole can force a GC in a 
JVM it's monitoring, doing it from the command line is feasible.  Might have to 
do a little coding...

 - Chuck





Re: CLOSE_WAIT and what to do about it

2009-04-08 Thread Rainer Jung
Hi André,

I didn't fully read all responses, so I hope I don't repeat too much (or
worse, contradict statements contained in other replies).

On 08.04.2009 12:32, André Warnier wrote:
 Like the original poster, I am seeing on my systems a fair number of
 sockets apparently stuck for a long time in the CLOSE_WAIT state.
 (Sometimes several hundreds of them).
 They seem to predominantly concern Tomcat and other java processes, but
 as Alan pointed out previously and I confirm, my perspective is slanted,
 because we use a lot of common java programs and webapps on our servers,
 and the ones mostly affected talk to each other and come from the same
 vendor.
 Unfortunately also, I do not have the sources of these programs/webapps
 available, and will not get them, and I can't do without these programs.
 
 It has been previously established that a socket in a
 long-time-lingering CLOSE-WAIT status, is due to one or the other side
 of a TCP connection not properly closing its side of the connection when
 it is done with it.

CLOSE_WAIT says the other side shut down the connection. TCP connections
are allowed to stay for an arbitrary time in half-closed state. In
general a TCP connection can be used in a duplex way. But assume one end
has finished communication (sending data). Then it can already close its
side of the connection, while the other end may still send.

The nice TCP state diagram is contained in the fundamental book of
Stevens, and can be seen e.g. at
   http://www.cse.iitb.ac.in/perfnet/cs456/tcp-state-diag.pdf

As you can see, CLOSE_WAIT on one end always implies FIN_WAIT2 on the
other end (except, when between the two ends there's yet another
component, that interferes with the communication like maybe a firewall).

In the special situation where both ends of the communication are on the
same system, one finds each connection twice, one from the point of view
of each side of the connection. It is always important to think about
which end one is looking at, when interpreting the two lines.

 I also surmise (without having a definite proof of this), that this is
 essentially bad, as it ties up some resources that could be otherwise
 freed.
 I have also been told or discovered that, our servers being Linux Debian
 servers, programs such as ps, netstat and lsof can help in
 determining precisely how many such lingering sockets there are, and who
 the culprit processes are (to some extent).

True.

 In our case, we know which are the programs involved, because we know
 which ones open a listening socket and on what fixed port, and we also
 know which are the other processes talking to them.
 But, as mentioned previously, we do not have the source of these
 programs and will not get them, but cannot practically do without them
 for now. But we do have full root control of the Linux servers where
 these programs are running.

The details may depend on the used protocols and sometimes you can get
information about timeouts you can set in the application, like idle
timeouts for persistent connections.

 So my question is : considering the situation above, is there something
 I can do locally to free these lingering CLOSE_WAIT sockets, and under
 which conditions ?
 (I must admit that I am a bit lost among the myriad options of lsof)

I would say no, if you can't change the application and its developer
didn't provide any configuration options. CLOSE_WAIT from the
point of view of TCP is a legitimate state without any built-in timeout.

 For example, suppose I start with a netstat -pan command and I see the
 display below (sorry for the line-wrapping).
 I see a number of sockets in the CLOSE_WAIT state, and for those I have
 a process-id, which I can associate to a particular process.
 For example, I see this line :
 tcp6  12  0 :::127.0.0.1:41764  :::127.0.0.1:11002  CLOSE_WAIT 29649/java
 which tells me that there is a local process 29649/java, with a local
 socket port 41764 in the CLOSE_WAIT state, related to another socket
 #11002 on the same host.
 On the other hand, I see this line :
 tcp   0  0 127.0.0.1:11002 127.0.0.1:41764 FIN_WAIT2  -
 which shows a local socket on port 11002, related to this other local
 socket port #41764, with no process-id/program displayed.
 What does that tell me ?

My interpretation (not 100% sure): I am not sure what your OS shows in
netstat after closing the local side of a connection, more precisely
whether the pid is still shown or is removed. Depending on the answer,
we have either a simple one-sided shutdown or even a process exit. In
both cases the process owning port 41764 (pid 29649) didn't have any
reason to use the established connection in the meantime, so it didn't
realise that the connection is only half-open. As soon as it tries to
use it, it should detect that and most likely (if programmed correctly)
close it.
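
To make that concrete, here is a small self-contained sketch of what "detect and close" can look like in Java: once the peer has sent its FIN, read() on the socket's input stream returns -1 after any remaining data, and that is the cue to call close(). The class and method names are invented for the example, and both ends run in one process over the loopback interface only to keep it runnable.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Hypothetical demo: the client notices the peer's shutdown and closes its own side. */
class HalfCloseDemo {

    /** Reads until end-of-stream (the peer's FIN), then closes; returns the bytes discarded. */
    static int drainAndClose(Socket socket) throws IOException {
        int total = 0;
        try {
            InputStream in = socket.getInputStream();
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n; // leftover data is read and thrown away
            }
        } finally {
            socket.close(); // without this the socket would sit in CLOSE_WAIT
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Socket client = new Socket(loopback, server.getLocalPort());
            Socket accepted = server.accept();
            accepted.getOutputStream().write("bye".getBytes("US-ASCII"));
            accepted.close(); // server side sends its FIN; the client is now in CLOSE_WAIT
            int discarded = drainAndClose(client);
            System.out.println("discarded " + discarded + " bytes, closed=" + client.isClosed());
        }
    }
}
```

A program that never reads from or writes to the connection again has no such trigger, which matches the situation described above.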

 I also know that the process-id 29649 corresponds to a local java
 process, of the daemon variety, multi-threaded.  That program talks to
 another known server