Finally! The problem is actually a confirmed bug in Java: http://bugs.sun.com/view_bug.do?bug_id=6578538
Through JProfiler I noticed that two SunJCE instances were being created per request, and they were in turn holding on to more memory that was not being released. The above bug mentions identical behavior. It was reported in July 2007, so I'm not sure how long it will take Sun to fix the issue. It appears that it was reported against both Java 1.5 and 1.6. In our environment we used Java 1.6.0_03, and the issue was occurring.

Our workaround will be to configure CAS to use LDAP to talk to Active Directory instead of having CAS go through JAAS and Kerberos.

Thanks to all who helped out in this!

Brad Cupit
Louisiana State University - UIS

From: Brad A Cupit
Sent: Friday, February 01, 2008 1:46 PM
To: 'Yale CAS mailing list'
Subject: RE: trying to track down jaas memory leak

here's an update:

I found out that it is "kill -3" to do a java thread dump, but for some reason on our machine it did not output anything to stdout/stderr or any log files. I haven't been able to reproduce the blocked threads problem, so either I didn't wait long enough when I initially saw the problem (meaning the threads would eventually have become unblocked) or I just didn't hit the server hard enough in my recent load tests.

I have, however, significantly narrowed down the source of the memory leak. I removed Apache (and just let Tomcat listen directly for http tests). This did not affect the leak. I removed our own custom code, which also did not affect the leak. I used SimpleTestUsernamePasswordAuthenticationHandler (rather than JaasAuthenticationHandler) which made memory usage significantly better for a few thousand requests. I then re-enabled the JaasAuthenticationHandler (and removed SimpleTestUsernamePasswordAuthenticationHandler), but instead of using the Krb5LoginModule, I wrote a DoNothingLoginModule (each method is either blank or just returns true), and the memory usage was great.

So, our problems are related to the Kerberos login module.
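For anyone repeating this isolation step, here is a minimal sketch of what such a DoNothingLoginModule could look like. It is an illustration written against the standard javax.security.auth.spi.LoginModule interface, not the class from the original test; the class body and comments are assumptions based only on the description "each method is either blank or just returns true".

```java
import java.util.Map;
import javax.security.auth.Subject;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.spi.LoginModule;

/**
 * Diagnostic stand-in for Krb5LoginModule: performs no authentication at all.
 * Intended for leak isolation on a test server only, since it accepts every
 * login attempt.
 *
 * Declared package-private to keep the sketch compact. A deployed module
 * would normally be a public class in its own source file so that JAAS can
 * instantiate it by name.
 */
class DoNothingLoginModule implements LoginModule {

    // Blank: the subject, callback handler, and jaas.conf options are ignored.
    @Override
    public void initialize(Subject subject, CallbackHandler callbackHandler,
                           Map<String, ?> sharedState, Map<String, ?> options) {
    }

    // Reports success without checking any credentials. The interface permits
    // LoginException, but nothing here can fail, so the throws clause is
    // omitted (an implementing method may declare fewer exceptions).
    @Override
    public boolean login() {
        return true;
    }

    // No principals or credentials are added to the subject.
    @Override
    public boolean commit() {
        return true;
    }

    @Override
    public boolean abort() {
        return true;
    }

    @Override
    public boolean logout() {
        return true;
    }
}
```

To use it for the same comparison, the class name would replace com.sun.security.auth.module.Krb5LoginModule in the CAS entry of jaas.conf. If memory stays flat under load with this module in place, the JAAS plumbing itself is unlikely to be the source of the leak.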
I'm going to review our settings in /etc/krb5.conf, upgrade to jdk 1.6.0_04 and, if those yield no results, I may just fall back to using LDAP to talk to Active Directory.

Brad Cupit
Louisiana State University - UIS
e-mail: [EMAIL PROTECTED]
office: 225.578.4774

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Scott Battaglia
Sent: Thursday, January 31, 2008 10:51 AM
To: Yale CAS mailing list
Subject: Re: trying to track down jaas memory leak

Brad,

Did you do a kill -6 (or whatever it is to do the thread dump) and see where the threads are blocking?

-Scott

On Jan 31, 2008 11:41 AM, Brad A Cupit <[EMAIL PROTECTED]> wrote:

I added udp_preference_limit = 1 in the [libdefaults] section of the /etc/krb5.conf, but it didn't seem to address the issue. We are running on Linux (RHEL) with Java 1.6.0_03.

We have seen an unusually large number of blocked threads after a few hundred requests, and after enough connections Tomcat stops responding. There could be several things wrong with our environment such as a broken connection to Active Directory or a broken connection to Domino (custom code we wrote to generate an LtpaToken for single sign on to Lotus Notes apps).

We have not seen an OutOfMemoryError since changing Xmx from 64m (the default) to 256m, however, the memory is still growing and eventually Tomcat becomes unresponsive, presumably due to the number of blocked threads.

I'll continue to narrow down the areas which could be a problem, and repost to this list as I find more information.

Thanks for the help so far!

Brad Cupit
Louisiana State University - UIS

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of David Spencer
Sent: Thursday, January 31, 2008 3:47 AM
To: Yale CAS mailing list
Subject: Re: trying to track down jaas memory leak

Sorry - it was late at night and I got my TCP and UDP back-to-front.
com.sun.security.auth.module.Krb5LoginModule will ordinarily use UDP sockets and it is these that we were seeing accumulating. A "udp_preference_limit" can be set in the kerberos configuration (krb5.conf) and if the size of the message is greater than this limit TCP is used instead. By setting the udp_preference_limit to 1, we forced all messages to be sent by TCP and our UDP socket leak went away.

Sorry if I've confused anyone!

Dave

--On 30 January 2008 23:06 +0000 David Spencer <[EMAIL PROTECTED]> wrote:

> Brad,
>
> Possibly an unrelated problem and I don't have all the details to hand but
> will look them up tomorrow at work if it seems relevant to you.
>
> We ran into a problem with com.sun.security.auth.module.Krb5LoginModule that
> caused our CAS server to gradually accumulate TCP sockets and eventually fall
> over when it had used up all the socket resources on the box. This was Java 5
> on some flavour of Linux. We hadn't seen the problem running the same code on
> Solaris. I think we would have been running with a larger heap than 256Mb so
> we perhaps hit a socket resource problem before we hit the memory limit you
> are seeing?
>
> A bit of digging showed that it was forgetting to close the TCP socket but it
> also showed that the section that dealt with UDP sockets didn't have the same
> problem. We asked the module to always use UDP sockets and the leak went
> away. CAS service was running uninterrupted throughout 2007.
>
> I'll dig out the details in the morning.
> Dave
>
>
> --On 30 January 2008 16:22 -0600 Brad A Cupit <[EMAIL PROTECTED]> wrote:
>
>>
>> Hello,
>>
>> We have a CAS server using JAAS + Kerberos to authenticate users against
>> Active Directory. We started seeing OutOfMemoryErrors with the default Xmx
>> (of 64m) which we have since bumped up to 256m. We haven't had
>> OutOfMemoryErrors since then, but the memory usage keeps rising.
>>
>> I've hooked up JProfiler to try and see where the memory is going, and
>> noticed that it goes up with each request, and running the garbage collector
>> (via System.gc()) doesn't reclaim many of the objects. I'm sure we just have
>> a configuration error of sorts, but I've spent a few days and can't seem to
>> figure it out.
>>
>> JProfiler tells me that after a few requests (500 or so), we have an enormous
>> number of LinkedHashMap$Entry objects, as well as
>> java.security.Provider$ServiceKey, java.security.Provider$Service, and
>> HashMap$Entry instances.
>>
>> I've also noticed that instances of com.sun.crypto.provider.SunJCE go up by 2
>> per request, and don't get reclaimed with garbage collection.
>>
>> JProfiler's cumulative allocations point to
>> javax.security.auth.login.LoginContext.login() method, but I've checked out
>> the code and stepped through it with a debugger, but can't see anything wrong
>> (no creation of instances that would be uncollectable by the gc).
>>
>> If it helps, here's our jaas.conf file:
>>
>> CAS {
>>
>> com.sun.security.auth.module.Krb5LoginModule required client=TRUE
>> debug=FALSE useTicketCache=FALSE;
>>
>> };
>>
>> I'm going to try to setup CAS to use the LDAP authentication handler to see
>> if the problem is strictly JAAS related.
>>
>> Has anyone seen issues like this before?
>>
>> Thanks in advance!
>>
>> Brad Cupit
>> Louisiana State University - UIS
>> e-mail: [EMAIL PROTECTED]
>> office: 225.578.4774
>>
>
> ----------------------
> David Spencer
> Information Systems and Computing
> University of Bristol
> _______________________________________________
> Yale CAS mailing list
> [email protected]
> http://tp.its.yale.edu/mailman/listinfo/cas

----------------------
David Spencer
Information Systems and Computing
University of Bristol

--
-Scott Battaglia
LinkedIn: http://www.linkedin.com/in/scottbattaglia
