Hi-
Thanks to everyone for the help in trying to figure this out.
Indeed everyone is correct: the problem, nefarious as it is, does not
seem to be in HttpClient. Unfortunately (for me), the garbage collection of
StringBuilders (or StringBuffers; I decided on StringBuilders and Java 1.5)
that are turned into Strings seems to be extremely slow. What I mean is that
many old allocations seem to hang around for quite a while before being
garbage collected, despite the fact that they are no longer used (they are
in fact nulled in my code). I have watched the heap size grow and then
fall off a cliff once the garbage collector finally decides it can clean up
those instances.
One thing that really did help (before this, the nulled instances were not
being collected at all) was removing any stored references to the
crawler threads. I had been keeping a reference to each running thread in a
controller class to compute statistics on how well I was doing (download
speed). When I removed those references, so that each thread was completely
dereferenced on its own, memory grew much more slowly.
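For what it's worth, the pattern that ended up working is to have each crawler thread push its numbers into a shared stats object rather than the controller holding on to the Thread objects themselves. A minimal stdlib sketch (CrawlStats and its fields are made-up names for illustration, not my actual classes):

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stats holder: worker threads update the counters and the
// controller reads them, so no reference to any Thread object is retained
// and finished threads stay collectable.
class CrawlStats {
    private final AtomicLong bytesDownloaded = new AtomicLong();
    private final AtomicLong pagesFetched = new AtomicLong();

    public void recordPage(int bytes) {
        bytesDownloaded.addAndGet(bytes);
        pagesFetched.incrementAndGet();
    }

    public long totalBytes() { return bytesDownloaded.get(); }
    public long totalPages() { return pagesFetched.get(); }

    public static void main(String[] args) throws InterruptedException {
        final CrawlStats stats = new CrawlStats();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() { stats.recordPage(1024); }
            });
            workers[i].start();
        }
        for (int i = 0; i < workers.length; i++) {
            workers[i].join();
        }
        // After join(), this local array holds the only references left;
        // nulling it lets the dead threads be garbage collected.
        workers = null;
        System.out.println(stats.totalPages() + " pages, "
                + stats.totalBytes() + " bytes");
    }
}
```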
If anyone has any suggestions (maybe making the garbage collector more
aggressive?), I would love to hear them. I do want to apologize for bringing
up this problem, as it turned out not to be an HttpClient problem, and to
thank everyone for their help.
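On making the collector more aggressive: on the Sun 1.5 JVM the usual knobs are the heap bounds and the collector choice. Something like the following is where I'd start experimenting (the values and the MyCrawlerMain class name are illustrative, not recommendations):

```shell
# Cap the heap so the collector is forced to run before the process
# balloons, try the concurrent collector to keep pauses short under load,
# and log GC activity so you can see when collections actually happen.
java -Xms64m -Xmx256m \
     -XX:+UseConcMarkSweepGC \
     -verbose:gc \
     MyCrawlerMain
```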
Thanks
James
----- Original Message -----
From: "Steve Terrell" <[EMAIL PROTECTED]>
To: "HttpClient User Discussion" <[email protected]>
Sent: Wednesday, March 15, 2006 7:39 AM
Subject: RE: Memory leak using httpclient
James,
Keep in mind that Java memory profilers tend to report what resource
is not being freed, not what is leaking. Your code is holding a
reference to something that it should not.
I have done some extensive load/performance testing with my
HttpClient based application. After 250 million calls to a Tomcat
servlet, there were no observed memory leaks. That was with HttpClient
3.0rc3, Java 1.5.06.
Our performance testing also showed that performance slowed down when
our application went past 100 threads. This may be due to a limitation
of the Tomcat instance we were calling. But with 300 threads, I wonder
if your application is spending more time context switching between
threads than doing real work. This was on a 3.0 GHz dual-processor
machine running Linux.
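A quick way to sanity-check a thread count against the hardware is to size it from the processor count. I/O-bound crawling tolerates many more threads than CPUs, but 300 on a two-processor box is worth questioning. A rough rule-of-thumb sketch (the multiplier is an assumption to tune, not a measured number):

```java
// Rough pool sizing: scale the thread count from the CPU count rather
// than picking a large fixed number like 300.
class PoolSizing {
    static int suggestedThreads(int ioMultiplier) {
        int cpus = Runtime.getRuntime().availableProcessors();
        return cpus * ioMultiplier;
    }

    public static void main(String[] args) {
        System.out.println("Processors: "
                + Runtime.getRuntime().availableProcessors());
        System.out.println("Suggested crawler threads: "
                + suggestedThreads(25));
    }
}
```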
--Steve
-----Original Message-----
From: James Ostheimer [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 14, 2006 1:53 AM
To: [email protected]
Subject: Memory leak using httpclient
Hi-
I am using HttpClient in a multi-threaded web crawler application. I am
using the MultiThreadedHttpConnectionManager in conjunction with 300
threads that download pages from various sites.
Problem is that I am running out of memory shortly after the process
begins. I used JProfiler to analyze the memory, and it points to:

  76.2% - 233,587 kB - 6,626 alloc.
  org.apache.commons.httpclient.HttpMethod.getResponseBodyAsString

as the culprit (at most there should be a little over 300 allocations, as
there are 300 threads operating at once). Other relevant information: I
am on a Windows XP Pro platform using the Sun JRE that came with
jdk1.5.0_06, and commons-httpclient-3.0.jar.
Here is the code where I initialize the HttpClient:
private HttpClient httpClient;

public CrawlerControllerThread(QueueThread qt, MessageReceiver receiver,
        int maxThreads, String flag, boolean filter, String filterString,
        String dbType) {
    this.qt = qt;
    this.receiver = receiver;
    this.maxThreads = maxThreads;
    this.flag = flag;
    this.filter = filter;
    this.filterString = filterString;
    this.dbType = dbType;
    threads = new ArrayList();
    lastStatus = new HashMap();

    HttpConnectionManagerParams htcmp = new HttpConnectionManagerParams();
    htcmp.setMaxTotalConnections(maxThreads);
    htcmp.setDefaultMaxConnectionsPerHost(10);
    htcmp.setSoTimeout(5000);

    MultiThreadedHttpConnectionManager mtcm =
            new MultiThreadedHttpConnectionManager();
    mtcm.setParams(htcmp);
    httpClient = new HttpClient(mtcm);
}
The httpClient reference is then passed to all the crawling threads,
where it is used as follows:
private String getPageApache(URL pageURL, ArrayList unProcessed) {
    SaveURL saveURL = new SaveURL();
    HttpMethod method = null;
    HttpURLConnection urlConnection = null;
    String rawPage = "";
    try {
        method = new GetMethod(pageURL.toExternalForm());
        method.setFollowRedirects(true);
        method.setRequestHeader("Content-type", "text/html");
        int statusCode = httpClient.executeMethod(method);
        // urlConnection = new HttpURLConnection(method, pageURL);
        logger.debug("Requesting: " + pageURL.toExternalForm());
        rawPage = method.getResponseBodyAsString();
        // rawPage = saveURL.getURL(urlConnection);
        if (rawPage == null) {
            unProcessed.add(pageURL);
        }
        return rawPage;
    } catch (IllegalArgumentException e) {
        // e.printStackTrace();
    } catch (HttpException e) {
        // e.printStackTrace();
    } catch (IOException e) {
        unProcessed.add(pageURL);
        // e.printStackTrace();
    } finally {
        if (method != null) {
            method.releaseConnection();
        }
        try {
            if (urlConnection != null
                    && urlConnection.getInputStream() != null) {
                urlConnection.getInputStream().close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        urlConnection = null;
        method = null;
    }
    return null;
}
As you can see, I release the connection in the finally block, so
that should not be the problem. After getPageApache runs, the returned
page string is processed and then set to null for garbage collection. I
have been playing with this, closing streams, using HttpURLConnection
instead of GetMethod, and I cannot find the answer. Indeed, it seems the
answer does not lie in my code.
I greatly appreciate any help that anyone can give me; I am at the end
of my rope with this one.
James
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------