Tony,

While you were away we fixed a rather nasty bug, which may also have been
the cause of the problems you were having.

http://issues.apache.org/bugzilla/show_bug.cgi?id=35944

Could you please get the latest SVN snapshot and test your application
against it? I'll look at the logs you have posted if you confirm that the
problem still persists. It is a massive amount of data to go through, so I
would really appreciate it if I did not have to look at it unnecessarily.

Have you seen anything of this sort in the logs or in the standard
out/standard error?

java.lang.IllegalStateException: Connection is not open
    at org.apache.commons.httpclient.HttpConnection.assertOpen(HttpConnection.java:1269)
    at org.apache.commons.httpclient.HttpConnection.isResponseAvailable(HttpConnection.java:872)
    at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.isResponseAvailable(MultiThreadedHttpConnectionManager.java:1307)
    at org.apache.commons.httpclient.HttpMethodBase.responseBodyConsumed(HttpMethodBase.java:2272)
    at org.apache.commons.httpclient.HttpMethodBase$1.responseConsumed(HttpMethodBase.java:1755)
    at org.apache.commons.httpclient.AutoCloseInputStream.notifyWatcher(AutoCloseInputStream.java:180)
    at org.apache.commons.httpclient.AutoCloseInputStream.close(AutoCloseInputStream.java:140)
    at org.apache.commons.httpclient.HttpMethodBase.releaseConnection(HttpMethodBase.java:1086)

Cheers,

Oleg

On Mon, 2005-08-08 at 16:09 -0400, Tony Spencer wrote:
> Hi Oleg,
> Sorry for the late reply but I was away on vacation. I finally
> configured my logging and attempted to use the connection manager and
> yes I did see multiple occurrences of exactly what you mentioned:
>
> Unable to get a connection, waiting..., hostConfig=HostConfiguration
>
> I'm sending you the wire and context log privately. Thank you very
> much for taking a look.
>
> Tony
>
>
> On 7/22/05, Oleg Kalnichevski <[EMAIL PROTECTED]> wrote:
> > On Fri, 2005-07-22 at 15:13 -0400, Tony Spencer wrote:
> > > Hi Oleg,
> > > I'm not sure exactly what's going on as I haven't dug through the
> > > source code enough but I do know that when I try using
> > > MultiThreadedHttpConnectionManager and calling releaseConnection in
> > > the finally block as you have done here, my bot threads start hanging
> > > after a few hundred requests. I am only hypothesizing that the
> > > connections are not returning to the pool.
> > >
> >
> > Tony,
> >
> > Do you see something like that in the log?
> >
> > Unable to get a connection, waiting..., hostConfig=...
> >
> > Anyways, if you manage to produce a context/wire log of the session, we
> > may be able to figure out what goes wrong
> >
> > Oleg
> >
> > > On 7/22/05, Oleg Kalnichevski <[EMAIL PROTECTED]> wrote:
> > > > On Fri, Jul 22, 2005 at 01:07:11PM -0400, Tony Spencer wrote:
> > > > > In case anyone else is using HttpClient for a multi-threaded crawler,
> > > > > here is the solution that seems to solve all the problems in this
> > > > > discussion:
> > > > >
> > > > > Don't use the MultiThreadedHttpConnectionManager. You will need to
> > > > > bail if a response body reaches a limit you define (mine is 100k).
> > > > > The only way to break the connection is to call HttpMethod.abort.
> > > > > Unfortunately this doesn't allow the HttpConnection to be safely
> > > > > returned to the connection manager's pool.
> > > >
> > > > Tony,
> > > >
> > > > Why is that? What is it that prevents the connection from being returned
> > > > back to the pool?
> > > > I believe HttpMethod#releaseConnection should have no
> > > > problem handling connections that have been closed by HttpMethod#abort
> > > >
> > > > GetMethod httpget = new GetMethod("/stuff");
> > > > try {
> > > >     httpclient.executeMethod(httpget);
> > > >     // do something with the response
> > > >     // and if you get fed up, just call
> > > >     httpget.abort();
> > > > } finally {
> > > >     httpget.releaseConnection();
> > > > }
> > > >
> > > > Oleg
> > > >
> > > >
> > > > > Instead, I found pretty
> > > > > good performance by creating a new HttpClient (simple constructor:
> > > > > new HttpClient()) for each thread and using it for 1,000 requests, at
> > > > > which time I destroy the current one and create a new one. I'm sure this
> > > > > doesn't perform as well as the multi-threaded manager but it ran all
> > > > > night for me with no exceptions, no memory leaks, and pulled down 2
> > > > > million sites in about 12 hours (running 100 threads). Not bad.
> > > > >
> > > > > On 7/21/05, Tony Spencer <[EMAIL PROTECTED]> wrote:
> > > > > > Ok, I hope you aren't getting sick of this problem. :)
> > > > > >
> > > > > > HttpMethod.abort does solve the problem of sites that send an infinite
> > > > > > response. However, it seems that by calling abort we cannot properly
> > > > > > release the connection. I've tried calling method.releaseConnection
> > > > > > right after abort.
> > > > > >
> > > > > > My usage for HttpClient is a multi-threaded crawler so I've followed
> > > > > > the suggestions on the threading page
> > > > > > http://jakarta.apache.org/commons/httpclient/threading.html (nice
> > > > > > documentation by the way). So I use the
> > > > > > MultiThreadedHttpConnectionManager as suggested and reuse the same
> > > > > > HttpClient over and over as suggested. After a certain number of
> > > > > > calls to HttpMethod.abort my HttpClient goes bad (hangs).
> > > > > >
> > > > > > So it appears that abort is too harsh and doesn't allow clean return
> > > > > > of the client to the pool. Any more suggestions?
> > > > > >
> > > > > > On 7/21/05, Tony Spencer <[EMAIL PROTECTED]> wrote:
> > > > > > > Disregard my last message. Your suggestion did work, Oleg. Originally
> > > > > > > I put the abort statement after attempting to close the input stream.
> > > > > > > Once I moved it in front of the stream close statement it worked fine.
> > > > > > > Thank you very much.
> > > > > > >
> > > > > > > On 7/21/05, Oleg Kalnichevski <[EMAIL PROTECTED]> wrote:
> > > > > > > > Just call HttpMethod#abort to close the underlying connection
> > > > > > > >
> > > > > > > > Oleg
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, 2005-07-21 at 16:34 -0400, Tony Spencer wrote:
> > > > > > > > > Ok, I managed to limit the response to 8k in the following code
> > > > > > > > > but it doesn't help with what I'm really trying to accomplish.
> > > > > > > > > Sometimes there is a site that will spew a never-ending response. This
> > > > > > > > > causes HttpClient to hang indefinitely. My code below does not solve
> > > > > > > > > the problem. Here is an example of a nasty site that never stops
> > > > > > > > > sending a response: http://www.tfc-charts.w2d.com/chart/dw/w
> > > > > > > > > (beware.
> > > > > > > > > it may crash your browser if you browse it)
> > > > > > > > >
> > > > > > > > > InputStream is = method.getResponseBodyAsStream();
> > > > > > > > > BufferedInputStream bis = new BufferedInputStream(is);
> > > > > > > > > byte[] bytes = new byte[ 8192 ];
> > > > > > > > > bis.read(bytes);
> > > > > > > > > bis.close();
> > > > > > > > > is.close();
> > > > > > > > > ret = new String(bytes);
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On 7/21/05, Tony Spencer <[EMAIL PROTECTED]> wrote:
> > > > > > > > > > I'd like to limit the size of the response but don't know how. For
> > > > > > > > > > instance, if the response body is greater than 100k I would like to
> > > > > > > > > > close the connection to the site. How can I go about doing this? I
> > > > > > > > > > see the available method param: BUFFER_WARN_TRIGGER_LIMIT but it only
> > > > > > > > > > seems to control warning logging.
> > > > > > > > > >
> > > > > > > > > > Currently I receive the response body like so:
> > > > > > > > > > byte[] bytes = method.getResponseBody();
> > > > > > > > > >
> > > > > > > > > > Any help greatly appreciated.
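
For anyone reading this thread in the archive, the size-limit idea
discussed above (read up to a cap, then bail out with HttpMethod#abort
before closing the stream) could be sketched roughly as follows. This is
only a sketch under assumptions: BoundedBodyReader, Result and readAtMost
are hypothetical names, not part of HttpClient, and the block uses plain
java.io so it can run without the library. The HttpClient calls appear
only in comments.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper (not part of HttpClient): caps how much of a stream is read. */
final class BoundedBodyReader {

    /** Outcome of a capped read: the bytes kept and whether more data was pending. */
    static final class Result {
        public final byte[] bytes;
        public final boolean truncated;

        Result(byte[] bytes, boolean truncated) {
            this.bytes = bytes;
            this.truncated = truncated;
        }
    }

    /** Reads at most {@code limit} bytes; never buffers more than that. */
    static Result readAtMost(InputStream in, int limit) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (out.size() < limit) {
            // Ask only for what still fits under the limit.
            int n = in.read(buf, 0, Math.min(buf.length, limit - out.size()));
            if (n == -1) {
                return new Result(out.toByteArray(), false); // body ended within the limit
            }
            out.write(buf, 0, n);
        }
        // Limit reached: probe one byte to see whether the body continues.
        // Note: on a real socket this probe can block until the read timeout.
        boolean more = in.read() != -1;
        return new Result(out.toByteArray(), more);
    }

    // Assumed usage with Commons HttpClient 3.x (shown as comments, not compiled here):
    //   InputStream in = method.getResponseBodyAsStream(); // may be null if there is no body
    //   Result r = readAtMost(in, 100 * 1024);
    //   if (r.truncated) method.abort();   // abort before closing the stream
    //   ... and call method.releaseConnection() in a finally block.

    public static void main(String[] args) throws IOException {
        InputStream fake = new ByteArrayInputStream(new byte[300 * 1024]); // stand-in for a large body
        Result r = readAtMost(fake, 100 * 1024);
        System.out.println(r.bytes.length + " bytes kept, truncated=" + r.truncated);
    }
}
```

The truncated flag is what tells the caller whether abort is needed at all.
A body that fits under the limit is read to the end, so in that case the
connection can be released normally.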

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
