I've been looking at the code more closely to verify, and I agree with
Oleg. There should be no issue with calling abort and releasing the
connection. My guess is that connections are being leaked somewhere
else, or something else is causing the stalling.
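
For the size limit itself, here is a minimal sketch of how the capped read
could look. It is not HttpClient code: BoundedBodyReader, Result and readAtMost
are hypothetical names made up for illustration, and it uses only java.io. It
reads until the cap is reached or the stream ends, then probes one more byte to
tell whether the body was cut short.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper (not part of HttpClient): reads a stream up to a byte cap. */
final class BoundedBodyReader {

    /** Outcome of a capped read: the bytes kept, and whether more data was pending. */
    static final class Result {
        final byte[] body;
        final boolean truncated;

        Result(byte[] body, boolean truncated) {
            this.body = body;
            this.truncated = truncated;
        }
    }

    /** Reads at most {@code limit} bytes from {@code in}; never buffers more than that. */
    static Result readAtMost(InputStream in, int limit) throws IOException {
        if (limit < 0) {
            throw new IllegalArgumentException("limit must be >= 0");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (out.size() < limit) {
            // Never ask for more than the remaining allowance.
            int want = Math.min(buf.length, limit - out.size());
            int n = in.read(buf, 0, want);
            if (n == -1) {
                // Stream ended before the cap: the body is complete.
                return new Result(out.toByteArray(), false);
            }
            out.write(buf, 0, n);
        }
        // Cap reached: probe a single byte to see whether the body continues.
        boolean more = in.read() != -1;
        return new Result(out.toByteArray(), more);
    }
}
```

In the crawler this would be called on method.getResponseBodyAsStream() with a
limit of 100 * 1024. If truncated comes back true, call method.abort() before
closing the stream, and keep releaseConnection() in the finally block as in
Oleg's example.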
Mike
On 7/22/05, Tony Spencer <[EMAIL PROTECTED]> wrote:
> Hi Oleg,
> I'm not sure exactly what's going on, as I haven't dug through the
> source code enough, but I do know that when I use
> MultiThreadedHttpConnectionManager and call releaseConnection in
> the finally block as you have done here, my bot threads start hanging
> after a few hundred requests. I am only hypothesizing that the
> connections are not being returned to the pool.
>
> On 7/22/05, Oleg Kalnichevski <[EMAIL PROTECTED]> wrote:
> > On Fri, Jul 22, 2005 at 01:07:11PM -0400, Tony Spencer wrote:
> > > In case anyone else is using HttpClient for a multi-threaded crawler,
> > > here is the solution that seems to solve all the problems in this
> > > discussion:
> > >
> > > Don't use the MultiThreadedHttpConnectionManager. You will need to
> > > bail if a response body reaches a limit you define (mine is 100k).
> > > The only way to break the connection is to call HttpMethod.abort.
> > > Unfortunately this doesn't allow the HttpConnection to be safely
> > > returned to the connection manager's pool.
> >
> > Tony,
> >
> > Why is that? What is it that prevents the connection from being returned
> > back to the pool? I believe HttpMethod#releaseConnection should have no
> > problem handling connections that have been closed by HttpMethod#abort
> >
> > GetMethod httpget = new GetMethod("/stuff");
> > try {
> >     httpclient.executeMethod(httpget);
> >     // do something with the response
> >     // and if you get fed up, just call
> >     httpget.abort();
> > } finally {
> >     httpget.releaseConnection();
> > }
> >
> > Oleg
> >
> >
> > > Instead, I found pretty
> > > good performance by creating a new HttpClient (simple constructor:
> > > new HttpClient()) for each thread and using it for 1,000 requests, at
> > > which point I destroy the current one and create a new one. I'm sure this
> > > doesn't perform as well as the multi threaded manager but it ran all
> > > night for me with no exceptions, no memory leaks, and pulled down 2
> > > million sites in about 12 hours (running 100 threads). Not bad.
> > >
> > > On 7/21/05, Tony Spencer <[EMAIL PROTECTED]> wrote:
> > > > Ok, I hope you aren't getting sick of this problem. :)
> > > >
> > > > HttpMethod.abort does solve the problem of sites that send an infinite
> > > > response. However, it seems that by calling abort we cannot properly
> > > > release the connection. I've tried calling method.releaseConnection
> > > > right after abort.
> > > >
> > > > My usage for HttpClient is a multi-threaded crawler so I've followed
> > > > the suggestions on the threading page
> > > > http://jakarta.apache.org/commons/httpclient/threading.html (nice
> > > > documentation by the way). So I use the
> > > > MultiThreadedHttpConnectionManager as suggested and reuse the same
> > > > HttpClient over and over as suggested. After a certain number of
> > > > calls to HttpMethod.abort my HttpClient goes bad (hangs).
> > > >
> > > > So it appears that abort is too harsh and doesn't allow clean return
> > > > of the client to the pool. Any more suggestions?
> > > >
> > > > On 7/21/05, Tony Spencer <[EMAIL PROTECTED]> wrote:
> > > > > Disregard my last message. Your suggestion did work, Oleg. Originally
> > > > > I put the abort statement after attempting to close the input stream.
> > > > > Once I moved it in front of the stream close statement it worked fine.
> > > > > Thank you very much.
> > > > >
> > > > > On 7/21/05, Oleg Kalnichevski <[EMAIL PROTECTED]> wrote:
> > > > > > Just call HttpMethod#abort to close the underlying connection
> > > > > >
> > > > > > Oleg
> > > > > >
> > > > > >
> > > > > > On Thu, 2005-07-21 at 16:34 -0400, Tony Spencer wrote:
> > > > > > > Ok, I managed to limit the response to 8k in the following code,
> > > > > > > but it doesn't help with what I'm really trying to accomplish.
> > > > > > > Sometimes there is a site that will spew a never-ending response.
> > > > > > > This causes HttpClient to hang indefinitely. My code below does not
> > > > > > > solve the problem. Here is an example of a nasty site that never
> > > > > > > stops sending its response: http://www.tfc-charts.w2d.com/chart/dw/w
> > > > > > > (beware: it may crash your browser if you browse it)
> > > > > > >
> > > > > > > InputStream is = method.getResponseBodyAsStream();
> > > > > > > BufferedInputStream bis = new BufferedInputStream(is);
> > > > > > > byte[] bytes = new byte[ 8192 ];
> > > > > > > bis.read(bytes);
> > > > > > > bis.close();
> > > > > > > is.close();
> > > > > > > ret = new String(bytes);
> > > > > > >
> > > > > > >
> > > > > > > On 7/21/05, Tony Spencer <[EMAIL PROTECTED]> wrote:
> > > > > > > > I'd like to limit the size of the response but don't know how.
> > > > > > > > For instance, if the response body is greater than 100k I would
> > > > > > > > like to close the connection to the site. How can I go about
> > > > > > > > doing this? I see the available method param
> > > > > > > > BUFFER_WARN_TRIGGER_LIMIT, but it only seems to control warning
> > > > > > > > logging.
> > > > > > > >
> > > > > > > > Currently I receive the response body like so:
> > > > > > > > byte[] bytes = method.getResponseBody();
> > > > > > > >
> > > > > > > > Any help greatly appreciated.
> > > > > > > >
> > > > > > >
> > > > > > > ---------------------------------------------------------------------
> > > > > > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > > > > > For additional commands, e-mail: [EMAIL PROTECTED]