On Fri, 2005-07-22 at 15:13 -0400, Tony Spencer wrote:
> Hi Oleg,
> I'm not sure exactly what's going on as I haven't dug through the
> source code enough but I do know that when I try using
> MultiThreadedHttpConnectionManager and calling releaseConnection in
> the finally block as you have done here, my bot threads start hanging
> after a few hundred requests. I am only hypothesizing that the
> connections are not returning to the pool.
>
Tony,
Do you see something like this in the log?
Unable to get a connection, waiting..., hostConfig=...
Anyway, if you manage to produce a context/wire log of the session, we
may be able to figure out what goes wrong.
Oleg
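
For reference, a minimal sketch of how a context/wire log like the one
requested above could be switched on. It assumes HttpClient 3.x running on
Commons Logging with the SimpleLog back end (no Log4j configuration in
play). The property names are the ones given in the HttpClient logging
guide; the class and method names are made up for this example.

```java
/** Hypothetical helper; only the property names come from the logging guide. */
final class WireLogSetup {

    /**
     * Selects SimpleLog and raises the wire and context loggers to debug.
     * Assumption: this runs before any HttpClient class creates its logger.
     */
    static void enableWireAndContextLogging() {
        System.setProperty("org.apache.commons.logging.Log",
                "org.apache.commons.logging.impl.SimpleLog");
        System.setProperty("org.apache.commons.logging.simplelog.showdatetime", "true");
        // Wire log: the raw bytes sent to and received from the server.
        System.setProperty("org.apache.commons.logging.simplelog.log.httpclient.wire", "debug");
        // Context log: HttpClient's own debug messages.
        System.setProperty(
                "org.apache.commons.logging.simplelog.log.org.apache.commons.httpclient",
                "debug");
    }
}
```

Calling WireLogSetup.enableWireAndContextLogging() first thing in main()
should be enough. SimpleLog writes to System.err, so redirect that to a
file for a long crawl. A full wire log of every response body gets large
quickly, so a short reproducing session is preferable.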
> On 7/22/05, Oleg Kalnichevski <[EMAIL PROTECTED]> wrote:
> > On Fri, Jul 22, 2005 at 01:07:11PM -0400, Tony Spencer wrote:
> > > In case anyone else is using HttpClient for a multi-threaded crawler,
> > > here is the solution that seems to solve all the problems in this
> > > discussion:
> > >
> > > Don't use the MultiThreadedHttpConnectionManager. You will need to
> > > bail if a response body reaches a limit you define (mine is 100k).
> > > The only way to break the connection is to call HttpMethod.abort.
> > > Unfortunately this doesn't allow the HttpConnection to be safely
> > > returned to the connection manager's pool.
> >
> > Tony,
> >
> > Why is that? What is it that prevents the connection from being returned
> > to the pool? I believe HttpMethod#releaseConnection should have no
> > problem handling connections that have been closed by HttpMethod#abort.
> >
> > GetMethod httpget = new GetMethod("/stuff");
> > try {
> >     httpclient.executeMethod(httpget);
> >     // do something with the response
> >     // and if you get fed up, just call
> >     httpget.abort();
> > } finally {
> >     httpget.releaseConnection();
> > }
> >
> > Oleg
> >
> >
> > > Instead, I found pretty
> > > good performance by creating a new HttpClient (simple constructor :
> > > new HttpClient()) for each thread and use it for 1,000 requests at
> > > which time I destroy the current and create a new one. I'm sure this
> > > doesn't perform as well as the multi threaded manager but it ran all
> > > night for me with no exceptions, no memory leaks, and pulled down 2
> > > million sites in about 12 hours (running 100 threads). Not bad.
> > >
> > > On 7/21/05, Tony Spencer <[EMAIL PROTECTED]> wrote:
> > > > Ok, I hope you aren't getting sick of this problem. :)
> > > >
> > > > HttpMethod.abort does solve the problem of sites that send an infinite
> > > > response. However, it seems that by calling abort we cannot properly
> > > > release the connection. I've tried calling method.releaseConnection
> > > > right after abort.
> > > >
> > > > My usage for HttpClient is a multi-threaded crawler so I've followed
> > > > the suggestions on the threading page
> > > > http://jakarta.apache.org/commons/httpclient/threading.html (nice
> > > > documentation by the way). So I use the
> > > > MultiThreadedHttpConnectionManager as suggested and reuse the same
> > > > HttpClient over and over as suggested. After a certain number of
> > > > calls to HttpMethod.abort my HttpClient goes bad (hangs).
> > > >
> > > > So it appears that abort is too harsh and doesn't allow clean return
> > > > of the client to the pool. Any more suggestions?
> > > >
> > > > On 7/21/05, Tony Spencer <[EMAIL PROTECTED]> wrote:
> > > > > Disregard my last message. Your suggestion did work Oleg. Originally
> > > > > > I put the abort statement after attempting to close the input stream.
> > > > > Once I moved it in front of the stream close statement it worked fine.
> > > > > Thank you very much.
> > > > >
> > > > > On 7/21/05, Oleg Kalnichevski <[EMAIL PROTECTED]> wrote:
> > > > > > Just call HttpMethod#abort to close the underlying connection
> > > > > >
> > > > > > Oleg
> > > > > >
> > > > > >
> > > > > > On Thu, 2005-07-21 at 16:34 -0400, Tony Spencer wrote:
> > > > > > > > Ok, I managed to limit the response to 8k in the following code,
> > > > > > > > but it doesn't help with what I'm really trying to accomplish.
> > > > > > > > Sometimes there is a site that will spew a never-ending response.
> > > > > > > > This causes HttpClient to hang indefinitely. My code below does not
> > > > > > > > solve the problem. Here is an example of a nasty site that never
> > > > > > > > stops sending its response:
> > > > > > > > http://www.tfc-charts.w2d.com/chart/dw/w
> > > > > > > > (beware, it may crash your browser if you browse it)
> > > > > > >
> > > > > > > InputStream is = method.getResponseBodyAsStream();
> > > > > > > > BufferedInputStream bis = new BufferedInputStream(is);
> > > > > > > byte[] bytes = new byte[ 8192 ];
> > > > > > > bis.read(bytes);
> > > > > > > bis.close();
> > > > > > > is.close();
> > > > > > > ret = new String(bytes);
> > > > > > >
> > > > > > >
> > > > > > > On 7/21/05, Tony Spencer <[EMAIL PROTECTED]> wrote:
> > > > > > > > > I'd like to limit the size of the response but don't know how.
> > > > > > > > > For instance, if the response body is greater than 100k I would
> > > > > > > > > like to close the connection to the site. How can I go about
> > > > > > > > > doing this? I see the available method param
> > > > > > > > > BUFFER_WARN_TRIGGER_LIMIT, but it only seems to control warning
> > > > > > > > > logging.
> > > > > > > >
> > > > > > > > Currently I receive the response body like so:
> > > > > > > > byte[] bytes = method.getResponseBody();
> > > > > > > >
> > > > > > > > Any help greatly appreciated.
> > > > > > > >
> > > > > > >
> > > > > > > ---------------------------------------------------------------------
> > > > > > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > > > > > For additional commands, e-mail: [EMAIL PROTECTED]
> > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > ---------------------------------------------------------------------
> > > > > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > > > > For additional commands, e-mail: [EMAIL PROTECTED]
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > For additional commands, e-mail: [EMAIL PROTECTED]
> > >
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]