On Tue, 2012-12-25 at 08:53 -0800, vigna wrote:
> Well, if you do a world-wide crawl with pages on the order of billions, it
> happens. This particular problem arose with a proxy simulating 100,000,000
> sites during a crawl.
> 
> I agree that it is an event that can happen only with very specific
> applications, like high-performance crawlers, but it is not impossible.
> 
> 

Assuming that the crawler traverses the various hosts more or less
sequentially, a very simple fix to the problem would be to remove per-route
pools as soon as they become empty, in order to prevent the map from growing
beyond the total maximum number of concurrent connections.
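To make the idea concrete, here is a rough sketch. This is not HttpClient's
actual pool manager; the route and connection types and the bookkeeping are
simplified stand-ins used only to illustrate discarding a per-route pool once
it holds no leased or idle connections, so the map stays bounded by the number
of routes that still have live connections:

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class PerRoutePools<R, C> {

    // Simplified per-route pool: counts leased connections and keeps the
    // idle ones; only what is needed to illustrate the eviction.
    private static final class Pool<T> {
        final Deque<T> available = new ArrayDeque<>();
        int leased;

        boolean isEmpty() {
            return leased == 0 && available.isEmpty();
        }
    }

    private final Map<R, Pool<C>> pools = new HashMap<>();

    // Lazily create the per-route pool when a connection is leased.
    synchronized void lease(R route) {
        Pool<C> pool = pools.get(route);
        if (pool == null) {
            pool = new Pool<>();
            pools.put(route, pool);
        }
        pool.leased++;
    }

    // Return a keep-alive connection to its per-route pool for reuse.
    synchronized void release(R route, C conn) {
        Pool<C> pool = pools.get(route);
        if (pool != null) {
            pool.leased = Math.max(0, pool.leased - 1);
            pool.available.add(conn);
        }
    }

    // Close a connection (released without keep-alive, expired, etc.) and
    // drop the per-route pool once it has no leased or idle connections,
    // so the map never grows beyond the routes still holding connections.
    synchronized void discard(R route) {
        Pool<C> pool = pools.get(route);
        if (pool != null) {
            pool.leased = Math.max(0, pool.leased - 1);
            if (pool.isEmpty()) {
                pools.remove(route);
            }
        }
    }

    synchronized int routeCount() {
        return pools.size();
    }
}

With eviction like this, a crawler that has moved on from a host stops paying
any memory cost for it, and the pool for that host is simply re-created if the
crawler ever comes back.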

Just out of curiosity, why are you using an asynchronous HTTP client for a
web crawler? I would personally consider a blocking HTTP client a much
better choice for a heavy-duty web crawler.

Oleg


