On Mon, 2012-01-23 at 11:36 -0800, Dvora wrote:
> Hi,
>
> I would like to code a high-performance web crawler using httpclient 4.1.2.
> In order to bring the machine to its highest throughput, each crawling thread
> creates a DefaultHttpClient with a pool configured as follows (based on one
> of the examples):
>
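> // cm and params are static fields shared by the crawling threads
> // (their declarations are not shown)
> static
> {
>     cm = new ThreadSafeClientConnManager();
>     cm.setMaxTotal( 50000 );
>     cm.setDefaultMaxPerRoute( Integer.MAX_VALUE );
>
>     // throwaway client, used only to obtain a default parameter set
>     HttpClient client = new DefaultHttpClient();
>
>     params = client.getParams();
>
>     HttpClientParams.setRedirecting( params, false );
>     HttpClientParams.setAuthenticating( params, true );
>
>     // 30 s read (SO_TIMEOUT) and connect timeouts
>     HttpConnectionParams.setSoTimeout( params, 30000 );
>     HttpConnectionParams.setConnectionTimeout( params, 30000 );
>
>     // helper thread adapted from the examples, not a library class
>     IdleConnectionEvictor connEvictor = new IdleConnectionEvictor( cm );
>
>     connEvictor.start();
> }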
>
> When running the application with lots of crawling threads, netstat shows
> only 2k TCP connections in state ESTABLISHED. Is this expected considering
> maxTotal = 50000? Are there other bottlenecks (OS level, etc.) preventing the
> application from reaching more than 2k TCP connections?
>
> Thanks.
>
>
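I personally think this is to be expected. Connections released back to
the pool are reused, so the pool opens a new one only when all existing
connections are in use at the same time. When running performance stress
tests with 200 threads and a limit of 200 max connections, I frequently
observe HttpClient using significantly fewer connections (~100), never
reaching the max limit.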
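To illustrate why the pool stays well below its limit, here is a toy model of a blocking connection pool. It is a sketch under stated assumptions, not HttpClient's actual ThreadSafeClientConnManager code: the class names PoolModel and ToyPool are hypothetical, the 1 ms sleeps are arbitrary stand-ins for network I/O and for non-network work such as parsing, and each thread is assumed to make sequential blocking requests, holding at most one connection at a time. Because an idle connection is always reused before a new one is opened, the number of connections ever opened equals the peak number in use at once, which can never exceed the thread count, however large maxTotal is.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.CountDownLatch;

// Toy model of a blocking connection pool (hypothetical, NOT HttpClient's code).
public class PoolModel {

    static final class ToyPool {
        private final int maxTotal;
        private final Deque<Object> idle = new ArrayDeque<>();
        private int created = 0;    // connections ever opened
        private int leased = 0;     // connections currently in use
        private int peakLeased = 0; // highest simultaneous use observed

        ToyPool(int maxTotal) { this.maxTotal = maxTotal; }

        // Reuse an idle connection if there is one; open a new one only when
        // none is idle and the limit allows; otherwise block until a release.
        synchronized Object lease() throws InterruptedException {
            while (idle.isEmpty() && created >= maxTotal) {
                wait();
            }
            Object conn;
            if (idle.isEmpty()) {
                created++;
                conn = new Object(); // stands in for a new socket
            } else {
                conn = idle.pop();
            }
            leased++;
            peakLeased = Math.max(peakLeased, leased);
            return conn;
        }

        synchronized void release(Object conn) {
            leased--;
            idle.push(conn);
            notifyAll(); // wake threads waiting for a free connection
        }

        synchronized int created() { return created; }
        synchronized int peakLeased() { return peakLeased; }
    }

    // Each worker alternates a "request" (holding a connection) with other
    // work (holding none), the way a crawler thread fetches and then parses.
    static ToyPool simulate(int threads, int maxTotal, int requestsPerThread) {
        ToyPool pool = new ToyPool(maxTotal);
        CountDownLatch done = new CountDownLatch(threads);
        for (int t = 0; t < threads; t++) {
            new Thread(() -> {
                try {
                    for (int i = 0; i < requestsPerThread; i++) {
                        Object conn = pool.lease();
                        try {
                            Thread.sleep(1); // simulated request
                        } finally {
                            pool.release(conn);
                        }
                        Thread.sleep(1);     // simulated non-network work
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            }).start();
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return pool;
    }

    public static void main(String[] args) {
        ToyPool pool = simulate(200, 200, 20);
        System.out.println("threads=200 maxTotal=200 opened=" + pool.created()
                + " peakInUse=" + pool.peakLeased());
    }
}
```

The exact numbers printed vary from run to run with thread scheduling; what the model shows is that connections opened track peak concurrency rather than the configured maximum.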
Oleg