On Mon, 2012-01-23 at 11:36 -0800, Dvora wrote:
> Hi,
> 
> I would like to code a high-performance web crawler using HttpClient 4.1.2.
> In order to bring the machine to its highest throughput, each crawling thread
> creates a DefaultHttpClient with a pool configured as follows (based on one
> of the examples):
> 
> static
>       {
>               cm = new ThreadSafeClientConnManager();
>               cm.setMaxTotal( 50000 );
>               cm.setDefaultMaxPerRoute( Integer.MAX_VALUE );
> 
>               HttpClient client = new DefaultHttpClient();
> 
>               params = client.getParams();
> 
>               HttpClientParams.setRedirecting( params, false );
>               HttpClientParams.setAuthenticating( params, true );
> 
>               HttpConnectionParams.setSoTimeout( params, 30000 );
>               HttpConnectionParams.setConnectionTimeout( params, 30000 );
> 
>               IdleConnectionEvictor connEvictor = new IdleConnectionEvictor( cm );
> 
>               connEvictor.start();
>       }
> 
> When running the application with lots of crawling threads, netstat shows
> only 2k TCP connections in the ESTABLISHED state. Is this expected considering
> maxTotal = 50000? Are there other bottlenecks (OS level, etc.) preventing the
> application from reaching more than 2k TCP connections?
> 
> Thanks.
> 
> 

I personally think this is to be expected. When running performance
stress tests with 200 threads and a limit of 200 max connections, I
frequently observe HttpClient using significantly fewer connections
(~100) and never reaching the max limit.
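
As a rough illustration of why the number of connections in use can stay
below the configured limit, here is a minimal JDK-only sketch. It is not
HttpClient code: the class PoolUsageSketch, the method peakLeased and the
1 ms sleeps are hypothetical stand-ins, and it assumes that a worker holds
a connection only while a request is in flight. It records the peak number
of simultaneously leased "connections"; that peak can never exceed the
limit, and how close it gets depends on timing.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a semaphore stands in for a connection pool.
class PoolUsageSketch {

    // Runs `threads` workers against a pool capped at `maxTotal` and
    // returns the highest number of connections leased at the same time.
    static int peakLeased(int threads, int maxTotal, int requestsPerThread) {
        Semaphore pool = new Semaphore(maxTotal);
        AtomicInteger leased = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService workers = Executors.newFixedThreadPool(threads);

        for (int t = 0; t < threads; t++) {
            workers.execute(() -> {
                for (int i = 0; i < requestsPerThread; i++) {
                    try {
                        pool.acquire();              // lease a connection
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    try {
                        int now = leased.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        sleepMillis(1);              // request in flight (arbitrary)
                    } finally {
                        leased.decrementAndGet();
                        pool.release();              // return it to the pool
                    }
                    sleepMillis(1);                  // work done without a connection
                }
            });
        }

        workers.shutdown();
        try {
            workers.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    private static void sleepMillis(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("peak leased: " + peakLeased(200, 200, 20));
    }
}
```

The sketch only counts leased connections. netstat also reports idle
pooled connections that are still open, so the two numbers are not
directly comparable.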

Oleg   


