Hmm, any idea why?
Anyway, if I may use this thread, can you suggest an optimal architecture
for crawling with HttpClient? What is the best way (besides using lots of
worker threads, which I do now) to download the maximum number of web pages
in the minimum time and make better use of the bandwidth (right now it never
exceeds 2Mb/sec)?
Thanks.
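
For reference, the worker-thread approach mentioned above can be sketched
roughly as follows. This is only an illustration under stated assumptions, not
HttpClient code: CrawlSketch, Fetcher and Result are hypothetical names, the
Fetcher stands in for the real blocking HTTP call, and a Semaphore plays the
role of the pool's total-connection limit. It shows one bound that holds for
any blocking client: the number of requests in flight cannot exceed the number
of worker threads, however high the pool limit is set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class CrawlSketch {

    /** Hypothetical stand-in for the real blocking HTTP call. */
    public interface Fetcher {
        void fetch(String url) throws Exception;
    }

    /** Counters collected during one run. */
    public static final class Result {
        public final int completed;
        public final int peakInFlight;

        Result(int completed, int peakInFlight) {
            this.completed = completed;
            this.peakInFlight = peakInFlight;
        }
    }

    public static Result crawl(List<String> urls, int workers, int maxTotal, Fetcher fetcher)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        // Assumption: a semaphore models the connection pool's total limit.
        Semaphore connections = new Semaphore(maxTotal);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        AtomicInteger completed = new AtomicInteger();

        for (String url : urls) {
            pool.execute(() -> {
                try {
                    connections.acquire(); // "lease" a connection
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                try {
                    // Track how many requests are running at the same time.
                    int now = inFlight.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    fetcher.fetch(url);
                    completed.incrementAndGet();
                } catch (Exception e) {
                    // A real crawler would log or retry; the sketch skips the URL.
                } finally {
                    inFlight.decrementAndGet();
                    connections.release(); // return the connection
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return new Result(completed.get(), peak.get());
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < 40; i++) {
            urls.add("http://example.invalid/page" + i); // placeholder URLs
        }
        // 8 workers, pool limit 50000: peakInFlight can be at most 8.
        Result r = crawl(urls, 8, 50000, url -> Thread.sleep(10));
        System.out.println("completed=" + r.completed + " peakInFlight=" + r.peakInFlight);
    }
}
```

With 8 workers and a limit of 50000, the peak stays at or below 8; with 8
workers and a limit of 2, it stays at or below 2. Whichever bound is smaller
wins, so a very large pool limit by itself does not produce more connections.
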
olegk wrote:
>
> On Mon, 2012-01-23 at 11:36 -0800, Dvora wrote:
>> Hi,
>>
>> I would like to code a high-performance web crawler using HttpClient
>> 4.1.2.
>> In order to bring the machine to its highest throughput, each crawling
>> thread creates a DefaultHttpClient with a pool configured as follows
>> (based on one of the examples):
>>
>> static
>> {
>>     cm = new ThreadSafeClientConnManager();
>>     cm.setMaxTotal( 50000 );
>>     cm.setDefaultMaxPerRoute( Integer.MAX_VALUE );
>>
>>     HttpClient client = new DefaultHttpClient();
>>
>>     params = client.getParams();
>>
>>     HttpClientParams.setRedirecting( params, false );
>>     HttpClientParams.setAuthenticating( params, true );
>>
>>     HttpConnectionParams.setSoTimeout( params, 30000 );
>>     HttpConnectionParams.setConnectionTimeout( params, 30000 );
>>
>>     IdleConnectionEvictor connEvictor = new IdleConnectionEvictor( cm );
>>
>>     connEvictor.start();
>> }
>>
>> When running the application with lots of crawling threads, netstat shows
>> only 2k TCP connections in the ESTABLISHED state. Is this expected,
>> considering maxTotal = 50000? Are there other bottlenecks (OS level, etc.)
>> preventing the application from reaching more than 2k TCP connections?
>>
>> Thanks.
>>
>>
>
> I personally think this is to be expected. When running performance
> stress tests with 200 threads and a limit of 200 max connections, I
> frequently observe HttpClient utilizing significantly fewer connections
> (~100), never reaching the max limit.
>
> Oleg
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>
>
--
View this message in context:
http://old.nabble.com/Understanding-how-ThreadSafeClientConnManager-parameters-affect-number-of-tcp-connections-tp33190497p33197498.html
Sent from the HttpClient-User mailing list archive at Nabble.com.