Hmm, any idea why?

Anyway, if I may use this thread: can you suggest an optimal architecture
for crawling with HttpClient? What is the best way (besides using lots of
worker threads, which I do now) to download the maximum number of web pages
in the minimum time and make better use of the bandwidth (currently it never
exceeds 2Mb/sec)?
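
For context, here is a rough sketch of the worker-thread approach mentioned
above. It is not my real crawler: CrawlPoolSketch, crawl and the fetch
function are illustrative names, and fetch is only a placeholder for the
actual blocking HttpClient call. It also records the peak number of requests
in flight, on the assumption that with blocking I/O each worker holds at most
one leased connection at a time, so the connections in use should be bounded
by the number of busy threads rather than by maxTotal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class CrawlPoolSketch {

    /** Outcome of one crawl run: status per URL plus peak concurrency. */
    public static final class Result {
        public final Map<String, Integer> statuses;
        public final int peakInFlight;

        Result(Map<String, Integer> statuses, int peakInFlight) {
            this.statuses = statuses;
            this.peakInFlight = peakInFlight;
        }
    }

    /**
     * Fetches every URL on a fixed pool of worker threads.
     * "fetch" is a hypothetical stand-in for a blocking HTTP request
     * that returns a status code.
     */
    public static Result crawl(List<String> urls, int workers,
                               Function<String, Integer> fetch) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        Map<String, Integer> statuses = new ConcurrentHashMap<>();
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();

        List<Callable<Void>> tasks = new ArrayList<>();
        for (String url : urls) {
            tasks.add(() -> {
                // Track how many requests are running at the same moment.
                peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                try {
                    statuses.put(url, fetch.apply(url));
                } finally {
                    inFlight.decrementAndGet();
                }
                return null;
            });
        }

        try {
            pool.invokeAll(tasks); // blocks until every task has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return new Result(statuses, peak.get());
    }

    public static void main(String[] args) {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            urls.add("http://example.invalid/page" + i); // placeholder URLs
        }
        // Simulated fetch: sleep briefly instead of doing network I/O.
        Result r = crawl(urls, 4, url -> {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return 200;
        });
        System.out.println("fetched=" + r.statuses.size()
                + " peakInFlight=" + r.peakInFlight);
    }
}
```

In this sketch the peak never exceeds the worker count, however large the
connection pool is configured to be.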

Thanks.



olegk wrote:
> 
> On Mon, 2012-01-23 at 11:36 -0800, Dvora wrote:
>> Hi,
>> 
>> I would like to code a high-performance web crawler using HttpClient
>> 4.1.2.
>> To bring the machine to its highest throughput, each crawling thread
>> creates a DefaultHttpClient with a pool configured as follows (based on
>> one of the examples):
>> 
>> static
>>      {
>>              cm = new ThreadSafeClientConnManager();
>>              cm.setMaxTotal( 50000 );
>>              cm.setDefaultMaxPerRoute( Integer.MAX_VALUE );
>> 
>>              HttpClient client = new DefaultHttpClient();
>> 
>>              params = client.getParams();
>> 
>>              HttpClientParams.setRedirecting( params, false );
>>              HttpClientParams.setAuthenticating( params, true );
>> 
>>              HttpConnectionParams.setSoTimeout( params, 30000 );
>>              HttpConnectionParams.setConnectionTimeout( params, 30000 );
>> 
>>              IdleConnectionEvictor connEvictor = new IdleConnectionEvictor( cm );
>> 
>>              connEvictor.start();
>>      }
>> 
>> When running the application with lots of crawling threads, netstat shows
>> only 2k TCP connections in state ESTABLISHED. Is this expected,
>> considering maxTotal = 50000? Are there other bottlenecks (OS level, etc.)
>> preventing the application from reaching more than 2k TCP connections?
>> 
>> Thanks.
>> 
>> 
> 
> I personally think this is to be expected. When running performance
> stress tests with 200 threads and a limit of 200 max connections, I
> frequently observe HttpClient utilizing significantly fewer connections
> (~100), never reaching the max limit.
> 
> Oleg   
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Understanding-how-ThreadSafeClientConnManager-parameters-affect-number-of-tcp-connections-tp33190497p33197498.html
Sent from the HttpClient-User mailing list archive at Nabble.com.

