Hi Oleg,

[snip]

> Ken,
> 
> You might want to have a look at the latest code in SVN trunk (to be
> released as 4.3). Several classes such as the scheme registry that
> previously had to be synchronized in order to ensure thread safety have
> been replaced with immutable equivalents. There is also now a way to
> create HttpClient in a minimal configuration without authentication,
> state management (cookies), proxy support and other non-essential
> functions.

That sounds interesting - any hints as to how to create this minimal HttpClient?

> These functions are not merely disabled but physically
> removed from the processing pipeline, which should result in somewhat
> better performance in high thread contention scenarios, as the only
> synchronization point involved in request execution would be the lock of
> the connection pool. Minimal HttpClient may be particularly useful for
> anonymous web crawling when authentication and state management are not
> required.
> 
> 
>> 3. Global lock on connection pool
>> 
>> Oleg had written:
>> 
>>> Yes, your observation is correct. The problem is that the connection
>>> pool is guarded by a global lock. Naturally if you have 400 threads
>>> trying to obtain a connection at about the same time all of them end up
>>> contending for one lock. The problem is that I can't think of a
>>> different way to ensure the max limits (per route and total) are
>>> guaranteed not to be exceeded. If anyone can think of a better algorithm
>>> please do let me know. One possibility might be a more lenient
>>> implementation, less prone to lock contention, that may under stress
>>> occasionally allocate a few more connections than the max limits.
>> 
>> I don't know if this has been resolved. My work-around from a few years ago 
>> was to rely on having multiple Hadoop reducers running on the server (each 
>> in their own JVM), where I could then limit each JVM to at most 300 
>> connections.
>> 
> 
> I experimented with the idea of a lock-less (unlimited) connection
> manager, but in my tests it did not perform any better than the
> standard connection manager.

Previously I'd asked:

> Would it work to go for finer-grained locking, by using atomic counters to 
> track & enforce limits on per route/total connections?

Any thoughts on that approach? E.g. have a map from route to atomic counter, 
and a single atomic counter for total connections?
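
To make that concrete, here is a rough sketch of the kind of thing I have in mind. The names (ConnectionLimiter, tryAcquire, release) are made up for illustration and are not part of HttpClient, and routes are plain strings here rather than real route objects. It only covers the counting; it does not deal with blocking waiters, timeouts or connection reuse, which the real pool has to handle.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: enforces per-route and total limits with CAS loops
// instead of a single global lock. Not HttpClient code.
class ConnectionLimiter {
    private final int maxTotal;
    private final int maxPerRoute;
    private final AtomicInteger total = new AtomicInteger();
    private final ConcurrentHashMap<String, AtomicInteger> perRoute =
            new ConcurrentHashMap<>();

    ConnectionLimiter(int maxTotal, int maxPerRoute) {
        this.maxTotal = maxTotal;
        this.maxPerRoute = maxPerRoute;
    }

    // Atomically increments the counter unless it has already reached max.
    private static boolean incrementIfBelow(AtomicInteger counter, int max) {
        for (;;) {
            int current = counter.get();
            if (current >= max) {
                return false;
            }
            if (counter.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    // Returns true if a connection slot was reserved for the given route.
    boolean tryAcquire(String route) {
        AtomicInteger routeCount =
                perRoute.computeIfAbsent(route, r -> new AtomicInteger());
        if (!incrementIfBelow(routeCount, maxPerRoute)) {
            return false;
        }
        if (!incrementIfBelow(total, maxTotal)) {
            // Total limit reached: give back the route slot reserved above.
            routeCount.decrementAndGet();
            return false;
        }
        return true;
    }

    // Must be called exactly once for every successful tryAcquire.
    void release(String route) {
        AtomicInteger routeCount = perRoute.get(route);
        if (routeCount == null) {
            throw new IllegalStateException("Unknown route: " + route);
        }
        routeCount.decrementAndGet();
        total.decrementAndGet();
    }

    int totalInUse() {
        return total.get();
    }
}
```

With this scheme neither limit is ever exceeded, but a thread can be turned away a little too eagerly: while one thread holds a route slot it is about to roll back, another thread on the same route may see the route as full. Threads on different routes only contend on the single total counter.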

Thanks,

-- Ken

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
