Hi Oleg,

[snip]

> Ken,
>
> You might want to have a look at the latest code in SVN trunk (to be
> released as 4.3). Several classes such as the scheme registry that
> previously had to be synchronized in order to ensure thread safety have
> been replaced with immutable equivalents. There is also now a way to
> create HttpClient in a minimal configuration without authentication,
> state management (cookies), proxy support and other non-essential
> functions.

That sounds interesting - any hints as to how to create this minimal
HttpClient?

> These functions are not merely disabled but physically
> removed from the processing pipeline, which should result in somewhat
> better performance in high threads contention scenarios, as the only
> synchronization point involved in request execution would be the lock of
> the connection pool. Minimal HttpClient may be particularly useful for
> anonymous web crawling when authentication and state management are not
> required.
>
>
>> 3. Global lock on connection pool
>>
>> Oleg had written:
>>
>>> Yes, your observation is correct. The problem is that the connection
>>> pool is guarded by a global lock. Naturally if you have 400 threads
>>> trying to obtain a connection at about the same time all of them end up
>>> contending for one lock. The problem is that I can't think of a
>>> different way to ensure the max limits (per route and total) are
>>> guaranteed not to be exceeded. If anyone can think of a better algorithm
>>> please do let me know. What might be a possibility is creating a more
>>> lenient and less prone to lock contention issues implementation that may
>>> under stress occasionally allocate a few more connections than the max
>>> limits.
>>
>> I don't know if this has been resolved. My work-around from a few years ago
>> was to rely on having multiple Hadoop reducers running on the server (each
>> in their own JVM), where I could then limit each JVM to at most 300
>> connections.
>>
>
> I experimented with the idea of lock-less (unlimited) connection manager
> but in my tests it did not perform any better than the standard
> connection manager.

Previously I'd asked:

> Would it work to go for finer-grained locking, by using atomic counters to
> track & enforce limits on per route/total connections?

Any thoughts on that approach? E.g. have a map from route to atomic
counter, and a single atomic counter for total connections?

Thanks,

-- Ken

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
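
P.S. To make the atomic-counter idea concrete, here is a rough sketch of the
bookkeeping I have in mind. This is not HttpClient code: the class name
ConnectionLimiter, the methods tryAcquire/release and the use of a String as
the route key are all made up for illustration. It only covers enforcing the
two limits without a global lock. A real pool would still need some way to
park threads waiting for a free connection and to track idle connections per
route, which is probably where most of the difficulty lies.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of HttpClient: enforces per-route and total
// connection limits with atomic counters instead of one global lock.
class ConnectionLimiter {
    private final int maxPerRoute;
    private final int maxTotal;
    private final AtomicInteger total = new AtomicInteger();
    private final ConcurrentHashMap<String, AtomicInteger> perRoute =
            new ConcurrentHashMap<>();

    ConnectionLimiter(int maxPerRoute, int maxTotal) {
        this.maxPerRoute = maxPerRoute;
        this.maxTotal = maxTotal;
    }

    // CAS loop: increments only while the counter is below max, so the
    // counter never exceeds the limit, even transiently.
    private static boolean tryIncrement(AtomicInteger counter, int max) {
        while (true) {
            int current = counter.get();
            if (current >= max) {
                return false;
            }
            if (counter.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    // Reserves one slot for the route; returns false if either limit is hit.
    boolean tryAcquire(String route) {
        if (!tryIncrement(total, maxTotal)) {
            return false;
        }
        AtomicInteger routeCount =
                perRoute.computeIfAbsent(route, k -> new AtomicInteger());
        if (!tryIncrement(routeCount, maxPerRoute)) {
            total.decrementAndGet(); // roll back the total reservation
            return false;
        }
        return true;
    }

    // Gives back a slot previously obtained via tryAcquire(route).
    void release(String route) {
        AtomicInteger routeCount = perRoute.get(route);
        if (routeCount != null) {
            routeCount.decrementAndGet();
            total.decrementAndGet();
        }
    }

    int totalInUse() {
        return total.get();
    }
}
```

One side effect of reserving the total slot first: a request that fails on the
per-route limit briefly holds a total slot, so under heavy contention another
thread could be refused even though the pool is not strictly full. The limits
themselves are never exceeded.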
