Hi Oleg,

On Jan 9, 2010, at 1:35pm, Oleg Kalnichevski wrote:

Ken Krugler wrote:
I wanted to verify some behavior I'm seeing with HttpClient 4.0.

I occasionally get a ConnectionPoolTimeoutException, even when I've got spare connections in my ThreadSafeClientConnManager pool. Looking at the ConnPoolByRoute.getEntryBlocking() code, it appears this could happen if I'm exceeding the max connections per host limit.

I'm only making one simultaneous request per unique host, but looking at DefaultRequestDirector.handleResponse() I see the HttpRoute getting recalculated when there's a redirect. So this implies that I really have no way to protect against this situation, as I don't know about the redirects until I'm making the requests.

If this is true, then I'll need to add some higher-level error processing code to avoid treating this as an unexpected (bail out) error. I can of course bump the default max number of connections per host, though that would only mask the problem.
Thanks,
-- Ken

Hi Ken

What exactly is the issue with bumping the per-host max limit to something like 3 connections? I am not sure why this should be a problem.

It's not a problem - I went ahead and did that, though it does increase the odds that a webmaster would view the simultaneous requests as being impolite for a crawler.

But without constraints on the dataset I process, any arbitrary per host max limit can be exceeded.

For example, stumbleupon.com uses <username>.stumbleupon.com for URLs. When you request <username>.stumbleupon.com/robots.txt, it redirects to stumbleupon.com/stumbler/<username>/robots.txt, so all such requests end up directed at the same stumbleupon.com host.

I've got almost 200 unique stumbleupon <username> values in my dataset. Depending on Hadoop cluster size, number of threads, random bad luck, etc., I can wind up with 20+ of these being processed at the same time.
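
To make the effect concrete, here is a minimal sketch in plain Java. It is only a model of the situation, not HttpClient code: the class RedirectRouteModel and its methods are hypothetical names, and the redirect rule is hard-coded from the stumbleupon example above. It shows how requests that each start on a different host can still pile up on one host once redirects are followed.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only; none of this is HttpClient API.
class RedirectRouteModel {

    // Hypothetical stand-in for the redirect described above:
    // <username>.stumbleupon.com is redirected to stumbleupon.com.
    static String finalHost(String requestedHost) {
        String suffix = ".stumbleupon.com";
        return requestedHost.endsWith(suffix) ? "stumbleupon.com" : requestedHost;
    }

    // Counts how many simultaneous requests land on each host
    // once the redirects have been followed.
    static Map<String, Integer> connectionsPerFinalHost(List<String> requestedHosts) {
        Map<String, Integer> counts = new HashMap<>();
        for (String host : requestedHosts) {
            counts.merge(finalHost(host), 1, Integer::sum);
        }
        return counts;
    }
}
```

With one request per unique requested host, the count for stumbleupon.com in this model still equals the number of <username> hosts in flight, which is why no fixed per-host limit is safe for an unconstrained dataset.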

Alternatively you may simply want to set the connection manager timeout to a fairly high value. This will cause the connection manager to block the request for a connection until a connection becomes available.

That's a good suggestion, thanks.
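
For anyone following along, a rough sketch of what a per-host limit with a blocking timeout amounts to, written in plain Java with java.util.concurrent rather than against HttpClient itself. The class PerRouteLimiter and its methods are hypothetical and only model the behavior; as far as I can tell, the actual HttpClient 4.0 setting for the wait time is ConnManagerParams.setTimeout(params, millis), but treat that as something to check against the Javadoc.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Illustrative model of a per-route connection limit; not HttpClient code.
class PerRouteLimiter {
    private final int maxPerRoute;
    private final ConcurrentHashMap<String, Semaphore> permits = new ConcurrentHashMap<>();

    PerRouteLimiter(int maxPerRoute) {
        this.maxPerRoute = maxPerRoute;
    }

    // Waits up to timeoutMillis for a free slot on the given host.
    // Returns false on timeout, which is the point where a real pool
    // would report a connection pool timeout to the caller.
    boolean acquire(String host, long timeoutMillis) {
        Semaphore slots = permits.computeIfAbsent(host, h -> new Semaphore(maxPerRoute, true));
        try {
            return slots.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    // Frees a slot; call only after a successful acquire on the same host.
    void release(String host) {
        Semaphore slots = permits.get(host);
        if (slots != null) {
            slots.release();
        }
    }
}
```

A short timeout makes the second simultaneous request to a host fail quickly; a long one makes it wait until the first request releases its slot, which is the trade-off behind the suggestion above.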

-- Ken

--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g




