Hi Oleg,

On Jan 9, 2010, at 1:35pm, Oleg Kalnichevski wrote:

> Ken Krugler wrote:
>> I wanted to verify some behavior I'm seeing with HttpClient 4.0
>> I occasionally get a ConnectionPoolTimeoutException, even when I've
>> got spare connections in my ThreadSafeClientConnManager pool.
>> Looking at the ConnPoolByRoute.getEntryBlocking() code, it appears
>> this could happen if I'm exceeding the max connections per host
>> limit.
>> I'm only making one simultaneous request per unique host, but
>> looking at DefaultRequestDirector.handleResponse() I see the
>> HttpRoute getting recalculated when there's a redirect.
>> So this implies that I really have no way to protect against this
>> situation, as I don't know about the redirects until I'm making the
>> requests.
>> If this is true, then I'll need to add some higher level error
>> processing code to avoid treating this as an unexpected (bail out)
>> error.
>> I can of course bump the default max number of connections per
>> host, though that would only mask the problem.
>> Thanks,
>> -- Ken
>
> Hi Ken
> What is exactly the issue with bumping the per host max limit to
> something like 3 connections? I am not sure why this should be a
> problem.

It's not a problem - I went ahead and did that, though it does
increase the odds that a webmaster would view the simultaneous
requests as being impolite for a crawler.
But without constraints on the dataset I process, any arbitrary per
host max limit can be exceeded.
For example, stumbleupon.com uses <username>.stumbleupon.com for URLs.
When you request <username>.stumbleupon.com/robots.txt, it redirects
to stumbleupon.com/stumbler/<username>/robots.txt, thus all such
requests will be directed at the same stumbleupon.com host.
I've got almost 200 unique stumbleupon <username> values in my
dataset. Depending on Hadoop cluster size, number of threads, random
bad luck, etc., I can wind up with 20+ of these being processed at the
same time, all competing for connections to the one stumbleupon.com
route.
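
To make the stumbleupon case concrete, here is a minimal sketch in plain Java (no HttpClient involved). It hard-codes the redirect rule described above instead of fetching anything, and the class name, method names, and the usernames alice, bob and carol are all made up for illustration. It only shows that requests which are unique per original host can still pile up on one host after the redirect.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of HttpClient.
class RedirectCollapseDemo {

    // Hard-coded stand-in for the redirect described above:
    // <username>.stumbleupon.com/<path> -> stumbleupon.com/stumbler/<username>/<path>
    static URI followRedirect(URI original) {
        String host = original.getHost();
        String suffix = ".stumbleupon.com";
        if (host != null && host.endsWith(suffix)) {
            String username = host.substring(0, host.length() - suffix.length());
            return URI.create("http://stumbleupon.com/stumbler/" + username + original.getPath());
        }
        return original; // assume no redirect for any other host
    }

    // Counts in-flight requests per host as seen *after* the redirect,
    // which is the host the per-route connection limit ends up applying to.
    static Map<String, Integer> countByFinalHost(List<URI> inFlight) {
        Map<String, Integer> counts = new TreeMap<>();
        for (URI uri : inFlight) {
            counts.merge(followRedirect(uri).getHost(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // One request per unique original host, as in my crawler.
        List<URI> inFlight = Arrays.asList(
            URI.create("http://alice.stumbleupon.com/robots.txt"),
            URI.create("http://bob.stumbleupon.com/robots.txt"),
            URI.create("http://carol.stumbleupon.com/robots.txt"));
        System.out.println(countByFinalHost(inFlight));
    }
}
```

With three distinct original hosts, all three requests land on stumbleupon.com, so a per-host limit of 2 (or 3, with one more username) is exceeded even though the crawler never issued two requests to the same original host.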

> Alternatively you may simply want to set the connection manager
> timeout to a fairly high value. This will cause the connection
> manager to block the request for a connection until a connection
> becomes available.

That's a good suggestion, thanks.
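
For anyone finding this thread later, the difference between a short and a long connection manager timeout can be sketched with a per-route semaphore. This is plain Java and only mimics the blocking behaviour; it is not how ConnPoolByRoute is implemented, and RouteGate and its methods are hypothetical names. In HttpClient 4.0 I believe the corresponding setting is ConnManagerParams.setTimeout(params, millis), but the sketch below does not call it.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not part of HttpClient.
class RouteGate {
    private final int maxPerRoute;
    private final ConcurrentHashMap<String, Semaphore> permits = new ConcurrentHashMap<>();

    RouteGate(int maxPerRoute) {
        this.maxPerRoute = maxPerRoute;
    }

    // Waits up to timeoutMillis for a free slot on the route.
    // Returns false on timeout (the analogue of a pool timeout) or on interrupt.
    boolean acquire(String route, long timeoutMillis) {
        Semaphore slots = permits.computeIfAbsent(route, r -> new Semaphore(maxPerRoute, true));
        try {
            return slots.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Frees a slot on the route; must only be called after a successful acquire.
    void release(String route) {
        Semaphore slots = permits.get(route);
        if (slots != null) {
            slots.release();
        }
    }

    public static void main(String[] args) throws Exception {
        RouteGate gate = new RouteGate(1);
        String route = "stumbleupon.com";

        gate.acquire(route, 100); // first request takes the only slot

        // Short timeout: the second request gives up while the slot is held.
        System.out.println("short timeout acquired: " + gate.acquire(route, 50));

        // Another thread frees the slot after 200 ms.
        Thread releaser = new Thread(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            gate.release(route);
        });
        releaser.start();

        // Long timeout: the request blocks until the slot is released.
        System.out.println("long timeout acquired: " + gate.acquire(route, 5000));
        releaser.join();
    }
}
```

The trade-off is that a high timeout turns the exception into latency: a thread stuck waiting on a crowded route is not fetching from any other host in the meantime.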
-- Ken
--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c w e b m i n i n g