> But why would you want a web crawler to have 10-20K simultaneously
> opened connections in the first place?
(I thought I had answered this, but it's not in the archive. Oh well.) Having a few thousand connections open is the only way to retrieve data while respecting politeness (i.e., not hitting the same site too often).

I have another question: are there any suggested parameter settings for the asynchronous client (e.g., for the IOReactor) when issuing several thousand parallel requests? We are experimenting with both DefaultHttpClient and DefaultHttpAsyncClient, and with equivalent configurations (e.g., 4000 threads using DefaultHttpClient, versus 64 threads pushing 4000 async requests into a default DefaultHttpAsyncClient) we see completely different behaviours: the sync client fetches more than 10000 pages/s, while the async client fetches only about 50 pages/s. Should we increase the number of I/O threads, or the I/O interval of the IOReactor?
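To be concrete about which knobs I mean, here is a rough sketch of the kind of configuration we are asking about. For illustration it uses the newer HttpAsyncClients/IOReactorConfig builder API rather than DefaultHttpAsyncClient itself, and the numbers are guesses rather than measured settings:

    import java.util.concurrent.Future;

    import org.apache.http.HttpResponse;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.nio.client.CloseableHttpAsyncClient;
    import org.apache.http.impl.nio.client.HttpAsyncClients;
    import org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager;
    import org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor;
    import org.apache.http.impl.nio.reactor.IOReactorConfig;

    public class AsyncCrawlerSketch {
        public static void main(String[] args) throws Exception {
            // IOReactor parameters: a small number of I/O dispatch threads
            // (roughly one per core) is the usual starting point, even with
            // thousands of concurrent connections.
            IOReactorConfig reactorConfig = IOReactorConfig.custom()
                    .setIoThreadCount(Runtime.getRuntime().availableProcessors())
                    .setSoTimeout(30000)
                    .setConnectTimeout(30000)
                    .build();
            DefaultConnectingIOReactor ioReactor =
                    new DefaultConnectingIOReactor(reactorConfig);

            // Connection pool limits: if these stay at their small defaults
            // (around 2 per route / 20 total), most of the 4000 submitted
            // requests just queue up waiting for a connection lease, which
            // could by itself explain very low throughput.
            PoolingNHttpClientConnectionManager connMgr =
                    new PoolingNHttpClientConnectionManager(ioReactor);
            connMgr.setMaxTotal(4000);
            connMgr.setDefaultMaxPerRoute(2); // politeness: few connections per host

            CloseableHttpAsyncClient client = HttpAsyncClients.custom()
                    .setConnectionManager(connMgr)
                    .build();
            client.start();
            try {
                Future<HttpResponse> future =
                        client.execute(new HttpGet("http://example.com/"), null);
                System.out.println(future.get().getStatusLine());
            } finally {
                client.close();
            }
        }
    }

Is this the right set of parameters to be tuning, or is the I/O interval of the reactor the more important factor here?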
