Hi all,
I have been using HttpComponents Client to crawl web pages. I need to
improve the crawler by switching to the async client. What I want is something like:
BlockingQueue<URL> needCrawlQueue;
BlockingQueue<String[]> htmlQueue;
HttpAsyncClient client;
int maxConcurrent = 500;
// when a url has finished, get notified and call back this code
// (currentCrawlingCount is a made-up counter of in-flight requests)
if (client.currentCrawlingCount < maxConcurrent) {
    URL url = needCrawlQueue.take();
    // request this url
}
// when a url has finished, get notified and call back this code
// (String url and String html are the callback arguments)
htmlQueue.put(new String[]{url, html});
I mean an async client class that takes two queues.
If the number of unfinished URLs is less than maxConcurrent, it takes a
URL from one queue and requests it. When a URL succeeds (or fails),
it adds the result to the other queue.
------------------------------------
I currently use 500 threads on a 4-CPU virtual machine. The load average is
about 7 and the context-switch count (from vmstat) is above 4,000,
so I want to give the async client a try. Can anyone help me? I don't know
how to use the async client. It only returns a Future, and I am not familiar
with that.
What I want is a class that takes URLs from a queue, fetches their content,
and then sends the result to another queue.
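
To make the idea concrete, here is a rough sketch of the kind of class I mean. It is only an illustration under assumptions, not working HttpAsyncClient code: the class name BoundedAsyncCrawler and the "fetcher" function are made up, and the fetcher stands in for whatever adapter would wrap the real async client and report completion or failure through a CompletableFuture. It needs Java 9 or later (for CompletableFuture.failedFuture) and assumes the result queue is unbounded. A Semaphore keeps the number of in-flight requests at or below maxConcurrent.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

/** Hypothetical helper: moves URLs from one queue to a result queue with bounded concurrency. */
class BoundedAsyncCrawler {
    private final BlockingQueue<String> needCrawlQueue;
    private final BlockingQueue<String[]> htmlQueue;
    // Assumed adapter around the real async HTTP client (not shown here).
    private final Function<String, CompletableFuture<String>> fetcher;
    private final Semaphore permits;

    BoundedAsyncCrawler(BlockingQueue<String> needCrawlQueue,
                        BlockingQueue<String[]> htmlQueue,
                        Function<String, CompletableFuture<String>> fetcher,
                        int maxConcurrent) {
        this.needCrawlQueue = needCrawlQueue;
        this.htmlQueue = htmlQueue;
        this.fetcher = fetcher;
        this.permits = new Semaphore(maxConcurrent);
    }

    /** Dispatch loop; runs on one thread until that thread is interrupted. */
    void run() throws InterruptedException {
        while (true) {
            permits.acquire();                        // block while maxConcurrent requests are in flight
            final String url = needCrawlQueue.take(); // block until there is a URL to crawl
            CompletableFuture<String> future;
            try {
                future = fetcher.apply(url);
            } catch (RuntimeException e) {
                // Treat a synchronous failure like an asynchronous one.
                future = CompletableFuture.failedFuture(e);
            }
            future.whenComplete((html, error) -> {
                try {
                    // html is recorded as null when the request failed.
                    htmlQueue.add(new String[] {url, error == null ? html : null});
                } finally {
                    permits.release();                // free a slot for the next URL
                }
            });
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> urls = new LinkedBlockingQueue<>();
        BlockingQueue<String[]> results = new LinkedBlockingQueue<>();
        urls.add("http://example.com/a");
        urls.add("http://example.com/b");
        // Stub fetcher standing in for a real async HTTP call.
        BoundedAsyncCrawler crawler = new BoundedAsyncCrawler(urls, results,
                url -> CompletableFuture.supplyAsync(() -> "<html>" + url + "</html>"), 500);
        Thread dispatcher = new Thread(() -> {
            try {
                crawler.run();
            } catch (InterruptedException e) {
                // stop requested
            }
        });
        dispatcher.setDaemon(true);
        dispatcher.start();
        for (int i = 0; i < 2; i++) {
            String[] r = results.take();
            System.out.println(r[0] + " -> " + r[1]);
        }
        dispatcher.interrupt();
    }
}
```

With this shape only one dispatcher thread blocks on the queues. The fetches themselves complete on whatever threads the async client uses, so 500 concurrent requests would not need 500 threads.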