> On 4 Mar 2015, at 10:54, Paul Bear <[email protected]> wrote:
> 
> I have a large list of URLs (about 1 million) and I want to run
> 
> - one thread that runs through the list and asynchronously sends GET
> requests
> - several worker threads that process the responses
> 
> Is it possible to separate sending GETs and processing responses into
> different threads using Apache HttpClient?
> 
> Any ideas are welcome!

Well, just use BUbiNG, our crawler. :) It takes care of politeness (spacing
out requests to the same host) and downloads in parallel from any number of
hosts. You just have to implement the HTMLParser interface to do the
processing you need.
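
For anyone who would rather stay with a plain HTTP client, the split asked
about (one dispatching thread, several processing threads) can be sketched
roughly as below. This is only an illustration, not BUbiNG code and not an
Apache HttpClient API: the names FetchPipeline, Result, run and maxInFlight
are made up here, and the actual HTTP call is passed in as a function
returning a CompletableFuture, so any async client could be plugged in.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical helper: one dispatcher thread, N worker threads.
final class FetchPipeline {

    // A fetched body, or the error that prevented fetching it.
    static final class Result {
        final String url;
        final String body;
        final Throwable error;

        Result(String url, String body, Throwable error) {
            this.url = url;
            this.body = body;
            this.error = error;
        }
    }

    // Sentinel telling a worker that no more results will arrive.
    private static final Result POISON = new Result(null, null, null);

    static void run(List<String> urls,
                    Function<String, CompletableFuture<String>> fetch,
                    Consumer<Result> process,
                    int workers,
                    int maxInFlight) throws InterruptedException {
        BlockingQueue<Result> queue = new LinkedBlockingQueue<>();
        Semaphore inFlight = new Semaphore(maxInFlight);   // caps open requests
        CountDownLatch pending = new CountDownLatch(urls.size());
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        // Worker threads: take completed responses off the queue and process them.
        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                try {
                    for (Result r = queue.take(); r != POISON; r = queue.take()) {
                        try {
                            process.accept(r);
                        } catch (RuntimeException e) {
                            // One bad response should not kill the worker.
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // Dispatcher (the calling thread): walk the list and fire async GETs.
        for (String url : urls) {
            inFlight.acquire();
            CompletableFuture<String> f;
            try {
                f = fetch.apply(url);
            } catch (RuntimeException e) {
                f = CompletableFuture.failedFuture(e);
            }
            // The callback only enqueues; the real work happens in the workers.
            f.whenComplete((body, err) -> {
                queue.add(new Result(url, body, err));
                inFlight.release();
                pending.countDown();
            });
        }

        pending.await();                       // every request has completed
        for (int i = 0; i < workers; i++) {
            queue.add(POISON);                 // one stop signal per worker
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

The semaphore is there because with a million URLs the dispatcher would
otherwise open requests far faster than they complete. To plug in a real
client, the fetch function could wrap the JDK's
HttpClient.sendAsync(request, BodyHandlers.ofString()) followed by
thenApply(HttpResponse::body); with Apache HttpAsyncClient one would
presumably complete the future from a FutureCallback instead. Note that
nothing here handles politeness, which is the part a crawler does for you.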

Ciao,

                                        seba

