You can create a local context and use it for all requests to the same server. That lets you re-use the same HttpClient, which is how you want to handle this (versus creating a new instance for each domain).

For example, in Bixo's SimpleHttpFetcher there's this code:

    getter = new HttpGet(new URI(url));

    // Create a local instance of cookie store, and bind to local context.
    // Without this we get killed w/lots of threads, due to sync() on single cookie store.
    HttpContext localContext = new BasicHttpContext();
    CookieStore cookieStore = new BasicCookieStore();
    localContext.setAttribute(ClientContext.COOKIE_STORE, cookieStore);

    response = _httpClient.execute(getter, localContext);

The call to execute the GET request uses the localContext, which is what I think Jens wants.
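
A fuller sketch of the same approach, assuming HttpClient 4.0 (the crawler class, URLs, and connection-manager setup below are illustrative, not Bixo code): one DefaultHttpClient backed by a ThreadSafeClientConnManager is shared by all threads, while each crawler thread keeps its own context (and thus its own cookie store) for its site.

    import org.apache.http.HttpResponse;
    import org.apache.http.client.CookieStore;
    import org.apache.http.client.HttpClient;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.client.protocol.ClientContext;
    import org.apache.http.conn.scheme.PlainSocketFactory;
    import org.apache.http.conn.scheme.Scheme;
    import org.apache.http.conn.scheme.SchemeRegistry;
    import org.apache.http.impl.client.BasicCookieStore;
    import org.apache.http.impl.client.DefaultHttpClient;
    import org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager;
    import org.apache.http.params.BasicHttpParams;
    import org.apache.http.params.HttpParams;
    import org.apache.http.protocol.BasicHttpContext;
    import org.apache.http.protocol.HttpContext;
    import org.apache.http.util.EntityUtils;

    public class PerSiteCrawler implements Runnable {
        private final HttpClient sharedClient; // one client for the whole crawler
        private final String[] urls;           // pages of a single site, fetched sequentially

        public PerSiteCrawler(HttpClient sharedClient, String[] urls) {
            this.sharedClient = sharedClient;
            this.urls = urls;
        }

        public void run() {
            // One context (and one cookie store) per site/thread, so cookies from
            // www.a.com never leak into www.b.com and threads don't contend on a
            // single shared cookie store.
            HttpContext localContext = new BasicHttpContext();
            CookieStore cookieStore = new BasicCookieStore();
            localContext.setAttribute(ClientContext.COOKIE_STORE, cookieStore);

            for (String url : urls) {
                try {
                    HttpResponse response = sharedClient.execute(new HttpGet(url), localContext);
                    // Consuming the entity also releases the connection back to the pool.
                    String page = EntityUtils.toString(response.getEntity());
                    // ... parse page ...
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }

        public static void main(String[] args) {
            // 4.0-style setup of the shared, thread-safe connection manager.
            SchemeRegistry schemes = new SchemeRegistry();
            schemes.register(new Scheme("http", PlainSocketFactory.getSocketFactory(), 80));
            HttpParams params = new BasicHttpParams();
            HttpClient client = new DefaultHttpClient(
                    new ThreadSafeClientConnManager(params, schemes), params);

            new Thread(new PerSiteCrawler(client, new String[] { "http://www.a.com/" })).start();
            new Thread(new PerSiteCrawler(client, new String[] { "http://www.b.com/" })).start();
        }
    }

The important part is that the client (and its connection manager) is shared, while cookie state lives in the per-thread context passed to execute().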

-- Ken


On Jan 27, 2010, at 3:22pm, Sam Crawford wrote:

I could well be mistaken, but my experience suggests that with version
4.0 you need a new HttpClient each time you deal with a different set
of cookies. Creating multiple HttpContexts used across a single
DefaultHttpClient instance did not seem to be sufficient.

That said, I only tried this briefly and didn't spend a huge amount of
time investigating it. I keep meaning to do so and to submit a bug if
I find a genuinely reproducible issue.
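
A minimal repro for that might look like the following (http://www.example.com/ is just a placeholder for any server that returns a Set-Cookie header). If storeB ends up with cookies from the request made with ctxA, per-context isolation is indeed broken:

    import org.apache.http.HttpResponse;
    import org.apache.http.client.CookieStore;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.client.protocol.ClientContext;
    import org.apache.http.impl.client.BasicCookieStore;
    import org.apache.http.impl.client.DefaultHttpClient;
    import org.apache.http.protocol.BasicHttpContext;
    import org.apache.http.protocol.HttpContext;
    import org.apache.http.util.EntityUtils;

    public class ContextIsolationTest {
        public static void main(String[] args) throws Exception {
            DefaultHttpClient client = new DefaultHttpClient(); // one shared client

            // Two contexts, each with its own cookie store.
            HttpContext ctxA = new BasicHttpContext();
            CookieStore storeA = new BasicCookieStore();
            ctxA.setAttribute(ClientContext.COOKIE_STORE, storeA);

            HttpContext ctxB = new BasicHttpContext();
            CookieStore storeB = new BasicCookieStore();
            ctxB.setAttribute(ClientContext.COOKIE_STORE, storeB);

            // Placeholder URL: any server that sends Set-Cookie will do.
            HttpResponse response = client.execute(new HttpGet("http://www.example.com/"), ctxA);
            EntityUtils.toString(response.getEntity()); // consume to release the connection

            // Expected with proper isolation: storeA has the cookie, storeB is empty.
            System.out.println("storeA: " + storeA.getCookies());
            System.out.println("storeB: " + storeB.getCookies());
        }
    }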

Thanks,

Sam


2010/1/27 Jens Mueller <[email protected]>:
Hello HC Experts,

I would be very grateful for advice regarding my question. I have already spent a lot of time searching the internet, but I still have not found an example that answers my questions. There are lots of examples available (also for the multithreaded use-cases), but they only address the use-case of making one(!!) request. I am completely uncertain how to "best" make a series of requests (to the same webserver).

I need to develop a simple crawler that crawls some websites for specific information. The basic idea is to download the single webpages of a website (for example www.a.com) sequentially, but to run several of these "sequential" downloaders in threads for different websites (www.b.com and www.c.com) in parallel.

My current concept/implementation looks like this:

1. Instantiate a ThreadSafeClientConnManager (with a lot of default parameters). This connection manager will be used/shared by all DefaultHttpClient instances.
2. For every webpage (of a website with multiple webpages), I instantiate a new DefaultHttpClient for every(!!) webpage request and then call the httpClient.execute(httpGet) method with the instantiated HttpGet(url) (a sketch follows below).
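
For concreteness, a sketch of that pattern (the URLs and parameter setup are made up; class names are from HttpClient 4.0):

    import org.apache.http.HttpResponse;
    import org.apache.http.client.HttpClient;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.conn.ClientConnectionManager;
    import org.apache.http.conn.scheme.PlainSocketFactory;
    import org.apache.http.conn.scheme.Scheme;
    import org.apache.http.conn.scheme.SchemeRegistry;
    import org.apache.http.impl.client.DefaultHttpClient;
    import org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager;
    import org.apache.http.params.BasicHttpParams;
    import org.apache.http.params.HttpParams;
    import org.apache.http.util.EntityUtils;

    public class PerRequestClient {
        public static void main(String[] args) throws Exception {
            // 1. One ThreadSafeClientConnManager, shared by everything.
            SchemeRegistry schemes = new SchemeRegistry();
            schemes.register(new Scheme("http", PlainSocketFactory.getSocketFactory(), 80));
            HttpParams params = new BasicHttpParams();
            ClientConnectionManager sharedManager = new ThreadSafeClientConnManager(params, schemes);

            // 2. A brand-new DefaultHttpClient for every single page request.
            for (String url : new String[] { "http://www.a.com/page1", "http://www.a.com/page2" }) {
                HttpClient httpClient = new DefaultHttpClient(sharedManager, params);
                HttpResponse response = httpClient.execute(new HttpGet(url));
                EntityUtils.toString(response.getEntity()); // consume to release the connection
            }
        }
    }

Note that each new DefaultHttpClient starts with its own empty cookie store, which is exactly what leads to the cookie question below.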

==> I am more and more wondering if this is the correct usage of DefaultHttpClient and the .execute() method. Am I doing something wrong here by instantiating a new DefaultHttpClient for every request of a webpage? Or should I rather instantiate only one(!!) DefaultHttpClient and share it across the sequential .execute() calls?

To be honest, what I also have not really understood yet is the cookie management. Do I, as the programmer, have to instantiate the CookieStore manually:
1. httpClient.setCookieStore(new BasicCookieStore());
and then, after calling the .execute() method, "get" the cookie store:
2. savedCookies = httpClient.getCookieStore();
and then reinject this cookie store for the next call to the same webpage (to maintain state)?
3. httpClient.setCookieStore(savedCookies);
Or is there some implicit magic that A) creates the cookie store implicitly and B) somehow shares this CookieStore among the HttpClients and/or HttpGets?
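
Spelled out, the manual handoff in steps 1-3 would look something like this (the URLs are placeholders):

    import org.apache.http.HttpResponse;
    import org.apache.http.client.CookieStore;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.DefaultHttpClient;
    import org.apache.http.util.EntityUtils;

    public class CookieHandoff {
        public static void main(String[] args) throws Exception {
            // Step 1 is optional: DefaultHttpClient creates a BasicCookieStore
            // implicitly if none has been set.
            DefaultHttpClient first = new DefaultHttpClient();
            HttpResponse r1 = first.execute(new HttpGet("http://www.a.com/page1"));
            EntityUtils.toString(r1.getEntity()); // consume to release the connection

            // Step 2: save the cookies the server handed back.
            CookieStore savedCookies = first.getCookieStore();

            // Step 3: reinject them into the next client for the same site.
            DefaultHttpClient second = new DefaultHttpClient();
            second.setCookieStore(savedCookies);
            HttpResponse r2 = second.execute(new HttpGet("http://www.a.com/page2"));
            EntityUtils.toString(r2.getEntity());
        }
    }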

Thank you very much!!
Jens




--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g



