You can create a local context and use that for all requests to the
same server. This then lets you re-use the same HttpClient, which is
how you want to handle this (versus creating new instances for each
domain).
For example, in Bixo's SimpleHttpFetcher there's this code:
getter = new HttpGet(new URI(url));

// Create a local instance of cookie store, and bind to local context.
// Without this we get killed w/lots of threads, due to sync() on single cookie store.
HttpContext localContext = new BasicHttpContext();
CookieStore cookieStore = new BasicCookieStore();
localContext.setAttribute(ClientContext.COOKIE_STORE, cookieStore);

response = _httpClient.execute(getter, localContext);
The call that executes the GET request uses the localContext, which is
what I think Jens wants.
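For reference, here is a self-contained sketch of how that snippet can fit together with one shared client (HttpClient 4.0 API; the SimpleFetcher class and fetchPage method are illustrative names, not the actual Bixo code):

import java.net.URI;

import org.apache.http.HttpResponse;
import org.apache.http.client.CookieStore;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.protocol.ClientContext;
import org.apache.http.conn.scheme.PlainSocketFactory;
import org.apache.http.conn.scheme.Scheme;
import org.apache.http.conn.scheme.SchemeRegistry;
import org.apache.http.impl.client.BasicCookieStore;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager;
import org.apache.http.params.BasicHttpParams;
import org.apache.http.params.HttpParams;
import org.apache.http.protocol.BasicHttpContext;
import org.apache.http.protocol.HttpContext;
import org.apache.http.util.EntityUtils;

public class SimpleFetcher {

    // One client (backed by one ThreadSafeClientConnManager) shared by all threads.
    private final DefaultHttpClient _httpClient;

    public SimpleFetcher() {
        HttpParams params = new BasicHttpParams();
        SchemeRegistry registry = new SchemeRegistry();
        registry.register(new Scheme("http", PlainSocketFactory.getSocketFactory(), 80));
        _httpClient = new DefaultHttpClient(new ThreadSafeClientConnManager(params, registry), params);
    }

    // Safe to call from many threads at once: each request gets its own
    // context and cookie store, so there is no contention on (and no mixing
    // of cookies in) a single shared store.
    public String fetchPage(String url) throws Exception {
        HttpGet getter = new HttpGet(new URI(url));

        HttpContext localContext = new BasicHttpContext();
        CookieStore cookieStore = new BasicCookieStore();
        localContext.setAttribute(ClientContext.COOKIE_STORE, cookieStore);

        // Reading the entity to completion releases the connection back to the pool.
        HttpResponse response = _httpClient.execute(getter, localContext);
        return EntityUtils.toString(response.getEntity());
    }
}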
-- Ken
On Jan 27, 2010, at 3:22pm, Sam Crawford wrote:
I could well be mistaken, but my experience suggests that with version
4.0 you need a new HttpClient each time you deal with a different set
of cookies. Creating multiple HttpContexts and using them across a single
DefaultHttpClient instance did not seem to be sufficient.
That said, I only tried this briefly and didn't spend a huge amount of
time investigating it. I keep meaning to do so and to submit a bug if
I find a genuinely reproducible issue.
Thanks,
Sam
2010/1/27 Jens Mueller <[email protected]>:
Hello HC Experts,
I would be very grateful for advice regarding my question. I have already spent a lot of time searching the internet, but I still have not found an example that answers my questions. There are lots of examples available (also for the multithreaded use cases), but they only address the use case of making one(!!) request. I am completely uncertain how to "best" make a series of requests (to the same webserver).
I need to develop a simple crawler that crawls some websites for specific information. The basic idea is to download the individual webpages of a website (for example www.a.com) sequentially, but to run several of these "sequential" downloaders in threads for different websites (www.b.com and www.c.com) in parallel.
My current concept/implementation looks like this:
1. Instantiate a ThreadSafeClientConnManager (with a lot of default parameters). This connection manager will be used/shared by all DefaultHttpClients.
2. For every webpage (of a website with multiple webpages), I instantiate a new DefaultHttpClient for every(!!) webpage request and then call the httpClient.execute(httpGet) method with the instantiated HttpGet(url) (see the sketch below).
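In code, the approach from steps 1 and 2 looks roughly like this (heavily simplified, just to illustrate what I mean):

import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.conn.ClientConnectionManager;
import org.apache.http.conn.scheme.PlainSocketFactory;
import org.apache.http.conn.scheme.Scheme;
import org.apache.http.conn.scheme.SchemeRegistry;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager;
import org.apache.http.params.BasicHttpParams;
import org.apache.http.params.HttpParams;
import org.apache.http.util.EntityUtils;

public class SequentialCrawler {
    public static void main(String[] args) throws Exception {
        // 1. One ThreadSafeClientConnManager, shared by all clients (and threads).
        HttpParams params = new BasicHttpParams();
        SchemeRegistry registry = new SchemeRegistry();
        registry.register(new Scheme("http", PlainSocketFactory.getSocketFactory(), 80));
        ClientConnectionManager connMgr = new ThreadSafeClientConnManager(params, registry);

        // 2. A brand-new DefaultHttpClient for every(!!) single page request.
        for (String url : new String[] { "http://www.a.com/page1", "http://www.a.com/page2" }) {
            DefaultHttpClient httpClient = new DefaultHttpClient(connMgr, params);
            HttpResponse response = httpClient.execute(new HttpGet(url));
            // Consuming the entity releases the connection back to the manager.
            System.out.println(EntityUtils.toString(response.getEntity()).length());
        }
    }
}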
==> I am more and more wondering whether this is the correct usage of DefaultHttpClient and the .execute() method. Am I doing something wrong here by instantiating a new DefaultHttpClient for every request of a webpage? Or should I rather instantiate only one(!!) DefaultHttpClient and then share it for the sequential .execute() calls?
To be honest, what I also have not really understood yet is the cookie management. Do I, as the programmer, have to instantiate the CookieStore manually

1. httpClient.setCookieStore(new BasicCookieStore());

and then, after calling the .execute() method, "get" the cookie store

2. savedCookies = httpClient.getCookieStore();

and then re-inject this cookie store for the next call to the same website (to maintain state)?

3. httpClient.setCookieStore(savedCookies);

Or is there some implicit magic that A) creates the cookie store implicitly and B) somehow shares this CookieStore among the HttpClients and/or HttpGets?
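To make steps 1 to 3 concrete, this is roughly what I have in mind (only a sketch; whether this is the intended usage is exactly my question):

import org.apache.http.client.CookieStore;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.BasicCookieStore;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.util.EntityUtils;

public class CookieHandoff {
    public static void main(String[] args) throws Exception {
        DefaultHttpClient firstClient = new DefaultHttpClient();

        // 1. Explicitly give the client a fresh cookie store.
        firstClient.setCookieStore(new BasicCookieStore());
        EntityUtils.toString(firstClient.execute(new HttpGet("http://www.a.com/login")).getEntity());

        // 2. After the request, "get" the now-populated cookie store...
        CookieStore savedCookies = firstClient.getCookieStore();

        // 3. ...and re-inject it before the next request (here into a second
        // client), so the session cookies are sent again.
        DefaultHttpClient secondClient = new DefaultHttpClient();
        secondClient.setCookieStore(savedCookies);
        EntityUtils.toString(secondClient.execute(new HttpGet("http://www.a.com/data")).getEntity());
    }
}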
Thank you very much!!
Jens
--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c w e b m i n i n g