Hello Oleg, hello Ken, hello Sam,

thank you very much for your help!

Please allow me to ask one further question. Suppose the DefaultHttpClient
is used on a "per-website basis" (that is, I create a new instance of
DefaultHttpClient for downloading a specific website, www.a.com, and then
create another DefaultHttpClient for a second website, www.b.com), and each
DefaultHttpClient is used with the ThreadSafeClientConnManager. Do I have to
explicitly shut down the DefaultHttpClient somehow? (The JavaDoc states
that when the DefaultHttpClient is used with NO explicitly set connection
manager, getConnectionManager().shutdown() should be called, as it
implicitly creates a SingleClientConnManager.) Is my assumption correct
that when I use the TSCCM with the DefaultHttpClient, I do not have to do
anything at all to avoid leaking resources once I no longer need the
DefaultHttpClient instance? HttpClient seems to be a very heavyweight
object, and maybe there are other resources I have to free or shut down
manually?

(I very much appreciate your help, and I have started to refactor my
application. However, I then realized that I need a dedicated User-Agent
for every website I crawl. Using a shared DefaultHttpClient (one instance
for the whole application) with dedicated HttpContexts per website/thread
doesn't work, as I sadly can't set the User-Agent on the HttpContext level.
The User-Agent only seems to be settable on the HttpClient or request
level. I don't know whether this would be a reasonable feature request:
allowing HttpParams to also be set on the HttpContext level, where they
would take precedence over all other, already specified, parameters?)
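
For illustration, here is a minimal sketch of the request-level approach
mentioned above, assuming HttpClient 4.0. The class name PerSiteRequests, its
two helper methods, and the example URL and User-Agent string are made up for
this sketch. It builds one HttpContext with its own CookieStore per website
and sets the User-Agent on each HttpGet through its HttpParams; nothing is
sent over the network.

```java
import org.apache.http.client.CookieStore;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.protocol.ClientContext;
import org.apache.http.impl.client.BasicCookieStore;
import org.apache.http.params.CoreProtocolPNames;
import org.apache.http.protocol.BasicHttpContext;
import org.apache.http.protocol.HttpContext;

// Hypothetical helper class for this sketch; it is not part of HttpClient.
class PerSiteRequests {

    // One context per website/thread, each carrying its own cookie store.
    static HttpContext newSiteContext() {
        CookieStore cookieStore = new BasicCookieStore();
        HttpContext context = new BasicHttpContext();
        context.setAttribute(ClientContext.COOKIE_STORE, cookieStore);
        return context;
    }

    // Sets the User-Agent on the request itself. As far as I understand,
    // request-level parameters override the ones set on the client.
    static HttpGet newGet(String url, String userAgent) {
        HttpGet get = new HttpGet(url);
        get.getParams().setParameter(CoreProtocolPNames.USER_AGENT, userAgent);
        return get;
    }

    public static void main(String[] args) {
        HttpContext siteA = newSiteContext();
        HttpGet get = newGet("http://www.a.com/", "CrawlerA/1.0");
        // With a shared client the call would be: sharedClient.execute(get, siteA);
        System.out.println(get.getParams().getParameter(CoreProtocolPNames.USER_AGENT));
    }
}
```

With a shared client, each call would then be sharedClient.execute(get,
siteContext), so cookies stay separate per site while the User-Agent travels
with the individual request.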

Thank you very much!
Jens

2010/1/28 Oleg Kalnichevski <[email protected]>

> On Wed, 2010-01-27 at 20:42 +0100, Jens Mueller
> [email protected] wrote:
> > Hello HC Experts,
> >
> > I would be very grateful for advice regarding my question. I have already
> > spent a lot of time searching the internet, but I still have not found an
> > example that answers my questions. There are a lot of examples available
> > (also for the multithreaded use cases), but they only address the use case
> > of making one(!!) request. I am completely uncertain how to "best" make a
> > series of requests (to the same web server).
> >
> > I need to develop a simple crawler that crawls some websites for specific
> > information. The basic idea is to download the individual web pages of a
> > website (for example www.a.com) sequentially, but to run several of these
> > "sequential" downloaders in parallel threads for different websites
> > (www.b.com and www.c.com).
> >
> > My current concept/implementation looks like this:
> >
> > 1.  Instantiate a ThreadSafeClientConnManager (with a lot of default
> > parameters). This connection manager will be used/shared by all
> > DefaultHttpClients.
> > 2.  For every(!!) web page request (of a website with multiple web
> > pages), I instantiate a new DefaultHttpClient and then call the
> > "httpClient.execute(httpGet)" method with the instantiated HttpGet(url).
> >
> > ==> I am wondering more and more whether this is the correct usage of the
> > DefaultHttpClient and the .execute() method. Am I doing something wrong
> > here by instantiating a new DefaultHttpClient for every request of a web
> > page? Or should I rather instantiate only one(!!) DefaultHttpClient and
> > then share it for the sequential .execute() calls?
> >
> > To be honest, what I also have not really understood yet is the cookie
> > management. Do I, as the programmer, have to instantiate the CookieStore
> > manually
> > 1. httpClient.setCookieStore(new BasicCookieStore());
> > and then, after calling the .execute() method, "get" the cookie store
> > 2. savedcookies = httpClient.getCookieStore()
> > and then reinject this cookie store for the next call to the same web page
> > (to maintain state)?
> > 3. httpClient.setCookieStore(savedcookies)
> > Or is there some implicit magic that A) creates the cookie store
> > implicitly and B) somehow shares this CookieStore among the HttpClients
> > and/or HttpGets?
> >
> > Thank you very much!!
> > Jens
>
> Jens,
>
> Re-use HttpClient instance for all execution threads but create a
> separate HttpContext and CookieStore per thread of execution /
> individual user, as described by Ken.
>
> Oleg