Hi Roland

Thanks a lot for your quick response. You've clarified things.

Our main problem was that the CentralServer was getting BindExceptions and 
clients were disconnecting. This was happening because the server was running 
out of sockets. I realized that we were doing it all wrong by creating an 
individual HttpClient for each request instead of using a single MTHCM. Below 
are the settings I'm going to use; please tell me if I'm wrong about any. It's 
worth mentioning that all servers run in dedicated Tomcats, and the HTTP 
connector is configured with maxThreads="1000", so supposedly 1000 concurrent 
requests can be handled. Server hosts are beefy multi-processor systems with at 
least 2 GB of memory, and Tomcat is given 512 MB. Network connections are 
usually 100 Mbps.

MaxHostConnections = 1000
MaxTotalConnections = 1000
CloseIdleConnectionsPeriod = 1 minute
IdleConnectionTimeout = 3 minutes
DeleteClosedConnectionsPeriod = 10 minutes
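To make sure I've mapped these settings onto the right 3.x calls, here's roughly what I plan to set up (a sketch assuming the HttpConnectionManagerParams and IdleConnectionTimeoutThread APIs; please correct me if I've misread them):

```java
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.commons.httpclient.params.HttpConnectionManagerParams;
import org.apache.commons.httpclient.util.IdleConnectionTimeoutThread;

public class CentralServerHttp {
    private static final MultiThreadedHttpConnectionManager CONN_MGR =
            new MultiThreadedHttpConnectionManager();
    public static final HttpClient CLIENT;

    static {
        HttpConnectionManagerParams params = CONN_MGR.getParams();
        params.setDefaultMaxConnectionsPerHost(1000); // MaxHostConnections
        params.setMaxTotalConnections(1000);          // MaxTotalConnections
        CLIENT = new HttpClient(CONN_MGR);

        // Check every minute; close connections idle for 3 minutes.
        IdleConnectionTimeoutThread idleThread = new IdleConnectionTimeoutThread();
        idleThread.setTimeoutInterval(60 * 1000L);       // CloseIdleConnectionsPeriod
        idleThread.setConnectionTimeout(3 * 60 * 1000L); // IdleConnectionTimeout
        idleThread.addConnectionManager(CONN_MGR);
        idleThread.setDaemon(true);
        idleThread.start();

        // Plus a separate task every 10 minutes that calls
        // CONN_MGR.deleteClosedConnections() (DeleteClosedConnectionsPeriod).
    }
}
```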

I decided to occasionally delete closed connections, just to be on the safe 
side. I ran the system overnight without any incoming connections, and the pool 
stayed at the maximum size it had reached. netstat showed there were no open 
sockets, but it looks like the HttpConnection objects never got deleted. I'll 
test it a bit more; maybe I'm missing something.



In case you're still wondering why we have this silly architecture and have 
absolutely nothing else to do on your weekend :)  I wanted to give a better 
description of what the system does. Feel free to ignore this!

All clients and servers are part of the same enterprise on the same intranet, 
and there are no limits imposed on the number of connections. The remote 
servers are not there to share load - they are actually at different 
geographical locations and allow users to track what's going on at that site. 
If a site in Sydney, Australia is in busy production time, most clients want to 
monitor that site and that's why most of the connections will go out to that 
site, while the other servers may be pretty idle. Clients can track only one 
site at a time by design. These are rich, thick clients, and have to display a 
great deal of information. It's simpler for the user to concentrate on one 
server, but they can look at any of the servers by selecting a different one 
from the list. Depending on the user's job function, they may want to track 
multiple servers, and in that case they can open more than one client on the 
same host and select a different server in each one.


Thank You

Alex

Roland Weber <[EMAIL PROTECTED]> wrote:

Hello Alex,

Please forgive me for not going into all the details.

> Here's how my application uses the library. I may have up to 100 Applet
> clients running at the same time on various hosts. These clients need to
> display information from different servers, but because applets can only
> connect back to the server from which they were loaded, they ask one
> central server to give them data from the remote server they're interested
> in.

Unsigned applets can only connect to the server they come from.
IIRC, signed applets can do more. Applet signing certificates are
not cheap to come by, but if it saves you from implementing a complex
proxy on the server and buying bigger server hardware for the additional
load, it might still be worthwhile.

> Applets use HttpClient to connect to CentralServer, and include the URI
> of the remote server from which they need to get data as part of the
> request. CentralServer uses a static HttpClient instance to pass the
> Applet's request on to the other servers. There may be up to 20 such remote
> servers. Each client can manage only one remote server at a time, so it'll
> connect only to that one server.

Why that? HttpClient has no such restriction. Is this a problem of
your environment?

> Each client can, however, make multiple
> concurrent requests. So if 20 clients all decide to look at the same server
> and make 5 concurrent requests each, the CentralServer will get hit with
> 100 requests, all for the same remote server. Of course, the other clients
> will still keep asking for data from other servers.

So instead of passing requests to as many servers as possible,
you push one server into overload and let the other 19 run idle?
Maybe I'm missing some piece of the puzzle, but this sounds like
a very inefficient way of managing the workload.

> I'm trying to figure out the best way of using HttpClient,
> MultiThreadedHttpConnectionManager, pool sizes, and HostConfigurations to
> make my CentralServer (the component that sits in the middle and
> distributes requests from clients to remote servers) as efficient as
> possible. I've been through tutorials and mailing list archives, but I
> still can't quite figure out all the relationships between these concepts.

I'll try to summarize the ideas. Forget about HostConfiguration; you
typically use it only to configure a proxy. The objects are very
lightweight, so you cannot save significant time there.

Take 1 HttpClient with 1 MultiThreadedHttpConnectionManager (MTHCM).
If there are cookies coming from the servers, you will have to create
and keep a separate HttpState for every client your CentralServer serves.
Or you use an empty HttpState for every request and throw it away
afterwards. Don't share an HttpState between different client sessions.
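A throwaway-HttpState request against the 3.x API might look like this (a sketch; the `relay` method and `remoteUri` are made up for illustration):

```java
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpState;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.commons.httpclient.methods.PostMethod;

public class ProxyRequest {
    // one shared client and connection manager for the whole CentralServer
    private static final HttpClient CLIENT =
            new HttpClient(new MultiThreadedHttpConnectionManager());

    static byte[] relay(String remoteUri) throws Exception {
        PostMethod post = new PostMethod(remoteUri);
        HttpState state = new HttpState(); // fresh per request: no cookie sharing
        try {
            // null host configuration: it is derived from the absolute URI
            CLIENT.executeMethod(null, post, state);
            return post.getResponseBody();
        } finally {
            post.releaseConnection(); // always return the connection to the pool
        }
    }
}
```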

MTHCM has two limits you can adjust. MaxTotalConnections limits the
number of outgoing connections in total. You choose that limit based
on:
- the number of sockets you want to have open
- the number of service threads in CentralServer
- available network bandwidth
- other resource limits on the machine running CentralServer

MaxConnectionsPerHost limits the number of outgoing connections to
a single server. You can only set a common limit for all servers.
The HTTP specification requires that no user agent opens more than 2
simultaneous connections to a single host. Proxies, such as your
CentralServer, are allowed to open 2 simultaneous connections for
each client that tries to reach a server.
If you're in a closed environment, and all participants (above all
the operators of the servers you are connecting to) agree, you can
of course ignore such limits.
Choose the MaxConnectionsPerHost limit based on the capacity of
the servers you are connecting to. If you know that server X
has only 10 service threads, there is no point in sending 100
requests there at the same time. You'd allow 10, or maybe 20 to
hide round-trip latency, but no more. Clients are better off being
blocked in CentralServer, leaving the remaining (total) connections
available for requests to the other 19 servers.
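As a sketch of that sizing (the numbers are illustrative, taking the 10-service-threads-per-server figure from above as the assumption):

```java
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.commons.httpclient.params.HttpConnectionManagerParams;

public class ProxyPool {
    public static HttpClient newClient() {
        MultiThreadedHttpConnectionManager connMgr =
                new MultiThreadedHttpConnectionManager();
        HttpConnectionManagerParams params = connMgr.getParams();

        // Per-host cap sized to the remote server's capacity: with ~10
        // service threads per server, 20 hides round-trip latency
        // without piling up requests at the remote side.
        params.setDefaultMaxConnectionsPerHost(20);

        // Total cap well below 20 hosts * 20 connections, so a rush on
        // one host still leaves connections for the other servers.
        params.setMaxTotalConnections(100);

        return new HttpClient(connMgr);
    }
}
```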

> In CentralServer, all http requests are made through the one static
> instance of HttpClient. CentralServer creates a new PostMethod for every
> request. Code from "CentralServer" servlet is below, and after that I ask
> specific questions.

Sorry, I'm not in the mood for code reviews.

> Questions:
> 
> 1. Since I have a finite set of remote URIs, does it buy me anything to
> create HostConfiguration objects for each server and use them when I make
> requests?

No.

> 2. Since I have a finite number of servers, does it make sense to use
> per-host connection pool size? Or is it just as good to have one big pool
> of connections?

One HttpClient means one MTHCM means one pool.
One big pool is better than individual pools.

> 3. Is the following statement true:  *
>  must be <= MaxTotalConnections?

No. In your scenario, #hosts * MaxConnPerHost is the maximum number
of connections you could have open. Limits are there to _reduce_
that number, in order to avoid overload situations. It is better
to process some requests within load limits and keep the others
waiting than to overload the machine. Do you have enough service
threads in the first place to open that many connections? If so,
you should reduce that number: with 100 requests to 20 servers at
the same time, you'll almost surely be overloading CentralServer.

> 4. Can I set MaxHostConnections to 100 and MaxTotalConnections to 200 and
> still connect to 20 hosts? Will the connections shift from connection pool
> for one host to connection pool of another as needed?

Yes, depending on the requests coming in. You can have 10 connections to
each of the 20 hosts. You can have 100 connections to one host and split
the remaining 100 between the other 19 hosts. But 100 connections per host
sounds like a very high number to me. There is no point in opening those
connections if the requests will just be queued up at that host. In that
case, it would be better to queue them at CentralServer, so that the
fewer connections can be re-used for sending the other requests when the
server is ready to serve them.
There is only one connection pool. Yes, connections will be reassigned
from one host to another in that pool.


> 5. Is the purpose of MaxHostConnections [...]

see above


> 6. I wanted to monitor pool size, so I periodically print the value of
> httpConnectionManager.getConnectionsInPool(), but I noticed that it never
> shrinks, even though I'm running the IdleConnectionTimeoutThread. I figured
> out that to shrink it, I have to call
> httpConnectionManager.deleteClosedConnections().

Check the timeout setting. Check the workload. You don't need to call
deleteClosedConnections(), it is called by closeIdleConnections().
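So a periodic cleanup only needs the one call. Something like this is enough (a sketch; the intervals are just examples):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;

public class PoolCleaner {
    public static void start(final MultiThreadedHttpConnectionManager connMgr) {
        ScheduledExecutorService cleaner =
                Executors.newSingleThreadScheduledExecutor();
        cleaner.scheduleAtFixedRate(new Runnable() {
            public void run() {
                // Closes sockets idle longer than 3 minutes. This also
                // removes the closed entries from the pool, so no separate
                // deleteClosedConnections() call is needed.
                connMgr.closeIdleConnections(3 * 60 * 1000L);
            }
        }, 1, 1, TimeUnit.MINUTES);
    }
}
```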

> (a) Do calls to closeIdleConnections completely release all system
> resources used by HttpConnection objects it closes? Meaning sockets, of
> course.

No. The sockets get closed; that's as much as we can do. We have had
reports that sockets can still hang around at the operating system
level in a CLOSE_WAIT state. Don't know what that means, I'm not a
TCP/IP expert.

> (b) Is there a way to determine how many of the connections in a pool are
> closed (for tracking purposes)? I mean, an existing way, other than
> extending some class in the library?

None that I know of. We'll add monitoring in HttpClient 4 or 5,
sooner or later ;-)

> (c) Is it a good idea to call
> httpConnectionManager.deleteClosedConnections() once in a while? If an
> HttpConnection is not deleted, will it hold any resources, other than the
> memory it occupies?

See above. No.

> (d) Are there advantages to keeping closed connections in the pool? Is it
> faster to open an existing closed connection than to create a new one and
> open it?

No. They will be thrown away anyway.


hope that helps,
  Roland

