Something we should also check is the CPU load of the Tomcat instance.
Maybe it would also be useful to let users/admins add their own
counters to the load estimation.

For example, if some admins consider that load balancing should be based
on HTTP requests or SQL accesses, and they already have these counters in
their webapp, it would be useful to be able to read them from Tomcat and
send them back to the jk balancer.

It shouldn't be too hard, and it would be very welcome at many Tomcat sites.
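
Just to make the idea concrete (nothing like this exists in Tomcat or jk
today, and every name below is invented): a webapp could publish such a
counter as a plain JMX standard MBean, which a management client, or a
future jk management channel, could then poll:

    // LoadCounterMBean.java (hypothetical) -- the attribute the balancer side could poll
    public interface LoadCounterMBean {
        long getHttpRequestCount();
    }

    // LoadCounter.java (hypothetical) -- incremented by e.g. a servlet Filter
    import java.lang.management.ManagementFactory;
    import java.util.concurrent.atomic.AtomicLong;
    import javax.management.ObjectName;

    public class LoadCounter implements LoadCounterMBean {
        private final AtomicLong requests = new AtomicLong();

        public void increment()           { requests.incrementAndGet(); }
        public long getHttpRequestCount() { return requests.get(); }

        // Register under a made-up ObjectName so a management client can read it.
        public static LoadCounter register() throws Exception {
            LoadCounter counter = new LoadCounter();
            ManagementFactory.getPlatformMBeanServer()
                .registerMBean(counter, new ObjectName("myapp:type=LoadCounter"));
            return counter;
        }
    }

The open question is the other half: jk would still need a channel to read
such an attribute and fold it into the worker's load value, which is exactly
the kind of management communication discussed below.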

2007/7/4, Rainer Jung <[EMAIL PROTECTED]>:
Hi,

implementing a management communication between the lb and the backend
is on the roadmap for jk3. It is somewhat unlikely that this will help
in your situation, because while a GC is running the JVM will no longer
respond on the management channel. From the outside, a traditional Mark
Sweep Compact GC is indistinguishable from a halted backend. Of course
we could think of a webapp trying to use the JMX info on memory
consumption to estimate GC activity in advance, but I doubt that this
would be a stable solution. There are notifications when GCs happen, but
at the moment I'm not sure whether such events exist before a GC or only
after it.
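
The closest thing I know of in Java 5+ is the memory pool usage threshold
notification: a webapp can register a listener and be told when a heap pool
crosses a configurable occupancy, which only hints that a major GC may be
getting close rather than announcing it in advance. A minimal sketch of
that idea, assuming java.lang.management (the 80% threshold is an arbitrary
example):

    import java.lang.management.*;
    import javax.management.*;

    public class GcEarlyWarning {
        public static void install() {
            // Arm a usage threshold (here 80%, an arbitrary choice) on every
            // heap pool that supports it (typically survivor and tenured).
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                long max = pool.getUsage().getMax();
                if (pool.getType() == MemoryType.HEAP
                        && pool.isUsageThresholdSupported() && max > 0) {
                    pool.setUsageThreshold((long) (max * 0.8));
                }
            }
            // The MemoryMXBean emits the threshold-exceeded notifications.
            NotificationEmitter emitter =
                (NotificationEmitter) ManagementFactory.getMemoryMXBean();
            emitter.addNotificationListener(new NotificationListener() {
                public void handleNotification(Notification n, Object handback) {
                    if (MemoryNotificationInfo.MEMORY_THRESHOLD_EXCEEDED
                            .equals(n.getType())) {
                        // Heap occupancy is high -- a major GC is probably near.
                        // This is where a node could report itself as "busy".
                    }
                }
            }, null, null);
        }
    }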

I think a first step (and a better solution) would be to use a modern GC
algorithm like Concurrent Mark Sweep, which will most of the time reduce
the GC pause times to some tens or hundreds of milliseconds (depending on
heap size). CMS comes at a cost, a little more memory and a little more
CPU, but the dramatically reduced pause times are worth it. It has also
been quite robust for about one or two years now.
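
For reference, on the Sun 1.5/1.6 JVMs of that generation CMS is enabled
with switches along these lines (the heap sizes and the initiating
occupancy fraction are only placeholders and need tuning per site):

    CATALINA_OPTS="-Xms1024m -Xmx1024m \
                   -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
                   -XX:CMSInitiatingOccupancyFraction=70 \
                   -verbose:gc -XX:+PrintGCDetails -Xloggc:gc.log"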

Other components don't like long GC pauses either, for instance cluster
replication. There you configure the longest pause you accept for missing
heartbeat packets before assuming a node is dead. Declaring a node dead
because of a GC pause, and then having the node suddenly resume work
without ever noticing that its outside world has changed, is a very bad
situation too.

What we plan as a first step for jk3 is basing mod_jk on the Apache APR
libraries. Then we can relatively easily use our own management threads
to monitor the backend status and influence the balancing decisions. As
long as we do everything on top of the request-handling threads we can't
do complex things in a stable way.

Getting jk3 out of the door will take some more time (maybe 6 to 12
months for a release). People willing to help are welcome.

Concerning the SLAs: it always makes sense to put a percentage limit on
the maximum response times and error rates. A "100% below some limit"
clause will always be too expensive. But of course, if you can't reduce
GC times and the GC runs too often, there will be no acceptable
percentage for long-running requests.

Thank you for sharing your experiences at Langen with us!

Regards,

Rainer

Yefym Dmukh wrote:
> Hi all,
> Sorry for the stress, but it seems it is time to come back to the
> discussion about load balancing for JVMs (Tomcat).
>
> Background:
> Recently we ran benchmark and smoke tests of our product at the Sun High
> Tech Centre in Langen (Germany).
>
> Apache 2.2.4 was used as the web server, 10x Tomcat 5.5.25 as the
> containers, and the JK connector 1.2.23 with the busyness algorithm as
> the load balancer.
>
> Under high load, strange behaviour was observed: some Tomcat workers
> temporarily got a disproportionate load, often 10 times higher than the
> others, for relatively long periods. As a result, the response times that
> usually stay under 500 ms went up to 20+ seconds, which in turn made the
> overall test results almost two times worse than estimated.
>
> At the beginning we were quite confused, because we were sure that it was
> not a problem of JVM configuration and supposed that the reason lay in
> the LB logic of mod_jk, and both assumptions were right.
>
> Actually the following was happening: the LB sends requests and the
> sessions become sticky, so subsequent requests are continuously sent to
> the same cluster node. At a certain point the JVM started a major garbage
> collection (full GC) and spent the 20 seconds mentioned above. At the
> same time jk continued to send new requests, plus the requests sticky to
> that node, which led to the situation where one node broke the SLA on
> response times.
>
> I've been searching the web for a while to find a load balancer
> implementation that takes GC activity into account and reduces the load
> accordingly when the JVM is close to a major collection, but found
> nothing.
>
> Once again, load balancing of JVMs under load is a real issue in
> production, and with an optimally distributed load you are not only able
> to lower costs, but also to prevent a bad customer experience, not to
> mention broken SLAs.
>
> Feature request:
>
> All lb algorithms should be extended with a bidirectional connection to
> the JVM:
>     JVM -> LB: old gen size and its current occupancy
>     LB -> JVM: prevent node overload and advise a GC, depending on a
> parameterized percentage of free old gen space.
>
>
> All ideas and comments are appreciated.
>
> Regards,
> Yefym.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


