You have a bunch of services in http mode that don't seem to have any type
of http close option. For some others, I don't understand why they are not in
http mode; they probably should be.
Just a note: you may be able to greatly simplify (and possibly speed up) your
config using the new capabilities for tables of IPs added in 1.4.6.
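As a rough sketch of what that could look like, assuming a hypothetical file
/etc/haproxy/internal_nets.lst with one address or network per line (the
frontend, ACL, and backend names here are made up, not taken from your
config):

```
frontend www_in
    bind :80
    mode http
    option httpclose
    # Load the source addresses/networks from a file instead of listing
    # them in many separate "acl ... src" lines (needs 1.4.6 or later).
    acl from_internal src -f /etc/haproxy/internal_nets.lst
    use_backend internal_pool if from_internal
    default_backend public_pool
```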
solr should probably be http mode and anywhere else that you have http mode
you probably want an http close option turned on.
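For illustration only, a section with both set might look roughly like this.
The section name, port, server names, and addresses are placeholders, and
which close option fits best depends on your version (http-server-close
needs 1.4):

```
listen solr
    bind :8983
    mode http
    # Close after each response so every request is parsed on its own.
    option httpclose
    server solr1 192.0.2.21:8983 check
    server solr2 192.0.2.22:8983 check
```

The same option can also go in a defaults section so that every http-mode
section that follows inherits it.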
I am not sure why they chose dispatch for the prod glassfish server. My guess
is they are running apache and mod_jk or something and then forwarding the
requests to different glassfish servers. Is there really more than one prod
glassfish server? I am wondering if the previous admin set up more than one
copy of haproxy and that is why several services are redirected to the same
machine. For glassfish prod, there is no other reference to port 4850 in this
config, so what is running on port 4850? haproxy, apache, or, heaven forbid,
glassfish itself? You can check with:
    netstat -antope | fgrep LIST | fgrep 4850
I think one of the problems is "inter_server": it doesn't have http mode
set, so if more than one request comes in on an open connection, your
request parsing rules are not run on any request except the first one (as
Willy keeps reminding people). That might work OK for most things, since you
are mostly breaking things up by service (liferay goes to the liferay servers,
etc.). The problem comes in if you have a portal that people sign into, with
a menu/navbar from which they can choose different services that should be
going to different frontends/backends.
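A minimal sketch of that portal case, assuming hypothetical path prefixes and
backend names (none of these come from the posted config):

```
frontend portal_in
    bind :80
    mode http
    # With the connection closed after each response, every request
    # arrives on a new connection and is matched against the ACLs below.
    option httpclose
    acl is_liferay path_beg /liferay
    acl is_solr    path_beg /solr
    use_backend liferay_servers if is_liferay
    use_backend solr_servers    if is_solr
    default_backend portal_servers
```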
On 5/18/10 3:49 PM, Chih Yin wrote:
On Mon, May 17, 2010 at 11:11 PM, Hank A. Paulson
<[email protected]> wrote:
On 5/17/10 10:24 PM, Willy Tarreau wrote:
On Mon, May 17, 2010 at 07:42:03PM -0700, Hank A. Paulson wrote:
I have some sites running a similar setup - Xen domU, keepalived, fedora not
RHEL - and they get 50+ million hits per day with pretty fast response. You
might want to use the "log separate errors" (sp?) option and review those 50X
errors carefully; you might see a pattern. Do you have http-close* in all
your configs? That got me weird, slow results when I missed it once.
Indeed, that *could* be a possibility if combined with a server maxconn
because connections would be kept for a long time on the server (waiting for
either the client or the server to close) and during that time nobody else
could connect. The typical problem with keep-alive to the servers in fact.
The 503 could be caused by requests waiting too long in the queue then.
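To make the interaction concrete, here is a hedged sketch with placeholder
names and values: each server accepts at most 50 concurrent connections, and
extra requests wait in the queue for up to 10 seconds before being answered
with a 503.

```
backend glassfish_pool
    mode http
    # Without a close option, idle kept-alive connections keep occupying
    # the maxconn slots below.
    option httpclose
    # Requests that wait longer than this in the queue get a 503.
    timeout queue 10s
    server gf1 192.0.2.31:8080 maxconn 50 check
    server gf2 192.0.2.32:8080 maxconn 50 check
```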
My example was just to assure Chih Yin that haproxy on xen should be
able to handle his current load, depending, of course, on the
glassfish servers.
I meant some kind of httpclose option
(httpclose/forceclose/http-server-close/etc) turned on regardless of
keep-alive status - you know, like you are always reminding people :)
I noticed when I forgot it on a section (that was not keepalive
related) it caused wacky results - hanging browsers,
images/icons/css not showing up, etc. Obviously it should not affect
single requests like you would assume Akamai would be sending, it
was a pure guess.
Thank you everyone for your feedback. I really appreciate your help.
Sorry for taking so long to respond. I had to get permission from my
director to post some of the log data and our haproxy configuration
file. I also had to hide a bit more of the configuration than was
suggested because of concerns about making the issues we're encountering
too public. I hope you understand.
From my research on HAProxy and high availability websites in general,
it seemed to me that compared to other websites, our traffic volume is
actually rather light. In addition to how we have configured HAProxy
for our infrastructure, I'm definitely also taking a look at our
application servers and our content as well.
I started looking at the log files and the HAProxy configuration file
more closely today.
I attached the (poorly) cleaned HAProxy configuration file. Looking at
it, I can already see that the httpclose option isn't consistently
included in all the sections, both the frontend and the backend. I will
make sure this option is in all sections. Should I also add this to the
global settings for HAProxy? Is it okay if this option is listed more
than once in a section (I noticed that this happened a couple of times)?
Chih Yin, Xani was right, please take a look at your logs. Also, sending us
your config would help a lot. Replace IP addresses and passwords with "XXX"
if you want, we'll comment on the rest. BTW you should tell your admin that
1.3.21 has an annoying bug which makes it crash when connecting to the stats
socket. Thus, this reduces your possibilities of debugging it. When you have
some time, you should upgrade it to 1.3.22 or later (1.3.24), which fix a
small number of remaining bugs.
example stats page screenshot attached.
Nice stats Hank :-)
That is just the page frames (mostly) not including images, css, js,
static icons or any other "stuff" but neither is it just for one
day, it is longer.
I have already reported to my director to let him know that we really
need to upgrade to 1.3.22 or later.
As for the logs, it seems that I'll need to look at the configuration
for HAProxy a bit more to make some adjustments first. A few months
back, I know I saw messages indicating the status of the servers (e.g. 3
active, 2 backup). I also see messages when the HAProxy configuration
was reloaded or when HAProxy was restarted. I no longer see these
status messages in the log files.
That is a good reason to turn on the log separate errors option - the errors go
into both log files but it is easier to review the error log without all the
normal accesses. It doesn't really add any load, just makes life easier.
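A sketch of how that could be wired up, assuming a local syslog daemon on
127.0.0.1 and the local0 facility (both placeholders). As far as I know the
option is spelled "log-separate-errors" and is available from 1.4 on, so it is
worth checking the documentation for your version:

```
global
    log 127.0.0.1 local0

defaults
    log global
    mode http
    option httplog
    # Raise the log level of failed requests from info to err so that
    # syslog can also route them to a separate error file.
    option log-separate-errors
```

The split into two files is then done on the syslog side, by sending entries
at level err or higher to their own file.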
I recall that the system administrator who initially configured HAProxy
mentioned that he removed the logging of some inter-server traffic to make
the log file sizes smaller. I'm wondering if he also removed these status
messages as well.
Maybe, but that would be surprising since those msgs should be infrequent and
are somewhat important. It is more probable that they adjusted the apache
logging (for example on the cas servers) to not log the hits to
/security/check.txt, given that you are hitting all the cas servers every 7
seconds, so those start to add up if your real traffic is low.
option httpchk HEAD /security/check.txt HTTP/1.0
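In context, that check line would sit in a section along these lines. The
server names, addresses, and the 7000 ms interval are placeholders inferred
from the 7 seconds mentioned above, not copied from the config:

```
backend cas_servers
    mode http
    option httpclose
    # Send HEAD /security/check.txt to each server as the health check.
    option httpchk HEAD /security/check.txt HTTP/1.0
    # "inter" is the time between two checks, in milliseconds.
    server cas1 192.0.2.41:80 check inter 7000
    server cas2 192.0.2.42:80 check inter 7000
```

Raising "inter" would cut the number of check hits in the apache logs, at the
cost of slower failure detection.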
Again, thank you all for your help and suggestions.
C.Y.
Cheers,
Willy