On 05.03.2016 at 00:09, Max Lynch wrote:
Hi Rainer, I will do my best to provide those things.
Here is what looks like the full sequence from our log:
[46055:3512666992] [info] jk_open_socket::jk_connect.c (627): connect to _ip_:12409 failed (errno=115)
[46055:3512666992] [info] ajp_connect_to_endpoint::jk_ajp_common.c (992): Failed opening socket to (_ip_:12409) (errno=115)
[46055:3512666992] [error] ajp_send_request::jk_ajp_common.c (1621): (_hostname_) connecting to backend failed. Tomcat is probably not started or is listening on the wrong port (errno=115)
[46055:3512666992] [info] ajp_service::jk_ajp_common.c (2614): (_hostname_) sending request to tomcat failed (recoverable), because of error during request sending (attempt=1)
[46055:3512666992] [info] jk_open_socket::jk_connect.c (627): connect to _ip_:12409 failed (errno=115)
[46055:3512666992] [info] ajp_connect_to_endpoint::jk_ajp_common.c (992): Failed opening socket to (_ip_:12409) (errno=115)
[46055:3512666992] [error] ajp_send_request::jk_ajp_common.c (1621): (_hostname_) connecting to backend failed. Tomcat is probably not started or is listening on the wrong port (errno=115)
[46055:3512666992] [info] ajp_service::jk_ajp_common.c (2614): (_hostname_) sending request to tomcat failed (recoverable), because of error during request sending (attempt=2)
[46055:3512666992] [error] ajp_service::jk_ajp_common.c (2634): (_hostname_) connecting to tomcat failed.
[46055:3512666992] [info] service::jk_lb_worker.c (1469): service failed, worker _hostname_ is in error state
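OK, so errno 115 occurs in jk_connect.c line 627. On Linux, 115 is
EINPROGRESS, which a non-blocking connect() returns while the connection
is still being established. The code there is expected to handle 115,
but it will not wait longer than socket_connect_timeout.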
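For illustration, here is a minimal Python sketch of the generic non-blocking connect pattern that produces errno 115: connect() returns EINPROGRESS, the caller waits at most the timeout for the socket to become writable, and then reads SO_ERROR to learn the final result. This is not mod_jk's C code; the helper name connect_with_timeout is hypothetical, and the errno value 115 assumes Linux.

```python
import errno
import select
import socket


def connect_with_timeout(host: str, port: int, timeout_ms: int) -> socket.socket:
    """Hypothetical helper: TCP connect that waits at most timeout_ms."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    # connect_ex() returns an errno instead of raising. For a non-blocking
    # socket the handshake is usually still running: EINPROGRESS (115 on Linux).
    rc = sock.connect_ex((host, port))
    if rc not in (0, errno.EINPROGRESS):
        sock.close()
        raise OSError(rc, f"connect to {host}:{port} failed")
    if rc == errno.EINPROGRESS:
        # Wait until the socket becomes writable or the timeout expires.
        _, writable, _ = select.select([], [sock], [], timeout_ms / 1000.0)
        if not writable:
            sock.close()
            raise TimeoutError(f"connect to {host}:{port} timed out after {timeout_ms} ms")
        # Writable only means the handshake finished; SO_ERROR tells how.
        err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        if err != 0:
            sock.close()
            raise OSError(err, f"connect to {host}:{port} failed")
    sock.setblocking(True)
    return sock
```

In this pattern a slow backend shows up as the TimeoutError branch: errno 115 itself is normal, and only the expiry of the wait turns it into a failure.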
You can see that after this sequence the backend worker is marked as Bad.
Here is the config:
JkWorkerProperty worker.list=jkstatus,ajp_app,ajp_app2,ajp_app3,...
JkWorkerProperty worker.jkstatus.type=status
JkWorkerProperty worker.lb_member_template.type=ajp13
JkWorkerProperty worker.lb_member_template.activation=Active
JkWorkerProperty worker.lb_member_template.ping_mode=A
JkWorkerProperty worker.lb_member_template.connection_pool_timeout=600
JkWorkerProperty worker.lb_member_template.socket_keepalive=True
JkWorkerProperty worker.lb_member_template.socket_timeout=30
I usually recommend *not* using the general socket_timeout. Remove it.
I do suggest setting ping_timeout to e.g. 10000 (milliseconds, i.e. 10
seconds). That is the default, but making it explicit documents it in
your config.
JkWorkerProperty worker.lb_member_template.socket_connect_timeout=3000
This means a 3 second connect timeout. So it seems that either your
network, an intermediary between Apache and Tomcat, or Tomcat itself has
a problem establishing a new connection within 3 seconds. Although the
connect is typically handled by the backend OS, once the app/TC/JVM
becomes slow in accepting new connections, the accept queue fills up and
new connects will then fail.
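To see how a backend that stops calling accept() makes new connects fail, here is a small Python experiment (the helper probe_full_backlog is hypothetical, for illustration only). It listens with a tiny backlog and never accepts. On Linux the first couple of connects typically succeed and later ones time out, because the kernel drops the SYN once the accept queue is full; other OSes may refuse the connection instead, and the exact cutoff depends on the kernel.

```python
import socket


def probe_full_backlog(attempts: int, timeout_s: float) -> list:
    """Hypothetical experiment: connect repeatedly to a listener that never
    calls accept(). Returns "ok", "timeout" or "refused" per attempt."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # any free port on loopback
    listener.listen(1)               # tiny backlog; accept() is never called
    port = listener.getsockname()[1]
    clients, results = [], []
    try:
        for _ in range(attempts):
            c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            clients.append(c)
            c.settimeout(timeout_s)
            try:
                c.connect(("127.0.0.1", port))
                results.append("ok")
            except socket.timeout:
                results.append("timeout")
            except ConnectionRefusedError:
                results.append("refused")
    finally:
        # Close everything so the experiment leaves no sockets behind.
        for c in clients:
            c.close()
        listener.close()
    return results
```

Calling probe_full_backlog(5, 0.5) and printing the result shows the pattern on your own machine; a Tomcat that is busy (for example during a long GC pause) behaves like this listener from the client's point of view.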
Have you checked whether your Tomcat JVMs show long GC pauses? Enable
verbose GC logging and have a look.
I typically avoid very short timeouts, so try setting
socket_connect_timeout to 10000 (10 seconds). If the root cause is more
than a short-term hiccup, this will not fix the problem, but it should
result in slightly better stability in general.
JkWorkerProperty worker.lb_member_template.recover_time=30
JkWorkerProperty worker.lb_member_template.recovery_options=7
JkWorkerProperty worker.lb_worker_template.type=lb
JkWorkerProperty worker.ajp_app.reference=worker.lb_worker_template
JkWorkerProperty worker.ajp_app.balance_workers=_hostname1_ajpport1, _hostname1_ajpport2, ..., _hostname34_ajpport15
JkWorkerProperty worker._hostname_ajpportX.reference=worker.lb_member_template
JkWorkerProperty worker._hostname_ajpportX.host=_hostname_
JkWorkerProperty worker._hostname_ajpportX.port=xxxx
Looks fine except for the socket_timeout and the very short
socket_connect_timeout.
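The suggested changes to the member template can be written out as a small Python sketch: it takes the template properties quoted above, drops socket_timeout, makes ping_timeout explicit, raises socket_connect_timeout, and prints JkWorkerProperty lines to paste into the Apache config by hand. Both timeouts are in milliseconds. The helper names apply_suggestions and render are hypothetical.

```python
def apply_suggestions(props: dict) -> dict:
    """Return a copy of the member template with the suggested changes."""
    tuned = dict(props)
    tuned.pop("socket_timeout", None)        # drop the general socket_timeout
    tuned["ping_timeout"] = 10000            # ms; the default, now explicit
    tuned["socket_connect_timeout"] = 10000  # ms; was 3000 (3 seconds)
    return tuned


def render(worker: str, props: dict) -> list:
    """Format properties as JkWorkerProperty directives."""
    return [f"JkWorkerProperty worker.{worker}.{key}={value}" for key, value in props.items()]


# Member template as quoted above.
current = {
    "type": "ajp13",
    "activation": "Active",
    "ping_mode": "A",
    "connection_pool_timeout": 600,
    "socket_keepalive": "True",
    "socket_timeout": 30,
    "socket_connect_timeout": 3000,
    "recover_time": 30,
    "recovery_options": 7,
}

for line in render("lb_member_template", apply_suggestions(current)):
    print(line)
```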
Will this list accept attachments for the other details, such as netstat
output and thread dumps?
Not sure, but you can try. If it doesn't work, you can also send them to
me privately and I will summarize.
If the problem persists long enough, you can also try opening a new
connection to your Tomcat AJP connector by running telnet from the
mod_jk machine to the remote Tomcat port. If there is a longer hang, you
should notice it with telnet as well. Of course you can't speak AJP over
telnet, but it can show you whether there really is a connect problem.
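If telnet is not available on the mod_jk machine, a rough Python equivalent that only times the TCP handshake (it does not speak AJP either) could look like this; time_connect is a hypothetical helper name.

```python
import socket
import time


def time_connect(host: str, port: int, timeout_s: float = 10.0):
    """Return the TCP connect latency in seconds, or None on failure/timeout."""
    start = time.monotonic()
    try:
        # create_connection() only performs the TCP handshake; no AJP is sent.
        with socket.create_connection((host, port), timeout=timeout_s):
            return time.monotonic() - start
    except OSError:  # covers refused connections and timeouts
        return None
```

For example, calling time_connect with the real Tomcat host and the AJP port from the log (12409) once per second from the mod_jk machine, and printing the results, would show whether connect times approach or exceed the 3 second socket_connect_timeout while the problem is happening.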
You might argue that the problem disappears when you restart Apache.
But that might be because all existing connections then get closed and
resources (like threads) are freed on the Tomcat/JVM side. I suspect the
real question is what keeps those resources busy. So let's have a look
at netstat output and thread dumps, and maybe also try the little telnet
experiment.
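When collecting the netstat data, it can help to summarize the connection states on the AJP port. On Linux, netstat reads this information from /proc/net/tcp (and /proc/net/tcp6 for IPv6); the following Python sketch counts states for one port. The parser and its names are illustrative assumptions, not an existing tool.

```python
from collections import Counter

# Hex state codes used in /proc/net/tcp (Linux TCP state numbering).
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def states_for_port(proc_net_tcp: str, port: int) -> Counter:
    """Count TCP states of sockets whose local or remote port equals `port`."""
    counts = Counter()
    for line in proc_net_tcp.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 4:
            continue
        # Addresses look like "0100007F:3079" (hex IP : hex port).
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        remote_port = int(fields[2].rsplit(":", 1)[1], 16)
        if port in (local_port, remote_port):
            counts[TCP_STATES.get(fields[3], fields[3])] += 1
    return counts
```

For example, states_for_port(open("/proc/net/tcp").read(), 12409), run on both the Apache and the Tomcat machine while the problem occurs, gives a compact view to compare alongside the full netstat output.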
Regards,
Rainer
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org