Am 05.03.2016 um 00:09 schrieb Max Lynch:
Hi Rainer, I will do my best to provide those things.

Here is what looks like the full sequence from the our log:

[46055:3512666992] [info] jk_open_socket::jk_connect.c (627): connect to
_ip_:12409 failed (errno=115)
[46055:3512666992] [info] ajp_connect_to_endpoint::jk_ajp_common.c (992):
Failed opening socket to (_ip_:12409) (errno=115)
[46055:3512666992] [error] ajp_send_request::jk_ajp_common.c (1621):
(_hostname_) connecting to backend failed. Tomcat is probably not started
or is listening on the wrong port (errno=115)
[46055:3512666992] [info] ajp_service::jk_ajp_common.c (2614): (_hostname_)
sending request to tomcat failed (recoverable), because of error during
request sending (attempt=1)
[46055:3512666992] [info] jk_open_socket::jk_connect.c (627): connect to
_ip_:12409 failed (errno=115)
[46055:3512666992] [info] ajp_connect_to_endpoint::jk_ajp_common.c (992):
Failed opening socket to (_ip_:12409) (errno=115)
[46055:3512666992] [error] ajp_send_request::jk_ajp_common.c (1621):
(_hostname_) connecting to backend failed. Tomcat is probably not started
or is listening on the wrong port (errno=115)
[46055:3512666992] [info] ajp_service::jk_ajp_common.c (2614): (_hostname_)
sending request to tomcat failed (recoverable), because of error during
request sending (attempt=2)
[46055:3512666992] [error] ajp_service::jk_ajp_common.c (2634):
(_hostname_) connecting to tomcat failed.
[46055:3512666992] [info] service::jk_lb_worker.c (1469): service failed,
worker _hostname_ is in error state

OK, so errno 115 occurs in jk_connect.c line 627. The code there is expected to handle 115 but will not wait longer than socket_connect_timeout.

You can see after this sequence the backend worker is marked as Bad.

Here is the config:

JkWorkerProperty worker.list=jkstatus,ajp_app,ajp_app2,ajp_app3,...
JkWorkerProperty worker.jkstatus.type=status
JkWorkerProperty worker.lb_member_template.type=ajp13
JkWorkerProperty worker.lb_member_template.activation=Active
JkWorkerProperty worker.lb_member_template.ping_mode=A
JkWorkerProperty worker.lb_member_template.connection_pool_timeout=600
JkWorkerProperty worker.lb_member_template.socket_keepalive=True
JkWorkerProperty worker.lb_member_template.socket_timeout=30

I usually recommend *not* to use the general socket_timeout. Remove it.

I do suggest to set ping_timeout to e.g. 10 seconds (it is the default, but making it explicit kind of ducoments it in your config).

JkWorkerProperty worker.lb_member_template.socket_connect_timeout=3000

This means 3 seconds connect timeout. So it seems either your network, an intermediary between Apache and Tomcat, or Tomcat has a problem of allowing to establish a new connection in 3 seconds. Although the connect is typically done by the backend OS, once the app/TC/JVM gets slow in accepting new connections, the accept queue fills up and then new connects will fail.

Did you check whether you observe long GC pauses for your Tomcat JVMs? Enable a very verbose GC log and have a look.

I typically avoid very short timeouts, so try setting socket_connect_timeout to 10000. If the root cause isn't just a short term hickup, this will not mitigate the problem but should result in slightly better stability in general.

JkWorkerProperty worker.lb_member_template.recover_time=30
JkWorkerProperty worker.lb_member_template.recovery_options=7
JkWorkerProperty worker.lb_worker_template.type=lb
JkWorkerProperty worker.ajp_app.reference=worker.lb_worker_template
JkWorkerProperty worker.ajp_app.balance_workers=_hostname1_ajpport1,
_hostname1_ajpport2, ..., _hostname34_ajpport15
JkWorkerProperty
worker._hostname_ajpportX.reference=worker.lb_member_template
JkWorkerProperty worker._hostname_ajpportX.host=_hostname_
JkWorkerProperty worker._hostname_ajpportX.port=xxxx

Looks fine except for the socket_timeout and the very short socket_connect_timeout.

will this list accept attachments for the other details such as netstat
output and thread dumps?

Not sure, but you can try. If it doesn't work, you can also send privately to me and I will summarize.

If the problem persists for a long enough time, you can also try opening a new connection to your Tomcat AJP connector from the mod_jk machine by running a telnet from the mod_jk machine to the remote Tomcat port. If there is a longer hang, you should notice it with telnet as well. Of course you can't speak AJP using telnet, but it can show you whether there really is a connect problem.

You might argue, that the problem disappears when you restart Apache. But that might be due to the fact, that then all existing connections get closed and resources (like threads) are freed on the Tomcat/JVM side. I suspect the question is, what keeps those resources busy. So let's have a look at netstat and thread dumps and probably try the little telnet experiment.

Regards,

Rainer

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org

Reply via email to