If we're receiving an error 115, could this mean that any timeout is kind
of irrelevant? The first attempt to create the socket immediately ends with
this code. 100ms later, per retry_interval, the same error is immediately
returned. The code at line 627 tests the return val from this line which
creates the socket:

ret = nb_connect(sd, (struct sockaddr *)addr,  , l);

The third argument is the timeout, which isn't even set. I see it's been
corrected in version 1.2.41, using the var connect_timeout. Since its blank
for us, could this have any effect?

Here are the same log lines bu this time with their timestamp (apologies
for not having this from the start):

[Wed Feb 03 10:15:01.014 2016] [119069:3584969584] [info]
jk_open_socket::jk_connect.c (627): connect to _hostip_port failed
(errno=115)
[Wed Feb 03 10:15:01.014 2016] [119069:3584969584] [info]
ajp_connect_to_endpoint::jk_ajp_common.c (992): Failed opening socket to
(_hostip_port) (errno=115)
[Wed Feb 03 10:15:01.014 2016] [119069:3584969584] [error]
ajp_send_request::jk_ajp_common.c (1621): (_hostname_port) connecting to
backend failed. Tomcat is probably not started or is listening on the wrong
port (errno=115)
[Wed Feb 03 10:15:01.014 2016] [119069:3584969584] [info]
ajp_service::jk_ajp_common.c (2614): (_hostname_port) sending request to
tomcat failed (recoverable), because of error during request sending
(attempt=1)
[Wed Feb 03 10:15:01.115 2016] [119069:3584969584] [info]
jk_open_socket::jk_connect.c (627): connect to _hostip_port failed
(errno=115)
[Wed Feb 03 10:15:01.115 2016] [119069:3584969584] [info]
ajp_connect_to_endpoint::jk_ajp_common.c (992): Failed opening socket to
(_hostip_port) (errno=115)
[Wed Feb 03 10:15:01.115 2016] [119069:3584969584] [error]
ajp_send_request::jk_ajp_common.c (1621): (_hostname_port) connecting to
backend failed. Tomcat is probably not started or is listening on the wrong
port (errno=115)
[Wed Feb 03 10:15:01.115 2016] [119069:3584969584] [info]
ajp_service::jk_ajp_common.c (2614): (_hostname_port) sending request to
tomcat failed (recoverable), because of error during request sending
(attempt=2)
[Wed Feb 03 10:15:01.115 2016] [119069:3584969584] [error]
ajp_service::jk_ajp_common.c (2634): (_hostname_port) connecting to tomcat
failed.
[Wed Feb 03 10:15:01.115 2016] [119069:3584969584] [info]
service::jk_lb_worker.c (1469): service failed, worker _hostname_port is in
local error state


We don't see this issue in any other environment, and we have several
environments all running the same configuration.


> This means 3 seconds connect timeout. So it seems either your network, an
intermediary between
> Apache and Tomcat, or Tomcat has a problem of allowing to establish a new
connection in 3 seconds.
> Although the connect is typically done by the backend OS, once the
app/TC/JVM gets slow in accepting
> new connections, the accept queue fills up and then new connects will
fail.

We've tried setting socket_connect_timeout to 10000 (10s) with no affect,
probably because our code ignore the timeout value as noted above. Do newer
versions fix this, and would this help us here? I'm guessing if no timeout
is passed then the call should waiting forever. We've conducted several
tests with the netcat command, establishing connections to the AJP port
from the Apache server with no errors or latency. While the problem is at
its worst, while traffic is at it's highest, these tests connect
immediately.

> Did you check whether you observe long GC pauses for your Tomcat JVMs?

There were no long pauses, I can see nothing abnormal here. GC behavior is
consistent when problem is introduced to when it is at it's worst. Thread
count in well below limits as well, no abnormal spikes.

I'm trying to create a safe thread dump to send, but there is too much
proprietary data still. Let me know if you'd like still this info if
helpful, and I'll keep pruning it.


On Fri, Mar 4, 2016 at 3:58 PM, Rainer Jung <rainer.j...@kippdata.de> wrote:

> Am 05.03.2016 um 00:09 schrieb Max Lynch:
>
>> Hi Rainer, I will do my best to provide those things.
>>
>> Here is what looks like the full sequence from the our log:
>>
>> [46055:3512666992] [info] jk_open_socket::jk_connect.c (627): connect to
>> _ip_:12409 failed (errno=115)
>> [46055:3512666992] [info] ajp_connect_to_endpoint::jk_ajp_common.c (992):
>> Failed opening socket to (_ip_:12409) (errno=115)
>> [46055:3512666992] [error] ajp_send_request::jk_ajp_common.c (1621):
>> (_hostname_) connecting to backend failed. Tomcat is probably not started
>> or is listening on the wrong port (errno=115)
>> [46055:3512666992] [info] ajp_service::jk_ajp_common.c (2614):
>> (_hostname_)
>> sending request to tomcat failed (recoverable), because of error during
>> request sending (attempt=1)
>> [46055:3512666992] [info] jk_open_socket::jk_connect.c (627): connect to
>> _ip_:12409 failed (errno=115)
>> [46055:3512666992] [info] ajp_connect_to_endpoint::jk_ajp_common.c (992):
>> Failed opening socket to (_ip_:12409) (errno=115)
>> [46055:3512666992] [error] ajp_send_request::jk_ajp_common.c (1621):
>> (_hostname_) connecting to backend failed. Tomcat is probably not started
>> or is listening on the wrong port (errno=115)
>> [46055:3512666992] [info] ajp_service::jk_ajp_common.c (2614):
>> (_hostname_)
>> sending request to tomcat failed (recoverable), because of error during
>> request sending (attempt=2)
>> [46055:3512666992] [error] ajp_service::jk_ajp_common.c (2634):
>> (_hostname_) connecting to tomcat failed.
>> [46055:3512666992] [info] service::jk_lb_worker.c (1469): service failed,
>> worker _hostname_ is in error state
>>
>
> OK, so errno 115 occurs in jk_connect.c line 627. The code there is
> expected to handle 115 but will not wait longer than socket_connect_timeout.
>
> You can see after this sequence the backend worker is marked as Bad.
>>
>> Here is the config:
>>
>> JkWorkerProperty worker.list=jkstatus,ajp_app,ajp_app2,ajp_app3,...
>> JkWorkerProperty worker.jkstatus.type=status
>> JkWorkerProperty worker.lb_member_template.type=ajp13
>> JkWorkerProperty worker.lb_member_template.activation=Active
>> JkWorkerProperty worker.lb_member_template.ping_mode=A
>> JkWorkerProperty worker.lb_member_template.connection_pool_timeout=600
>> JkWorkerProperty worker.lb_member_template.socket_keepalive=True
>> JkWorkerProperty worker.lb_member_template.socket_timeout=30
>>
>
> I usually recommend *not* to use the general socket_timeout. Remove it.
>
> I do suggest to set ping_timeout to e.g. 10 seconds (it is the default,
> but making it explicit kind of ducoments it in your config).
>
> JkWorkerProperty worker.lb_member_template.socket_connect_timeout=3000
>>
>
> This means 3 seconds connect timeout. So it seems either your network, an
> intermediary between Apache and Tomcat, or Tomcat has a problem of allowing
> to establish a new connection in 3 seconds. Although the connect is
> typically done by the backend OS, once the app/TC/JVM gets slow in
> accepting new connections, the accept queue fills up and then new connects
> will fail.
>
> Did you check whether you observe long GC pauses for your Tomcat JVMs?
> Enable a very verbose GC log and have a look.
>
> I typically avoid very short timeouts, so try setting
> socket_connect_timeout to 10000. If the root cause isn't just a short term
> hickup, this will not mitigate the problem but should result in slightly
> better stability in general.
>
> JkWorkerProperty worker.lb_member_template.recover_time=30
>> JkWorkerProperty worker.lb_member_template.recovery_options=7
>> JkWorkerProperty worker.lb_worker_template.type=lb
>> JkWorkerProperty worker.ajp_app.reference=worker.lb_worker_template
>> JkWorkerProperty worker.ajp_app.balance_workers=_hostname1_ajpport1,
>> _hostname1_ajpport2, ..., _hostname34_ajpport15
>> JkWorkerProperty
>> worker._hostname_ajpportX.reference=worker.lb_member_template
>> JkWorkerProperty worker._hostname_ajpportX.host=_hostname_
>> JkWorkerProperty worker._hostname_ajpportX.port=xxxx
>>
>
> Looks fine except for the socket_timeout and the very short
> socket_connect_timeout.
>
> will this list accept attachments for the other details such as netstat
>> output and thread dumps?
>>
>
> Not sure, but you can try. If it doesn't work, you can also send privately
> to me and I will summarize.
>
> If the problem persists for a long enough time, you can also try opening a
> new connection to your Tomcat AJP connector from the mod_jk machine by
> running a telnet from the mod_jk machine to the remote Tomcat port. If
> there is a longer hang, you should notice it with telnet as well. Of course
> you can't speak AJP using telnet, but it can show you whether there really
> is a connect problem.
>
> You might argue, that the problem disappears when you restart Apache. But
> that might be due to the fact, that then all existing connections get
> closed and resources (like threads) are freed on the Tomcat/JVM side. I
> suspect the question is, what keeps those resources busy. So let's have a
> look at netstat and thread dumps and probably try the little telnet
> experiment.
>
>
> Regards,
>
> Rainer
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...@tomcat.apache.org
>
>

Reply via email to