RE: 21 second pause that randomly happens

2018-07-20 Thread David Cleary
Thanks for the feedback. I was confident all along this wasn't a Tomcat issue, 
and I believe we identified the culprit as the firewall. I was looking for 
logging that would show the socket behavior (accept, close, etc.) to show that 
as far as Tomcat was concerned, the delay was happening before it got there. 
Anyways, we were able to do some testing without the particular firewall they 
were using and the results were encouraging. Will be a few days of testing 
before they try it on their production system to confirm it, but so far, so 
good.

Thanks
Dave

-Original Message-
From: charlie arehart [mailto:charlie_li...@carehart.org] 
Sent: Thursday, July 19, 2018 12:24 PM
To: 'Tomcat Users List' 
Subject: RE: 21 second pause that randomly happens

David? Are you still needing help with this? Your answers to what I ask may 
help us get you to a solution.

/charlie

-Original Message-
From: charlie arehart  
Sent: Tuesday, July 17, 2018 01:14 PM
To: 'Tomcat Users List' 
Subject: RE: 21 second pause that randomly happens

Going back to David's original note, you wrote, " We have a customer who is 
experiencing a random, 21 second pause when using out Tomcat based application 
server. We believe this may be during a TCP connect and timeout. Logging 
indicates the pause happens before the request makes it to our back end."

Can you clarify what you mean by the "back end" here? 

Since you say it's a pause "when using" Tomcat, I can't tell if you're saying 
Tomcat IS the backend, or instead if the request hangs WHILE running in Tomcat 
but doesn't reach some other "backend" that the Tomcat-based app should then 
talk to. You did refer in a later note to a "backend agent log", so I'm 
inclined to think it's the latter.

In that case, it seems you are wondering (in one note) whether the request 
might be hung up in the connector (getting TO Tomcat). 

But do you know for sure whether the request is actually running IN Tomcat? You 
could use any of various JVM or Tomcat monitoring tools to know a) if the 
request reaches Tomcat, b) if it is hanging while running there, and then c) 
just WHAT the request is hanging and waiting for (which could well be that 
"other backend" the app needs to talk to). 

But before elaborating on approaches, I wanted to ask what you may or may not 
know in regard to the above, first. :-)

/charlie 





Re: 21 second pause that randomly happens

2018-07-19 Thread James H. H. Lampert

Could it be related to garbage collection?

--
JHHL

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



RE: 21 second pause that randomly happens

2018-07-19 Thread charlie arehart
David? Are you still needing help with this? Your answers to what I ask may 
help us get you to a solution.

/charlie

-Original Message-
From: charlie arehart  
Sent: Tuesday, July 17, 2018 01:14 PM
To: 'Tomcat Users List' 
Subject: RE: 21 second pause that randomly happens

Going back to David's original note, you wrote, " We have a customer who is 
experiencing a random, 21 second pause when using out Tomcat based application 
server. We believe this may be during a TCP connect and timeout. Logging 
indicates the pause happens before the request makes it to our back end."

Can you clarify what you mean by the "back end" here? 

Since you say it's a pause "when using" Tomcat, I can't tell if you're saying 
Tomcat IS the backend, or instead if the request hangs WHILE running in Tomcat 
but doesn't reach some other "backend" that the Tomcat-based app should then 
talk to. You did refer in a later note to a "backend agent log", so I'm 
inclined to think it's the latter.

In that case, it seems you are wondering (in one note) whether the request 
might be hung up in the connector (getting TO Tomcat). 

But do you know for sure whether the request is actually running IN Tomcat? You 
could use any of various JVM or Tomcat monitoring tools to know a) if the 
request reaches Tomcat, b) if it is hanging while running there, and then c) 
just WHAT the request is hanging and waiting for (which could well be that 
"other backend" the app needs to talk to). 

But before elaborating on approaches, I wanted to ask what you may or may not 
know in regard to the above, first. :-)

/charlie 



-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



RE: 21 second pause that randomly happens

2018-07-17 Thread charlie arehart
Going back to David's original note, you wrote, " We have a customer who is 
experiencing a random, 21 second pause when using out Tomcat based application 
server. We believe this may be during a TCP connect and timeout. Logging 
indicates the pause happens before the request makes it to our back end."

Can you clarify what you mean by the "back end" here? 

Since you say it's a pause "when using" Tomcat, I can't tell if you're saying 
Tomcat IS the backend, or instead if the request hangs WHILE running in Tomcat 
but doesn't reach some other "backend" that the Tomcat-based app should then 
talk to. You did refer in a later note to a "backend agent log", so I'm 
inclined to think it's the latter.

In that case, it seems you are wondering (in one note) whether the request 
might be hung up in the connector (getting TO Tomcat). 

But do you know for sure whether the request is actually running IN Tomcat? You 
could use any of various JVM or Tomcat monitoring tools to know a) if the 
request reaches Tomcat, b) if it is hanging while running there, and then c) 
just WHAT the request is hanging and waiting for (which could well be that 
"other backend" the app needs to talk to). 

But before elaborating on approaches, I wanted to ask what you may or may not 
know in regard to the above, first. :-)

/charlie 



-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 21 second pause that randomly happens

2018-07-17 Thread tomcat

Some additional comments in the text below.

But as a general comment : neither tomcat nor your application seem to log any error. This 
suggest that when a connection is established by the client, and it sends a request to 
tomcat on that conection, it does get processed without error (and apparently without an 
extraordinary long delay). If there was a problem at the tomcat level either reading the 
request, or processing it, or sending the response to the wire (from the tomcat point of 
view), then you would see errors in those logs.

(Such as "client broke connection" or "timeout while reading request" or 
similar).

This all suggests that the problem is indeed down either at the TCP/socket level on the 
tomcat server, or even at some other intermediate agent level.
If the socket Tomcat is using is a java socket, then it is at the JVM level that you need 
to look for logging capabilities. If it is a "Tomcat native" socket, then it would be at 
that level (because that is native code, not java).
(Otherwise said : tomcat will never be aware of, or have access to data related to, TCP 
packet transmission/retransmission issues; and even less your application)



On 16.07.2018 20:48, David Cleary wrote:

On 16.07.2018 16:35, David Cleary wrote:

2018-07-16 15:55 GMT+03:00 David Cleary :

We have a customer who is experiencing a random, 21 second pause when using out

Tomcat

based application server. We believe this may be during a TCP connect and 
timeout. Logging
indicates the pause happens before the request makes it to our back end.



Logging where then ?


Sorry for any formatting issues. I have a digest subscription which doesn't 
lend well to interactivity.

Clients are running on Windows machines. Server is running on AWS and Linux. 
There is a cloud firewall in between (pfSense). Do not have the details if they 
are running the cloud version available on AWS.

Client logging shows this:

[18/05/16@12:12:48.822+1000] P-006760 T-002372 1 4GL REVTrying 
Connection
[18/05/16@12:13:09.925+1000] P-006760 T-002372 1 4GL REVConnect 
Complete 21102
[18/05/16@12:13:09.925+1000] P-006760 T-002372 1 4GL REVWARNING: 
LONG CONNECTION


From the point of view of the client (low-level), the TCP connection is with the 
front-end firewall/load balancer. The firewall has a separate TCP connection with the 
back-end server, and copies packets between these two connections, changing 
addresses/ports as required.
I do not know the client, and I guess tht it is possible that these "connection" messages 
relate to the logical connection with the application, rather than purely the TCP level.

But it sounds somewhat unlikely.
Do you have any way to re-configure this (for testing) in such a way that the client would 
bypass the firewall/load-balancer and connect directly to your application server ?

(and see if the issue happens also then)



[18/05/16@12:13:09.925+1000] P-006760 T-002372 1 4GL REV
A4DC513EA548E24508E1E90AA9EA61DD9386DDB475AD.clintons connected 21102

Localhost access log shows this

localhost_access_log.2018-05-16.txt:10.255.11.250 - - [16/May/2018:12:13:16 +1000] 
"POST /apsv?CONNHDL=A4DC513EA548E24508E1E90AA9EA61DD9386DDB475AD.clintons HTT 
P/1.1" 200 253 1



The access log line is written when the request is in effect terminated (processed) and 
the result has already been sent to the wire (that is e.g. how it can log the size of the 
response).  I think that if you look at that Access Log in the documentation, you will 
find tat you can log additional details, such as how much time it took to process the 
request e.g.

But so far, that does not seem to be relavant to the problem at hand.


And our back end agent log shows this:

clintons.agent.log:[18/05/16@12:13:16.294+1000] P-019364 T-2819262208 2 AS-19 
AS Application Server connected with connection id: 
A4DC513EA548E24508E1E90AA9EA61 DD9386DDB475AD.clintons. (8358)
clintons.agent.log:[18/05/16@12:13:16.299+1000] P-019364 T-3688318720 2 AS-19 
AS Application Server disconnected with connection id: 
A4DC513EA548E24508E1E90AA9E A61DD9386DDB475AD.clintons. (8359)



So again, no problem visible at that level.


Customer had some weird reconnection logic that was part of their application. 
After removing the code so the logical connection would be kept open, we saw 
this pause happen on a standard request. I do not know how long this logical 
connection was idle before running. I also do not know if Tomcat closed the 
underlying socket either due to resources or a keep-alive timeout. I was hoping 
logging could tell me when Tomcat binds to an incoming socket and releases it. 
I was hoping to show in the above example, as far as Tomcat is concerned, the 
21 second delay happened outside of the server. Scouring the source code and 
trying some experimentation, it does not appear there is logging available at 
the socket level.

   It mostly happens

when we create an initial logical connection, but we 

Re: 21 second pause that randomly happens

2018-07-16 Thread Erik G. H. Meade
If all else fails you might want to enable Garbage Colletion (GC) logging to 
make sure it isn't Full GCs causing the delay.

David Cleary  wrote:
> We have a customer who is experiencing a random, 21 second pause when using 
> out Tomcat based application server. We believe this may be during a TCP 
> connect and timeout. Logging indicates the pause happens before the request 
> makes it to our back end. It mostly happens when we create an initial logical 
> connection, but we have also seen it elsewhere where we believe the TCP Keep 
> alive was expired and a new socket had to be established. However, I do not 
> know this and am hoping there is some logging I can turn on in the NIO 
> connector to collect more data. I tried turning on logging in the Endpoint 
> class, but that did not provide anything useful. There is a NAT firewall 
> between the client and server, so I'm looking for some TCP level logging that 
> could point me in the proper direction.
> 
> Thanks
> David Cleary
> Progress
> 


-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 21 second pause that randomly happens

2018-07-16 Thread David Cleary
On 16.07.2018 16:35, David Cleary wrote:
> 2018-07-16 15:55 GMT+03:00 David Cleary :
>>> We have a customer who is experiencing a random, 21 second pause when using 
>>> out
Tomcat
> based application server. We believe this may be during a TCP connect and 
> timeout. Logging
> indicates the pause happens before the request makes it to our back end.

> Logging where then ?

Sorry for any formatting issues. I have a digest subscription which doesn't 
lend well to interactivity.

Clients are running on Windows machines. Server is running on AWS and Linux. 
There is a cloud firewall in between (pfSense). Do not have the details if they 
are running the cloud version available on AWS.

Client logging shows this:

[18/05/16@12:12:48.822+1000] P-006760 T-002372 1 4GL REV    Trying 
Connection
[18/05/16@12:13:09.925+1000] P-006760 T-002372 1 4GL REV    Connect 
Complete 21102
[18/05/16@12:13:09.925+1000] P-006760 T-002372 1 4GL REV    WARNING: 
LONG CONNECTION
[18/05/16@12:13:09.925+1000] P-006760 T-002372 1 4GL REV    
A4DC513EA548E24508E1E90AA9EA61DD9386DDB475AD.clintons connected 21102

Localhost access log shows this

localhost_access_log.2018-05-16.txt:10.255.11.250 - - [16/May/2018:12:13:16 
+1000] "POST 
/apsv?CONNHDL=A4DC513EA548E24508E1E90AA9EA61DD9386DDB475AD.clintons HTT P/1.1" 
200 253 1

And our back end agent log shows this:

clintons.agent.log:[18/05/16@12:13:16.294+1000] P-019364 T-2819262208 2 AS-19 
AS Application Server connected with connection id: 
A4DC513EA548E24508E1E90AA9EA61 DD9386DDB475AD.clintons. (8358)
clintons.agent.log:[18/05/16@12:13:16.299+1000] P-019364 T-3688318720 2 AS-19 
AS Application Server disconnected with connection id: 
A4DC513EA548E24508E1E90AA9E A61DD9386DDB475AD.clintons. (8359)

Customer had some weird reconnection logic that was part of their application. 
After removing the code so the logical connection would be kept open, we saw 
this pause happen on a standard request. I do not know how long this logical 
connection was idle before running. I also do not know if Tomcat closed the 
underlying socket either due to resources or a keep-alive timeout. I was hoping 
logging could tell me when Tomcat binds to an incoming socket and releases it. 
I was hoping to show in the above example, as far as Tomcat is concerned, the 
21 second delay happened outside of the server. Scouring the source code and 
trying some experimentation, it does not appear there is logging available at 
the socket level.

  It mostly happens
> when we create an initial logical connection, but we have also seen it 
> elsewhere where
we
> believe the TCP Keep alive was expired and a new socket had to be 
> established. However,
I
> do not know this and am hoping there is some logging I can turn on in the NIO 
> connector
to
> collect more data. I tried turning on logging in the Endpoint class, but that 
> did not
provide
> anything useful.

> If the connection request does not even reach the Tomcat back-end, that is 
> also unlikely 
> to provide much information. (Not being facetious here, just stating a fact).
> Can you do a "netstat" command on your Tomcat server when this happens ?
> If yes, maybe some part of the output would provide some information from the 
> TCP level 
> (such as a high number of connections, to the Tomcat NIO port, in some 
> specific TCP state

e.g.)

Customer did some probing with Wireshark and said they were seeing a TCP 
retransmission (sorry, I do not have many more details). In investigating this, 
we discovered this info on TCP timeouts:

" There's probably a million reasons why the client may never receive a 
SYN-ACK. The one I've seen more often is packet loss, which in turn can have 
lots of reasons, for example a malfunctioning or misconfigured network switch.
However, you can immediately spot if your timeout/hang problems are caused by 
TCP retransmission because they happen to cause response times that are 
unusually frequently distributed around 3, 9 and 21 seconds (and on, of course).
In fact, the TCP retransmission timeout starts at 3 seconds, but if the client 
tries to resend after a timeout and still receives no answer, it doubles the 
wait to 6 s, so the total response time will be 9 seconds, assuming that the 
client now finally receives the SYN-ACK. Otherwise, 3 + 6 + 12 = 21, then 3 + 6 
+ 12 + 24 = 45 s and so on and so forth."

This is why we are focusing on the TCP layer.

>  There is a NAT firewall between the client and server, so I'm looking for
> some TCP level logging that could point me in the proper direction.
>
>> Tomcat version = ?
>
> Sorry. Tomcat 8.5.27.
>

> And on which kind of O.S. is this happening ?

> Also maybe another question : is this happening on a Tomcat server which is 
> dedicated to 
> that particular customer ? or is the Tomcat server shared between different 
> customers, and

> only that particular customer experiences these delays ?

We sell an application server that customers 

Re: 21 second pause that randomly happens

2018-07-16 Thread tomcat

On 16.07.2018 16:35, David Cleary wrote:

2018-07-16 15:55 GMT+03:00 David Cleary :

We have a customer who is experiencing a random, 21 second pause when using out 
Tomcat

based application server. We believe this may be during a TCP connect and 
timeout. Logging
indicates the pause happens before the request makes it to our back end.


Logging where then ?

 It mostly happens

when we create an initial logical connection, but we have also seen it 
elsewhere where we
believe the TCP Keep alive was expired and a new socket had to be established. 
However, I
do not know this and am hoping there is some logging I can turn on in the NIO 
connector to
collect more data. I tried turning on logging in the Endpoint class, but that 
did not provide
anything useful.


If the connection request does not even reach the Tomcat back-end, that is also unlikely 
to provide much information. (Not being facetious here, just stating a fact).

Can you do a "netstat" command on your Tomcat server when this happens ?
If yes, maybe some part of the output would provide some information from the TCP level 
(such as a high number of connections, to the Tomcat NIO port, in some specific TCP state 
e.g.)


 There is a NAT firewall between the client and server, so I'm looking for

some TCP level logging that could point me in the proper direction.


Tomcat version = ?


Sorry. Tomcat 8.5.27.



And on which kind of O.S. is this happening ?

Also maybe another question : is this happening on a Tomcat server which is dedicated to 
that particular customer ? or is the Tomcat server shared between different customers, and 
only that particular customer experiences these delays ?



-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: 21 second pause that randomly happens

2018-07-16 Thread David Cleary
2018-07-16 15:55 GMT+03:00 David Cleary :
>> We have a customer who is experiencing a random, 21 second pause when using 
>> out Tomcat
based application server. We believe this may be during a TCP connect and 
timeout. Logging
indicates the pause happens before the request makes it to our back end. It 
mostly happens
when we create an initial logical connection, but we have also seen it 
elsewhere where we
believe the TCP Keep alive was expired and a new socket had to be established. 
However, I
do not know this and am hoping there is some logging I can turn on in the NIO 
connector to
collect more data. I tried turning on logging in the Endpoint class, but that 
did not provide
anything useful. There is a NAT firewall between the client and server, so I'm 
looking for
some TCP level logging that could point me in the proper direction.

>Tomcat version = ?

Sorry. Tomcat 8.5.27.

Dave

-



Re: 21 second pause that randomly happens

2018-07-16 Thread Konstantin Kolinko
2018-07-16 15:55 GMT+03:00 David Cleary :
> We have a customer who is experiencing a random, 21 second pause when using 
> out Tomcat based application server. We believe this may be during a TCP 
> connect and timeout. Logging indicates the pause happens before the request 
> makes it to our back end. It mostly happens when we create an initial logical 
> connection, but we have also seen it elsewhere where we believe the TCP Keep 
> alive was expired and a new socket had to be established. However, I do not 
> know this and am hoping there is some logging I can turn on in the NIO 
> connector to collect more data. I tried turning on logging in the Endpoint 
> class, but that did not provide anything useful. There is a NAT firewall 
> between the client and server, so I'm looking for some TCP level logging that 
> could point me in the proper direction.

Tomcat version = ?

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org