[ https://issues.apache.org/jira/browse/TS-3299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sudheer Vinukonda updated TS-3299:
----------------------------------
    Description: 
The patch in TS-3196 appears to cause an fd leak in our production environment. There are a number of hung sockets stuck in CLOSE_WAIT state forever (they remain leaked for days after traffic is stopped). Debugging further, it looks like the inactivity audit is broken by this patch.

Some info on the leaked sockets (in CLOSE_WAIT state):

{code}
$ ss -s ; sudo traffic_line -r proxy.process.net.connections_currently_open; sudo traffic_line -r proxy.process.http.current_client_connections; sudo traffic_line -r proxy.process.http.current_server_connections; sudo ls -l /proc/$(pidof traffic_server)/fd/ 3>/dev/null | wc -l
Total: 29367 (kernel 29437)
TCP:   78235 (estab 5064, closed 46593, orphaned 0, synrecv 0, timewait 15/0), ports 918

Transport Total     IP        IPv6
*      29437     -         -        
RAW      0         0         0        
UDP      16        13        3        
TCP      31642     31637     5        
INET      31658     31650     8        
FRAG      0         0         0        

Password: 
27689
1
1
27939

A snippet from "lsof -p $(pidof traffic_server)"

[ET_NET 10024 nobody  240u  IPv4         2138575429        0t0        TCP 67.195.33.62:443->66.87.139.114:29381 (CLOSE_WAIT)
[ET_NET 10024 nobody  241u  IPv4         2137093945        0t0        TCP 67.195.33.62:443->201.122.209.180:49274 (CLOSE_WAIT)
[ET_NET 10024 nobody  243u  IPv4         2136018789        0t0        TCP 67.195.33.62:443->173.225.111.9:38133 (CLOSE_WAIT)
[ET_NET 10024 nobody  245u  IPv4         2135996293        0t0        TCP 67.195.33.62:443->172.243.79.180:52701 (CLOSE_WAIT)
[ET_NET 10024 nobody  248u  IPv4         2136468896        0t0        TCP 67.195.33.62:443->173.225.111.82:42273 (CLOSE_WAIT)
[ET_NET 10024 nobody  253u  IPv4         2140213864        0t0        TCP 67.195.33.62:443->174.138.185.120:34936 (CLOSE_WAIT)
[ET_NET 10024 nobody  259u  IPv4         2137861176        0t0        TCP 67.195.33.62:443->76.199.250.133:60631 (CLOSE_WAIT)
[ET_NET 10024 nobody  260u  IPv4         2139081493        0t0        TCP 67.195.33.62:443->187.74.154.214:58800 (CLOSE_WAIT)
[ET_NET 10024 nobody  261u  IPv4         2134948565        0t0        TCP 67.195.33.62:443->23.242.49.117:4127 (CLOSE_WAIT)
[ET_NET 10024 nobody  262u  IPv4         2135708046        0t0        TCP 67.195.33.62:443->66.241.71.243:50318 (CLOSE_WAIT)
[ET_NET 10024 nobody  263u  IPv4         2138896897        0t0        TCP 67.195.33.62:443->73.35.151.106:52414 (CLOSE_WAIT)
[ET_NET 10024 nobody  264u  IPv4         2135589029        0t0        TCP 67.195.33.62:443->96.251.12.27:62426 (CLOSE_WAIT)
[ET_NET 10024 nobody  265u  IPv4         2134930235        0t0        TCP 67.195.33.62:443->207.118.3.196:50690 (CLOSE_WAIT)
[ET_NET 10024 nobody  267u  IPv4         2137837515        0t0        TCP 67.195.33.62:443->98.112.195.98:52028 (CLOSE_WAIT)
[ET_NET 10024 nobody  269u  IPv4         2135272855        0t0        TCP 67.195.33.62:443->24.1.230.25:57265 (CLOSE_WAIT)
[ET_NET 10024 nobody  270u  IPv4         2135820802        0t0        TCP 67.195.33.62:443->24.75.122.66:14345 (CLOSE_WAIT)
[ET_NET 10024 nobody  271u  IPv4         2135475042        0t0        TCP 67.195.33.62:443->65.102.35.112:49188 (CLOSE_WAIT)
[ET_NET 10024 nobody  272u  IPv4         2135328974        0t0        TCP 67.195.33.62:443->209.242.195.252:54890 (CLOSE_WAIT)
[ET_NET 10024 nobody  273u  IPv4         2137542791        0t0        TCP 67.195.33.62:443->76.79.183.188:47048 (CLOSE_WAIT)
[ET_NET 10024 nobody  274u  IPv4         2134806135        0t0        TCP 67.195.33.62:443->189.251.149.36:58106 (CLOSE_WAIT)
[ET_NET 10024 nobody  275u  IPv4         2140126017        0t0        TCP 67.195.33.62:443->68.19.173.44:1397 (CLOSE_WAIT)
[ET_NET 10024 nobody  276u  IPv4         2134636089        0t0        TCP 67.195.33.62:443->67.44.192.72:22112 (CLOSE_WAIT)
[ET_NET 10024 nobody  278u  IPv4         2134708339        0t0        TCP 67.195.33.62:443->107.220.216.155:51242 (CLOSE_WAIT)
[ET_NET 10024 nobody  279u  IPv4         2134580888        0t0        TCP 67.195.33.62:443->50.126.116.209:59432 (CLOSE_WAIT)
[ET_NET 10024 nobody  281u  IPv4         2134868131        0t0        TCP 67.195.33.62:443->108.38.255.44:4612 (CLOSE_WAIT)
[ET_NET 10024 nobody  282u  IPv4         2139275601        0t0        TCP 67.195.33.62:443->168.99.78.5:63345 (CLOSE_WAIT)
[ET_NET 10024 nobody  283u  IPv4         2138019644        0t0        TCP 67.195.33.62:443->209.242.204.18:47547 (CLOSE_WAIT)
[ET_NET 10024 nobody  284u  IPv4         2134873211        0t0        TCP 67.195.33.62:443->99.47.247.0:58672 (CLOSE_WAIT)
[ET_NET 10024 nobody  285u  IPv4         2137679393        0t0        TCP 67.195.33.62:443->174.138.184.227:43908 (CLOSE_WAIT)
[ET_NET 10024 nobody  287u  IPv4         2137417894        0t0        TCP 67.195.33.62:443->50.46.200.8:4880 (CLOSE_WAIT)
[ET_NET 10024 nobody  288u  IPv4         2135541215        0t0        TCP 67.195.33.62:443->216.201.190.60:41016 (CLOSE_WAIT)
[ET_NET 10024 nobody  289u  IPv4         2137315400        0t0        TCP 67.195.33.62:443->98.180.195.225:53754 (CLOSE_WAIT)
[ET_NET 10024 nobody  291u  IPv4         2136931822        0t0        TCP 67.195.33.62:443->12.199.206.210:7271 (CLOSE_WAIT)
[ET_NET 10024 nobody  292u  IPv4         2137892940        0t0        TCP 67.195.33.62:443->166.173.60.100:16766 (CLOSE_WAIT)
[ET_NET 10024 nobody  293u  IPv4         2136627603        0t0        TCP 67.195.33.62:443->71.131.5.198:51552 (CLOSE_WAIT)
[ET_NET 10024 nobody  294u  IPv4         2135059554        0t0        TCP 67.195.33.62:443->209.95.183.181:39707 (CLOSE_WAIT)
{code}

Below are debug traces showing how the inactivity audit gets stuck in a loop, resetting the timer every time it expires instead of closing the vc (a minimal model of the cop-side behavior follows the traces).

{code}
[Jan 13 23:08:32.120] Server {0x2ae55fc3b700} DEBUG: (inactivity_cop_verbose) vc: 0x2ae574f13730 timeout at: 1421190525 timeout in: 300
[Jan 13 23:08:33.164] Server {0x2ae55fc3b700} DEBUG: (inactivity_cop_verbose) vc: 0x2ae574f13730 timeout at: 1421190525 timeout in: 300
.....
[Jan 13 23:13:46.833] Server {0x2ae55fc3b700} DEBUG: (inactivity_cop_verbose) vc: 0x2ae574f13730 timeout at: 1421190827 timeout in: 300
[Jan 13 23:13:47.845] Server {0x2ae55fc3b700} DEBUG: (inactivity_cop_verbose) vc: 0x2ae574f13730 timeout at: 1421190827 timeout in: 300
[Jan 13 23:08:47.639] Server {0x2ae55fc3b700} DEBUG: (inactivity_cop) vc: 0x2ae574f13730 inactivity timeout not set, setting a default of 300
[Jan 13 23:08:48.666] Server {0x2ae55fc3b700} DEBUG: (inactivity_cop_verbose) vc: 0x2ae574f13730 timeout at: 1421190827 timeout in: 300
........
{code}
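
For orientation, here is a minimal, self-contained sketch of the cop-side behavior visible in these traces: each pass prints the vc's timeout, and when no timeout is set the cop applies the default of 300 seconds. This is an illustrative model only, not the actual ATS InactivityCop code; the {{ModelVC}} type and {{cop_pass()}} function are invented for this sketch.

{code}
// Illustrative model only -- not the actual InactivityCop implementation.
// Mirrors the two log lines above: the per-pass verbose timeout print, and
// the "inactivity timeout not set, setting a default of 300" case.
#include <cstdio>
#include <ctime>

struct ModelVC {
  std::time_t timeout_at = 0; // absolute expiry time; 0 means "not set"
  long timeout_in = 0;        // configured timeout in seconds
};

void cop_pass(ModelVC &vc, std::time_t now, long default_timeout) {
  if (vc.timeout_at == 0) {
    // corresponds to: "inactivity timeout not set, setting a default of 300"
    vc.timeout_in = default_timeout;
    vc.timeout_at = now + default_timeout;
  } else if (vc.timeout_at <= now) {
    // expired: the vc is signalled and is expected to close itself;
    // in the broken case the signal is dropped (see the sketch after the
    // gdb output below).
  }
  // corresponds to: "vc: <ptr> timeout at: <epoch> timeout in: <seconds>"
  std::printf("vc: %p timeout at: %ld timeout in: %ld\n",
              (void *)&vc, (long)vc.timeout_at, vc.timeout_in);
}

int main() {
  ModelVC vc;
  std::time_t now = std::time(nullptr);
  for (int pass = 0; pass < 3; ++pass) {
    cop_pass(vc, now, 300); // the real cop runs roughly once per second
    ++now;
  }
}
{code}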

Attached gdb to track what is resetting the inactivity timer. It turns out that when the inactivity timer expires for these connections, the audit correctly signals the vc's continuation. However, because the patch in TS-3196 resets the read continuation to NULL, the event is simply ignored in {{read_signal_and_update()}}, while at the same time the inactivity timer is reset. This loops forever, so these inactive connections are never released. Below is the debug output from gdb, followed by a sketch that models the loop.

{code}
(gdb) cont
Continuing.
Hardware watchpoint 22: *(0x2ae564011a28)

Old value = 0
New value = -794113713
UnixNetVConnection::set_inactivity_timeout (this=0x2ae564011830,
timeout=300000000000) at P_UnixNetVConnection.h:324
324    P_UnixNetVConnection.h: No such file or directory.
    in P_UnixNetVConnection.h
(gdb) cont
Continuing.
Hardware watchpoint 22: *(0x2ae564011a28)

Old value = -794113713
New value = 0
UnixNetVConnection::mainEvent (this=0x2ae564011830, event=1, e=0x1f37c90) at
UnixNetVConnection.cc:1060
1060    UnixNetVConnection.cc: No such file or directory.
    in UnixNetVConnection.cc
(gdb) break 1068
Breakpoint 23 at 0x738358: file UnixNetVConnection.cc, line 1068.

Breakpoint 23, UnixNetVConnection::mainEvent (this=0x2ae575679c90, event=1,
e=0x1f34a50) at UnixNetVConnection.cc:1068
1068    in UnixNetVConnection.cc
(gdb) p closed
$24 = 0
(gdb) p read.vio
$25 = {_cont = 0x0, nbytes = 0, ndone = 0, op = 1, buffer = {mbuf = 0x0, entry
= 0x0}, vc_server = 0x2ae575679c90, mutex = {m_ptr = 0x2ae6241a3720}}

Breakpoint 24, read_signal_and_update (event=105, vc=0x2ae5758ffab0) at
UnixNetVConnection.cc:137
137    in UnixNetVConnection.cc
(gdb) p vc->read.vio
$33 = {_cont = 0x0, nbytes = 0, ndone = 0, op = 1, buffer = {mbuf = 0x0, entry
= 0x0}, vc_server = 0x2ae5758ffab0, mutex = {m_ptr = 0x2ae6100a3a20}}
(gdb) cont
Continuing.
[Switching to Thread 0x2ae55e928700 (LWP 10088)]
{code}
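
To make the loop concrete, below is a minimal, self-contained model of what the gdb session shows: on expiry the timeout field is cleared and the read side is signalled, but with the read continuation left NULL the event is dropped in {{read_signal_and_update()}}, so the vc never closes and the cop re-applies the 300 second default on a later pass. This is a hedged sketch under those assumptions, not the actual {{UnixNetVConnection}} implementation; {{ModelNetVC}} and its simplified event plumbing are invented for illustration.

{code}
// Illustrative model only -- not the actual UnixNetVConnection code.
// Demonstrates the failure loop: the inactivity timeout fires, the timer is
// cleared and the read side is signalled, but read.vio._cont is NULL so the
// event is silently dropped, the vc stays open, and the cop later re-arms a
// fresh 300s default -- forever.
#include <cstdio>

struct Continuation {
  // In the healthy case this drives the state machine to close the vc.
  bool handle_event(int event) {
    std::printf("continuation handled event %d -> vc closed\n", event);
    return true;
  }
};

struct ModelNetVC {
  Continuation *read_cont = nullptr; // TS-3196 leaves this NULL on these vcs
  long timeout_at = 0;               // 0 means "not set"
  bool closed = false;

  // Analogue of read_signal_and_update(): with no continuation the event
  // has nowhere to go and the vc is never closed.
  void read_signal_and_update(int event) {
    if (read_cont == nullptr) {
      std::printf("event %d dropped: read.vio._cont is NULL, vc stays open\n", event);
      return;
    }
    closed = read_cont->handle_event(event);
  }

  // Analogue of mainEvent() handling the inactivity timeout: the gdb
  // watchpoint shows the timer going back to 0 here before the signal.
  void mainEvent() {
    timeout_at = 0;
    read_signal_and_update(105); // 105 is the event value seen arriving at
                                 // read_signal_and_update() in the gdb session
  }
};

int main() {
  ModelNetVC vc;
  long now = 1421190225;
  for (int pass = 0; pass < 4 && !vc.closed; ++pass) {
    if (vc.timeout_at == 0) {
      vc.timeout_at = now + 300; // cop applies the default again
    } else if (vc.timeout_at <= now) {
      vc.mainEvent();            // timer cleared, signal dropped
    }
    now += 300;                  // let the timeout expire before the next pass
  }
  std::printf("closed=%d timeout_at=%ld\n", (int)vc.closed, vc.timeout_at);
}
{code}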



> Inactivity timeout audit broken
> -------------------------------
>
>                 Key: TS-3299
>                 URL: https://issues.apache.org/jira/browse/TS-3299
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>            Reporter: Sudheer Vinukonda
>


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
