[ 
https://issues.apache.org/jira/browse/TS-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15471799#comment-15471799
 ] 

Hung-Yi Chen commented on TS-4816:
----------------------------------

I've got the same issue on FreeBSD 10.2 & 10.3.

Here's traffic.out:
{noformat}
traffic_server: Segmentation fault
traffic_server - STACK TRACE:
0x4b1e19 <_Z19crash_logger_invokeiP9__siginfoPv+0x69> at /usr/local/bin/traffic_server
0x802953997 <pthread_sigmask+0x497> at /lib/libthr.so.3
0x8029531a8 <pthread_getspecific+0xdd8> at /lib/libthr.so.3
{noformat}

manager.log:
{noformat}
[Sep  8 03:46:22.180] Manager {0x804006400} ERROR: [LocalManager::sendMgmtMsgToProcesses] Error writing message
[Sep  8 03:46:22.181] Manager {0x804006400} ERROR: <MgmtUtils.cc:289 (mgmt_elog)>  (last system error 32: Broken pipe)
[Sep  8 03:46:22.702] Manager {0x804006400} ERROR: [LocalManager::pollMgmtProcessServer] Server Process terminated due to Sig 11: Segmentation fault
[Sep  8 03:46:22.703] Manager {0x804006400} ERROR: [Alarms::signalAlarm] Server Process was reset
[Sep  8 03:46:23.771] Manager {0x804006400} WARNING: <ClusterCom.cc:1331 (sendSharedData)> multicast send timeout exceeded.  21368 seconds since last send.
[Sep  8 03:46:23.786] Manager {0x804006400} NOTE: [LocalManager::startProxy] Launching ts process
[Sep  8 03:46:23.921] Manager {0x804006400} NOTE: [LocalManager::pollMgmtProcessServer] New process connecting fd '16'
[Sep  8 03:46:23.922] Manager {0x804006400} NOTE: [Alarms::signalAlarm] Server Process born
{noformat}
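
For anyone who wants to pull the crash times out of a longer manager.log, here is a minimal Python sketch. It is only an illustration: the function names (`join_wrapped`, `crash_events`) are made up for this example, the year is an assumption because manager.log does not record one, and it relies only on the message texts visible in the excerpt above ("terminated due to Sig N" and "Launching ts process"). It also re-joins entries that were wrapped across lines by the mailer.

```python
import re
from datetime import datetime

# Timestamp that opens every manager.log entry, e.g. "[Sep  8 03:46:22.702]".
ENTRY_START = re.compile(r"^\[(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}\.\d{3})\]")
CRASH = re.compile(r"terminated due to Sig\s+(\d+)")


def join_wrapped(lines):
    """Re-join log entries that were hard-wrapped across several lines."""
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if ENTRY_START.match(line) or not entries:
            entries.append(line)          # a new timestamped entry
        else:
            entries[-1] += " " + line     # continuation of the previous entry
    return entries


def crash_events(lines, year=2016):
    """Pair each 'terminated due to Sig N' entry with the next relaunch.

    The year is an assumption supplied by the caller; the log has none.
    """
    crashes = []
    for entry in join_wrapped(lines):
        stamp = ENTRY_START.match(entry)
        if not stamp:
            continue
        when = datetime.strptime(f"{year} {stamp.group(1)}",
                                 "%Y %b %d %H:%M:%S.%f")
        sig = CRASH.search(entry)
        if sig:
            crashes.append({"crashed": when,
                            "signal": int(sig.group(1)),
                            "relaunched": None})
        elif ("Launching ts process" in entry and crashes
              and crashes[-1]["relaunched"] is None):
            crashes[-1]["relaunched"] = when
    return crashes


sample = [
    "[Sep  8 03:46:22.702] Manager {0x804006400} ERROR: ",
    "[LocalManager::pollMgmtProcessServer] Server Process terminated due to Sig 11: ",
    "Segmentation fault",
    "[Sep  8 03:46:23.786] Manager {0x804006400} NOTE: [LocalManager::startProxy] ",
    "Launching ts process",
]

for crash in crash_events(sample):
    print(crash["crashed"], "signal", crash["signal"],
          "relaunched", crash["relaunched"])
```

To run it against a real file, pass `open("/var/log/trafficserver/manager.log")` instead of `sample`.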


I also ran truss to try to catch the crash:
{noformat}
clock_gettime(0,{1473277582.156076393 })         = 0 (0x0)
_umtx_op(0x800a492d8,UMTX_OP_WAIT_UINT_PRIVATE,0x0,0x18,0x7fffde0dfe00) ERR#60 'Operation timed out'
_umtx_op(0x800a49278,UMTX_OP_WAIT_UINT_PRIVATE,0x0,0x18,0x7fffde4e3e00) ERR#60 'Operation timed out'
kevent(83,{0x109d,EVFILT_WRITE,EV_ADD|EV_CLEAR,0,0x0,0x813fa41b0},1,0x0,0,0x0) = 0 (0x0)
kevent(66,{0xa66,EVFILT_WRITE,EV_ADD|EV_CLEAR,0,0x0,0x813d00410},1,0x0,0,0x0) = 0 (0x0)
clock_gettime(0,{1473277582.158822671 })         = 0 (0x0)
_umtx_op(0x8050ff880,UMTX_OP_MUTEX_WAIT,0x0,0x0,0x0) = 0 (0x0)
_umtx_op(0x800a49308,UMTX_OP_WAIT_UINT_PRIVATE,0x0,0x18,0x7fffddedde00) ERR#60 'Operation timed out'
clock_gettime(0,{1473277582.161559171 })         = 0 (0x0)
sigprocmask(SIG_SETMASK,SIGSEGV,0x0)             = 0 (0x0)
write(1305,"HTTP/1.1 404 Not Found\r\nDate: "...,106) = 106 (0x6a)
_umtx_op(0x8050ff880,UMTX_OP_MUTEX_WAKE2,0x0,0x0,0x0) = 0 (0x0)
kevent(85,{0xe32,EVFILT_WRITE,EV_ADD|EV_CLEAR,0,0x0,0x813f10270},1,0x0,0,0x0) = 0 (0x0)
writev(0x10ad,0x7fffdf2f14d0,0x2,0x1a1,0x813fa1040,0x7fffdf2f1674) = 417 (0x1a1)
kevent(66,{0xdf0,EVFILT_WRITE,EV_ADD|EV_CLEAR,0,0x0,0x813eec030},1,0x0,0,0x0) = 0 (0x0)
clock_gettime(0,{1473277582.166437260 })         = 0 (0x0)
clock_gettime(0,{1473277582.164370953 })         = 0 (0x0)
clock_gettime(0,{1473277582.167189089 })         = 0 (0x0)
_umtx_op(0x8050ff880,UMTX_OP_MUTEX_WAKE2,0x0,0x0,0x0) = 0 (0x0)
_umtx_op(0x8050ff880,UMTX_OP_MUTEX_WAIT,0x0,0x0,0x0) = 0 (0x0)
clock_gettime(0,{1473277582.168218088 })         = 0 (0x0)
clock_gettime(13,{1473277582.000000000 })        = 0 (0x0)
_umtx_op(0x800a492d8,UMTX_OP_WAIT_UINT_PRIVATE,0x0,0x18,0x7fffde0dfe00) ERR#60 'Operation timed out'
kevent(66,{0x61f,EVFILT_WRITE,EV_ADD|EV_CLEAR,0,0x0,0x80d826490},1,0x0,0,0x0) = 0 (0x0)
writev(0xe62,0x7fffde9e84d0,0x2,0x18d,0x813f077e0,0x7fffde9e8674) ERR#32 'Broken pipe'
clock_gettime(0,{1473277582.172662091 })         = 0 (0x0)
read(3358,0x8a6b0305b,4005)                      ERR#54 'Connection reset by peer'
close(1757)                                      = 0 (0x0)
close(986)                                       = 0 (0x0)
clock_gettime(0,{1473277582.174860873 })         = 0 (0x0)
close(3358)                                      = 0 (0x0)
_umtx_op(0x800a49278,UMTX_OP_WAIT_UINT_PRIVATE,0x0,0x18,0x7fffde4e3e00) ERR#60 'Operation timed out'
read(4264,0x8a1a5b05b,4005)                      ERR#54 'Connection reset by peer'
write(2360,"HTTP/1.1 404 Not Found\r\nDate: "...,106) = 106 (0x6a)
sigreturn(0x7fffddcd92d0,0x7fffddcd92d0,0x300,0x0,0xfffffffffffffbc0,0x8080808080808080) = 34477128384 (0x806ff3ac0)
SIGNAL 11 (SIGSEGV)
process exit, rval = 0
{noformat}
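
To see at a glance which system calls failed in the run-up to the SIGSEGV, the truss output can be tallied by call and errno. The following is a rough Python sketch, not a general truss parser: the helper names (`truss_records`, `failed_calls`) are hypothetical, and the patterns assume only the line shapes that appear in the excerpt above (`name(args) ERR#N 'message'`, possibly wrapped onto a second line).

```python
import re
from collections import Counter

# A record starts with "syscall(", "SIGNAL " or "process exit"; anything else
# is treated as the wrapped tail of the previous record.
RECORD_START = re.compile(r"^(?:[A-Za-z_]\w*\(|SIGNAL |process exit)")
FAILED_CALL = re.compile(r"^(\w+)\(.*\)\s+ERR#(\d+)\s+'([^']*)'")


def truss_records(lines):
    """Re-join truss records that were wrapped across lines."""
    records = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line
    return records


def failed_calls(lines):
    """Count failed calls, keyed by (syscall, errno, message)."""
    counts = Counter()
    for record in truss_records(lines):
        m = FAILED_CALL.match(record)
        if m:
            counts[(m.group(1), int(m.group(2)), m.group(3))] += 1
    return counts


sample = [
    "writev(0xe62,0x7fffde9e84d0,0x2,0x18d,0x813f077e0,0x7fffde9e8674) ERR#32 ",
    "'Broken pipe'",
    "clock_gettime(0,{1473277582.172662091 })         = 0 (0x0)",
    "read(3358,0x8a6b0305b,4005)                      ERR#54 'Connection reset by ",
    "peer'",
    "read(4264,0x8a1a5b05b,4005)                      ERR#54 'Connection reset by ",
    "peer'",
]

for (call, errno_, message), n in failed_calls(sample).most_common():
    print(f"{n:3d}  {call}  ERR#{errno_}  {message}")
```

The tally only summarizes what truss recorded; it does not by itself say which of these calls, if any, is related to the segfault.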

Could this be some kind of threading issue?

> ATS 6.2.0 - crashing with broken pipe, sig11 segmentation fault
> ---------------------------------------------------------------
>
>                 Key: TS-4816
>                 URL: https://issues.apache.org/jira/browse/TS-4816
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Manager
>            Reporter: David Brodin
>              Labels: crash
>             Fix For: 7.0.0
>
>         Attachments: ats_gdb-160905.txt, ats_manager-gdb-160906.txt, 
> crash-2016-09-05-075309.log, est_socks-ats_6.2.0.png
>
>
> Hi,
> We just upgraded to ATS 6.2.0 via FreeBSD ports:
> {noformat}
> [root@<machine> ~]# uname -a
> FreeBSD <machine> 10.3-RELEASE-p7 FreeBSD 10.3-RELEASE-p7 #0: Thu Aug 11 
> 18:38:15 UTC 2016     
> [email protected]:/usr/obj/usr/src/sys/GENERIC  amd64
> [root@<machine> ~]# pkg info | grep traff
> trafficserver-6.2.0            Fast, scalable and extensible HTTP proxy server
> {noformat}
> We are experiencing crashes, usually during the day and hardly any during 
> "low" loads, but the crashes that affect us most occur in the early mornings. 
> Along with this, we can see a memory leak as well.
> We are using ATS as an enterprise proxy to the Internet, and as we have a 
> very good Internet connection we have also disabled caching.
> I'm not sure how I would attach files so here goes :)
> manager.log
> {noformat}
> [Sep  2 11:48:28.017] Manager {0x804006400} ERROR: 
> [LocalManager::sendMgmtMsgToProcesses] Error writing message
> [Sep  2 11:48:28.017] Manager {0x804006400} ERROR: <MgmtUtils.cc:289 
> (mgmt_elog)>  (last system error 32: Broken pipe)
> [Sep  2 11:48:38.305] {0x804006400} STATUS: opened 
> /var/log/trafficserver/manager.log
> [Sep  2 11:48:38.305] {0x804006400} NOTE: <DiagsConfig.cc:141 
> (reconfigure_diags)> updated diags config
> [Sep  2 11:48:38.311] Manager {0x804006400} NOTE: [ClusterCom::ClusterCom] 
> Node running on OS: 'FreeBSD' Release: '10.3-RELEASE-p7'
> [Sep  2 11:48:38.312] Manager {0x804006400} NOTE: 
> [LocalManager::listenForProxy] Listening on port: 8080 (IPv4)
> [Sep  2 11:48:38.313] Manager {0x804006400} NOTE: 
> [LocalManager::listenForProxy] Listening on port: 8080 (IPv6)
> [Sep  2 11:48:38.313] Manager {0x804006400} NOTE: [TrafficManager] Setup 
> complete
> [Sep  2 11:48:39.321] Manager {0x804006400} NOTE: [LocalManager::startProxy] 
> Launching ts process
> [Sep  2 11:48:39.336] Manager {0x804006400} NOTE: 
> [LocalManager::pollMgmtProcessServer] New process connecting fd '17'
> [Sep  2 11:48:39.336] Manager {0x804006400} NOTE: [Alarms::signalAlarm] 
> Server Process born
> [Sep  2 11:51:32.574] Manager {0x804006400} ERROR: 
> [LocalManager::sendMgmtMsgToProcesses] Error writing message
> [Sep  2 11:51:32.574] Manager {0x804006400} ERROR: <MgmtUtils.cc:289 
> (mgmt_elog)>  (last system error 32: Broken pipe)
> [Sep  2 11:51:32.669] Manager {0x804006400} ERROR: 
> [LocalManager::pollMgmtProcessServer] Server Process terminated due to Sig 
> 11: Segmentation fault
> [Sep  2 11:51:32.669] Manager {0x804006400} ERROR: [Alarms::signalAlarm] 
> Server Process was reset
> [Sep  2 11:51:33.674] Manager {0x804006400} NOTE: [LocalManager::startProxy] 
> Launching ts process
> [Sep  2 11:51:33.689] Manager {0x804006400} NOTE: 
> [LocalManager::pollMgmtProcessServer] New process connecting fd '13'
> [Sep  2 11:51:33.690] Manager {0x804006400} NOTE: [Alarms::signalAlarm] 
> Server Process born
> [Sep  3 04:14:35.380] Manager {0x804006400} ERROR: 
> [LocalManager::sendMgmtMsgToProcesses] Error writing message
> [Sep  3 04:14:35.380] Manager {0x804006400} ERROR: <MgmtUtils.cc:289 
> (mgmt_elog)>  (last system error 32: Broken pipe)
> [Sep  3 04:14:35.748] Manager {0x804006400} ERROR: 
> [LocalManager::pollMgmtProcessServer] Server Process terminated due to Sig 
> 11: Segmentation fault
> [Sep  3 04:14:35.748] Manager {0x804006400} ERROR: [Alarms::signalAlarm] 
> Server Process was reset
> [Sep  3 04:14:36.814] Manager {0x804006400} NOTE: [LocalManager::startProxy] 
> Launching ts process
> [Sep  3 04:14:36.828] Manager {0x804006400} NOTE: 
> [LocalManager::pollMgmtProcessServer] New process connecting fd '13'
> [Sep  3 04:14:36.829] Manager {0x804006400} NOTE: [Alarms::signalAlarm] 
> Server Process born
> {noformat}
> traffic.out (since this isn't timestamped, I'm not sure whether I'm leaving 
> some of the stack trace out):
> {noformat}
> traffic_server[TrafficManager] ==> Cleaning up and reissuing signal #15
> : Terminated
> traffic_server: Terminatedtraffic_servertraffic_servertraffic_server: 
> Segmentation fault
> traffic_server - STACK TRACE:
> 0x4af409 <_Z19crash_logger_invokeiP9__siginfoPv+0x69> at 
> /usr/local/bin/traffic_server
> 0x802735b37 <pthread_sigmask+0x507> at /lib/libthr.so.3
> 0x80273522c <pthread_getspecific+0xe1c> at /lib/libthr.so.3
> getpeereid -> 0 (54, Connection reset by peer)[TrafficManager] ==> Cleaning 
> up and reissuing signal #15
> traffic_server: Terminated
> traffic_server: Terminated
> traffic_server: using root directory '/usr/local'
> [TrafficManager] ==> signal #15
> traffic_server: Segmentation fault
> traffic_server - STACK TRACE:
> 0x4af409 <_Z19crash_logger_invokeiP9__siginfoPv+0x69> at 
> /usr/local/bin/traffic_server
> 0x802735b37 <pthread_sigmask+0x507> at /lib/libthr.so.3
> 0x80273522c <pthread_getspecific+0xe1c> at /lib/libthr.so.3
> traffic_server: Segmentation fault
> traffic_server - STACK TRACE:
> 0x4af409 <_Z19crash_logger_invokeiP9__siginfoPv+0x69> at 
> /usr/local/bin/traffic_server
> 0x802735b37 <pthread_sigmask+0x507> at /lib/libthr.so.3
> 0x80273522c <pthread_getspecific+0xe1c> at /lib/libthr.so.3
> traffic_server: Terminated
> traffic_server: Terminated
> traffic_server: Terminated
> traffic_server: Terminated
> traffic_server: Segmentation fault
> traffic_server - STACK TRACE:
> 0x4af409 <_Z19crash_logger_invokeiP9__siginfoPv+0x69> at 
> /usr/local/bin/traffic_server
> 0x802735b37 <pthread_sigmask+0x507> at /lib/libthr.so.3
> 0x80273522c <pthread_getspecific+0xe1c> at /lib/libthr.so.3
> traffic_server: Segmentation fault
> traffic_server - STACK TRACE:
> 0x4af409 <_Z19crash_logger_invokeiP9__siginfoPv+0x69> at 
> /usr/local/bin/traffic_server
> 0x802735b37 <pthread_sigmask+0x507> at /lib/libthr.so.3
> 0x80273522c <pthread_getspecific+0xe1c> at /lib/libthr.so.3
> traffic_server: Segmentation fault
> traffic_server - STACK TRACE:
> 0x4af409 <_Z19crash_logger_invokeiP9__siginfoPv+0x69> at 
> /usr/local/bin/traffic_server
> 0x802735b37 <pthread_sigmask+0x507> at /lib/libthr.so.3
> 0x80273522c <pthread_getspecific+0xe1c> at /lib/libthr.so.3
> traffic_server: Segmentation fault
> traffic_server - STACK TRACE:
> 0x4af409 <_Z19crash_logger_invokeiP9__siginfoPv+0x69> at 
> /usr/local/bin/traffic_server
> 0x802735b37 <pthread_sigmask+0x507> at /lib/libthr.so.3
> 0x80273522c <pthread_getspecific+0xe1c> at /lib/libthr.so.3
> traffic_server: Segmentation fault
> traffic_server - STACK TRACE:
> 0x4af409 <_Z19crash_logger_invokeiP9__siginfoPv+0x69> at 
> /usr/local/bin/traffic_server
> 0x802735b37 <pthread_sigmask+0x507> at /lib/libthr.so.3
> 0x80273522c <pthread_getspecific+0xe1c> at /lib/libthr.so.3
> {noformat}
> /var/log/messages
> {noformat}
> Sep  2 06:58:03 <machine> traffic_manager[5604]: {0x804006400} ERROR: 
> [LocalManager::sendMgmtMsgToProcesses] Error writing message
> Sep  2 06:58:03 <machine> kernel: pid 6680 (traffic_server), uid 80: exited 
> on signal 11
> Sep  2 06:58:03 <machine> traffic_manager[5604]: {0x804006400} ERROR: 
> <MgmtUtils.cc:289 (mgmt_elog)>  (last system error 32: Broken pipe)
> Sep  2 06:58:04 <machine> traffic_cop[5603]: cannot find traffic_server [1]
> Sep  2 06:58:04 <machine> traffic_manager[5604]: {0x804006400} ERROR: 
> [LocalManager::pollMgmtProcessServer] Server Process terminated due to Sig 
> 11: Segmentation fault
> Sep  2 06:58:04 <machine> traffic_manager[5604]: {0x804006400} ERROR: 
> [Alarms::signalAlarm] Server Process was reset
> Sep  2 06:58:08 <machine> traffic_server[9951]: NOTE: --- traffic_server 
> Starting ---
> Sep  2 06:58:08 <machine> traffic_server[9951]: NOTE: traffic_server Version: 
> Apache Traffic Server - traffic_server - 6.2.0 - (build # 083112 on Aug 31 
> 2016 at 12:51:58)
> Sep  2 06:58:08 <machine> traffic_server[9951]: NOTE: 
> RLIMIT_NOFILE(8):cur(190297),max(190297)
> {noformat}
> Every other second /var/log/messages is also getting 1-10 lines of this:
> {noformat}
> Sep  3 17:48:50 <machine> traffic_server[14338]: {0x804008000} ERROR: 
> <HttpSM.cc:1159 (state_raw_http_server_open)> 
> [HttpSM::state_raw_http_server_open] event: EVENT_INTERVAL state: 0 
> server_entry: 0x0
> {noformat}
> And a "ps aux" showing mem usage:
> {noformat}
> [root@<machine> /usr/local/etc/trafficserver]# ps axu | grep "USER\|traff"
> USER      PID  %CPU %MEM     VSZ     RSS TT  STAT STARTED        TIME COMMAND
> www     14338   7.9 21.3 1910932 1778200  -  S     4:14AM    27:51.59 
> /usr/local/bin/traffic_server -M --bind_stdout 
> /var/log/trafficserver/traffic.out --bind_stderr /var/log/traff
> root     5602   0.0  0.0   14492    2004  -  Is   Thu10AM     0:00.00 daemon: 
> /usr/local/bin/traffic_cop[5603] (daemon)
> root     5603   0.0  0.1   64360    7516  -  Ss   Thu10AM     0:07.96 
> /usr/local/bin/traffic_cop
> www     10897   0.0  0.2   87544   13492  -  S    Fri11AM     0:34.78 
> /usr/local/bin/traffic_manager --bind_stdout 
> /var/log/trafficserver/traffic.out --bind_stderr /var/log/traffic
> {noformat}



