[jira] Commented: (TS-394) traffic_server process sig abort in full cluster mode

Zhao Yongming (JIRA) Fri, 13 Aug 2010 06:43:42 -0700

    [ 
https://issues.apache.org/jira/browse/TS-394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898235#action_12898235
 ]


Zhao Yongming commented on TS-394:
----------------------------------

Retested with the recent changes and, to my surprise, trunk now has working 
cluster code. Here is the change that fixed the breakage:

commit fef40a0c5e4d0362c759f37d19dc47208798e38c
Author: zwoop <zw...@13f79535-47bb-0310-9956-ffa450edef68>
Date:   Wed Jun 16 23:00:40 2010 +0000

    TS-320: Do some cleanup on Connection::fast_connect and Connection::bind_connect
    
    Tested: FC-13 64-bit
    Author: Alan M. Carroll
    Review and comments: John Plevyak
    
    git-svn-id: https://svn.apache.org/repos/asf/trafficserver/traffic/tr...@955421 13f79535-47bb-0310-9956-ffa450edef68

So far so good, thanks all :D

> traffic_server process sig abort in full cluster mode
> -----------------------------------------------------
>
>                 Key: TS-394
>                 URL: https://issues.apache.org/jira/browse/TS-394
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>         Environment: Full cluster mode in ATS is unusable; the traffic_server 
> process is killed by SIGABRT on every request. Tested with code from trunk.
>            Reporter: Zhao Yongming
>            Priority: Critical
>             Fix For: 2.3.0
>
>         Attachments: traffic_full_cluster_sig_abort.patch
>
>
> I am trying to set up full cluster mode, but I get a connection abort on 
> every request. According to tcpdump, ATS fetches the correct source file from 
> the backend but does not send the full file to the client (the HTTP transfer 
> ends with a TCP reset), so I tried to find the root cause.
> With debug logging enabled in records.config:
> CONFIG proxy.config.diags.debug.enabled INT 1
> CONFIG proxy.config.diags.debug.tags STRING http.*|cluster.*
> I got the following log in traffic.out:
> [Jun 22 15:19:11.296] Manager {139809315796768} ERROR: [LocalManager::pollMgmtProcessServer] Server Process terminated due to Sig 6: Aborted
> [Jun 22 15:19:11.296] Manager {139809315796768} ERROR:  (last system error 2: No such file or directory)
> [Jun 22 15:19:11.296] Manager {139809315796768} ERROR: [Alarms::signalAlarm] Server Process was reset
> [Jun 22 15:19:11.296] Manager {139809315796768} ERROR:  (last system error 2: No such file or directory)
> Running strace on traffic_server gave the following output:
> [pid 19830]      0.000306 <... epoll_wait resumed> {}, 32768, 10) = 0
> [pid 19830]      0.000031 gettimeofday({1277256740, 532763}, NULL) = 0
> [pid 19830]      0.000181 write(2, "FATAL: ClusterHandler.cc:2047: failed assert `ntodo >= 0`\n", 58) = 58
> [pid 19830]      0.000076 gettimeofday({1277256740, 533020}, NULL) = 0
> [pid 19830]      0.000071 socket(PF_FILE, SOCK_DGRAM|SOCK_CLOEXEC, 0) = 101
> [pid 19830]      0.000062 connect(101, {sa_family=AF_FILE, path="/dev/log"}, 110) = -1 ENOENT (No such file or directory)
> [pid 19830]      0.000074 close(101)    = 0
> [pid 19830]      0.000056 write(2, "/usr/bin/traffic_server", 23) = 23
> [pid 19830]      0.000050 write(2, " - STACK TRACE: \n", 17) = 17
> [pid 19830]      0.000289 futex(0x2b3a08e6a5b0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
> [pid 19830]      0.000251 futex(0x2b3a08b11190, FUTEX_WAKE_PRIVATE, 2147483647) = 0
> [pid 19830]      0.000656 writev(2, [{"/usr/bin/traffic_server", 23}, {"(", 1}, {"ink_fatal_va", 12}, {"+0x", 3}, {"ab", 2}, {")", 1}, {"[0x", 3}, {"6dcdab", 6}, {"]\n", 2}], 9) = 53
> I have worked around the bug by commenting out the assert at 
> ClusterHandler.cc:2047. A patch will follow.
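
For readers triaging similar reports, the two key facts in the logs above (the manager's "Sig 6" and the failed assert in the strace output) can be pulled out mechanically. The following is a minimal sketch only, not part of Traffic Server: the helper names `signal_name` and `failed_assert` are hypothetical, the sample lines are copied from the report with the mail wrapping removed, and the signal number is resolved through the platform's own table, which on Linux maps 6 to SIGABRT.

```python
import re
import signal

# Sample lines copied from the report above (mail line wrapping removed).
MANAGER_LINE = (
    "[Jun 22 15:19:11.296] Manager {139809315796768} ERROR: "
    "[LocalManager::pollMgmtProcessServer] Server Process terminated "
    "due to Sig 6: Aborted"
)
STRACE_LINE = (
    '[pid 19830]      0.000181 write(2, "FATAL: ClusterHandler.cc:2047: '
    'failed assert `ntodo >= 0`\\n", 58) = 58'
)


def signal_name(line):
    """Return the symbolic name for the number in a 'Sig N' manager message."""
    match = re.search(r"terminated due to Sig (\d+)", line)
    if match is None:
        return None
    # signal.Signals uses the signal table of the platform running this script.
    return signal.Signals(int(match.group(1))).name


def failed_assert(line):
    """Return (source file, line number, expression) from a 'failed assert' message."""
    match = re.search(r"FATAL: ([\w.]+):(\d+): failed assert `([^`]*)`", line)
    if match is None:
        return None
    return match.group(1), int(match.group(2)), match.group(3)


if __name__ == "__main__":
    print(signal_name(MANAGER_LINE))
    print(failed_assert(STRACE_LINE))
```

Both helpers return None for lines that do not match, so they can be applied to every line of traffic.out or an strace capture and the None results filtered out.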

