[
https://issues.apache.org/jira/browse/TS-4372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385043#comment-15385043
]
Leif Hedstrom commented on TS-4372:
-----------------------------------
[~shinrich] [~bcall] Are you still seeing this with 6.2.0? We've been running
6.2.0 for almost a week, though with only a marginal amount of H2 traffic,
and have seen no healthcheck failures so far.
> Traffic server heart beat fails with 6.1
> ----------------------------------------
>
> Key: TS-4372
> URL: https://issues.apache.org/jira/browse/TS-4372
> Project: Traffic Server
> Issue Type: Bug
> Components: Cop, Manager
> Reporter: Susan Hinrichs
> Assignee: Susan Hinrichs
> Fix For: 7.0.0
>
> Attachments: ts-4372-example.pcap
>
>
> When running 6.1 in a loaded production environment, traffic_server runs
> for a while (30 minutes or so), then server heartbeats start failing
> intermittently. Eventually two fail in a row, causing traffic_cop to
> restart traffic_server (or traffic_manager and then traffic_server; I'm still
> a bit unclear there).
> {code}
> traffic_cop[18078]: (test) read failed [104 'Connection reset by peer']
> {code}
> There are no particular resource limitations on the production machine in
> this state. The number of open sockets is around 50-60K, which is consistent
> with its 5.3.x peer. The memory usage is nowhere near the limit. The CPU
> usage is high, but again, not near the limit (perhaps half of the machine's
> capacity).
> If we look at the packets exchanged on the loopback interface during an
> interval in which heartbeats fail, we see some interesting things. I'll attach
> an example pcap file. The interesting traffic is on ports 8084 and 8083.
> Traffic_cop sends a GET http://127.0.0.1:8083/synthetic.txt request to
> traffic_server over port 8084. Traffic_server should proxy the request and
> send GET /synthetic.txt to traffic_manager listening on port 8083.
> Traffic_manager returns a 200 response with some data. Traffic_server relays
> that response to traffic_cop.
> However, in the failure cases, traffic_cop sends the request and
> traffic_manager sends a RESET after the connection has been established and
> the request has been sent to it. I'm guessing that there is logic in
> traffic_server that closes the socket before reading the GET request, causing
> the reset to be sent.
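
The hypothesis in the description (a socket closed before the pending GET request is read, so the peer sees a reset) can be reproduced in isolation. The following is a minimal sketch in Python, not Traffic Server code: the function name `probe` and the parameter `read_first` are hypothetical, the port is an ephemeral one rather than 8083/8084, and the behaviour shown assumes a TCP stack that sends RST when a socket is closed with unread data in its receive buffer (Linux does this; other stacks may instead deliver a normal end-of-stream).

```python
import socket
import time


def probe(read_first: bool) -> str:
    """Send a request over loopback and report what the client observes.

    Returns "reset" if the client sees ECONNRESET, "eof" if it sees an
    orderly close, or "data" if the server unexpectedly sent bytes back.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # ephemeral port, chosen by the kernel
    listener.listen(1)
    port = listener.getsockname()[1]

    client = socket.create_connection(("127.0.0.1", port))
    client.settimeout(2.0)
    client.sendall(b"GET /synthetic.txt HTTP/1.0\r\n\r\n")

    server_side, _ = listener.accept()
    time.sleep(0.2)  # give the request time to land in the receive buffer
    if read_first:
        server_side.recv(4096)  # drain the request before closing
    server_side.close()  # with unread data pending, this typically sends RST
    listener.close()

    try:
        data = client.recv(4096)
        return "data" if data else "eof"
    except ConnectionResetError:
        return "reset"
    finally:
        client.close()


if __name__ == "__main__":
    print("close after reading the request: ", probe(read_first=True))
    print("close before reading the request:", probe(read_first=False))
```

If the second call reports a reset while the first reports a clean end-of-stream, that matches the "read failed [104 'Connection reset by peer']" symptom logged by traffic_cop. It would only show that an early close is sufficient to produce the reset, not that this is what traffic_server actually does.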