[ https://issues.apache.org/jira/browse/TS-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208725#comment-14208725 ]
Peter Walsh commented on TS-2573:
---------------------------------
Hi Leif,
We determined that the problem was caused by spikes in disk I/O, which put ATS
into a bad state that resulted in the timeouts I described in the issue. We've
taken steps to work around the problem and have not experienced it since. It
may still occur, but for now it is not a concern.
As for clustering, we are still using it heavily. Do you know why others
have stopped using clustering, or whether they have moved to an alternative
approach for building a distributed cache?
> Exponential increase in cluster timeouts
> ----------------------------------------
>
> Key: TS-2573
> URL: https://issues.apache.org/jira/browse/TS-2573
> Project: Traffic Server
> Issue Type: Bug
> Components: Clustering
> Reporter: Peter Walsh
> Fix For: sometime
>
>
> Occasionally, cluster operations start timing out after 5 seconds.
> The timeouts then continue at an increasing rate until Traffic Server is
> restarted. When this happens, the following stats increase:
> proxy.process.cluster.remote_op_timeouts and
> proxy.process.cluster.connections_open.
> I have observed that spikes in I/O wait can contribute to this issue.
> Any ideas?
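The symptom in the description, a counter that keeps climbing at an increasing rate until restart, can be checked mechanically. The sketch below is a hypothetical helper, not part of Traffic Server. It assumes that a cumulative counter such as `proxy.process.cluster.remote_op_timeouts` is sampled at a fixed interval by some external monitor, and it flags the counter when its last few per-interval increases are all positive and strictly growing. This is a weaker test than true exponential growth, chosen only to match "an increasing rate".

```python
from typing import List, Sequence


def deltas(samples: Sequence[int]) -> List[int]:
    """Per-interval increases of a cumulative counter."""
    return [b - a for a, b in zip(samples, samples[1:])]


def is_accelerating(samples: Sequence[int], window: int = 3) -> bool:
    """Return True if the last `window` per-interval increases are all
    positive and strictly increasing, i.e. the counter is growing faster
    and faster rather than at a steady rate."""
    d = deltas(samples)
    if len(d) < window:
        # Not enough history to judge a trend.
        return False
    recent = d[-window:]
    return all(x > 0 for x in recent) and all(
        a < b for a, b in zip(recent, recent[1:])
    )


if __name__ == "__main__":
    # Hypothetical samples of proxy.process.cluster.remote_op_timeouts,
    # assumed to be taken at a fixed interval.
    steady = [0, 5, 10, 15, 20]
    runaway = [0, 1, 3, 7, 15, 31]
    print(is_accelerating(steady))   # False: constant increase of 5
    print(is_accelerating(runaway))  # True: increases of 4, 8, 16
```

A steady trickle of timeouts is not flagged; only a rate that keeps rising is, which is the pattern reported here before a restart was needed.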
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)