[ https://issues.apache.org/jira/browse/TS-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13553165#comment-13553165 ]
Dow Buzzell commented on TS-1647:
---------------------------------
We have set up a private network for cache communications, and so far this
appears to have fixed our issues.
> VMWARE cannot read from cache node marked as down
> -------------------------------------------------
>
> Key: TS-1647
> URL: https://issues.apache.org/jira/browse/TS-1647
> Project: Traffic Server
> Issue Type: Bug
> Components: Cache, Clustering
> Reporter: Dow Buzzell
> Attachments: 909smokinggun.png, stackbug.png
>
>
> We are seeing issues related to cluster management on VMware. It seems that
> a NIC goes to sleep and loses packets during a read operation from a member
> of the cluster. ATS then marks that node as down, and it stays down until
> the entire cluster is restarted. Idle time seems to be at the heart of the
> issue; allocating 500 MHz to each VM or pinning the CPU doesn't help, and the
> NIC still sleeps. tcpdump sees missing segments yet reports 0 packets lost.
> Under load, dmesg shows:
> RPC: bad TCP reclen 0x73746174 (non-terminal)
> RPC: bad TCP reclen 0x63480000 (large)
> RPC: bad TCP reclen 0x633f0000 (large)
> and ATS reports that it cannot read from a cluster node and marks the node
> down.
> We have another datacenter where this does not happen.
> The kernel revisions differ as follows:
> Sleeping NIC datacenter (RHEL 5):
> 2.6.18-308.16.1.el5
>
> Working datacenter:
> 2.6.18-194.3.1.el5
> We have validated that VMware is running properly in this datacenter, but we
> are trying to get a ticket opened with them to look into why one
> configuration works and the other does not.
> Everything else in the two configurations is nearly identical. We are going
> to try to get the NIC drivers updated. Note that it is the later Linux
> kernel version that is causing the problems.
> Any ideas would be much appreciated.
> Thank you,
> Dow
>
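
A note on the "bad TCP reclen" values quoted in the description. Assuming
those dmesg lines come from the kernel's RPC-over-TCP receive path, the number
printed is the 4-byte record marker defined by RFC 5531: the high bit flags
the last fragment and the low 31 bits give the fragment length. The sketch
below only decodes the three reported values that way and shows their raw
bytes; the function name is made up for illustration, and what the bytes mean
for this bug is not established here.

```python
import struct

LAST_FRAGMENT_BIT = 0x80000000  # RFC 5531 record marking: high bit = last fragment
LENGTH_MASK = 0x7FFFFFFF        # low 31 bits = fragment length in bytes


def decode_record_marker(value):
    """Split a 32-bit RPC record marker into its fields and raw bytes.

    Hypothetical helper for illustration; not part of ATS or the kernel.
    """
    raw = struct.pack(">I", value)  # markers travel in network byte order
    printable = "".join(chr(b) if 32 <= b < 127 else "." for b in raw)
    return {
        "last_fragment": bool(value & LAST_FRAGMENT_BIT),
        "length": value & LENGTH_MASK,
        "ascii": printable,
    }


# The three values reported by dmesg in the issue description.
for reclen in (0x73746174, 0x63480000, 0x633F0000):
    print(hex(reclen), decode_record_marker(reclen))
```

The first value decodes to printable ASCII, which may mean the receiver was
reading payload bytes where it expected a record marker, for example after
segments went missing from the stream. That is only a guess from the numbers.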