Dow Buzzell created TS-1647:
-------------------------------

             Summary: VMWARE cannot read from cache node marked as down
                 Key: TS-1647
                 URL: https://issues.apache.org/jira/browse/TS-1647
             Project: Traffic Server
          Issue Type: Bug
          Components: Cache, Clustering
            Reporter: Dow Buzzell


We are seeing cluster management problems with ATS running on VMware. A NIC 
appears to go to sleep and drop packets during a read operation from a member 
of the cluster. ATS then marks that node as down, and it stays down until the 
entire cluster is restarted. Idle time seems to be at the heart of the issue: 
allocating 500 MHz to each VM or pinning the CPU does not help, and the NIC 
still sleeps. tcpdump shows missing segments yet reports 0 packets lost. Under 
load, dmesg shows:

RPC: bad TCP reclen 0x73746174 (non-terminal)
RPC: bad TCP reclen 0x63480000 (large)
RPC: bad TCP reclen 0x633f0000 (large)

and ATS reports that it cannot read from a cluster node and marks the node 
down.
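
For reference when reading the dmesg lines above: RPC over TCP uses record 
marking (RFC 5531), where each fragment starts with a 4-byte header whose top 
bit flags the last fragment and whose low 31 bits give the fragment length. 
The Python sketch below splits a logged value that way and also renders the 
four bytes as ASCII. The helper name decode_reclen is hypothetical, and the 
sketch assumes the logged hex value is the raw 32-bit header; for the 
"(large)" lines the kernel may already have masked off the top bit, so the 
flag shown for those should not be trusted.

```python
import struct


def decode_reclen(value):
    """Split a 32-bit RPC record marker into (last_fragment, length, ascii).

    Assumption: `value` is the header exactly as it arrived on the wire,
    in network byte order.
    """
    last_fragment = bool(value & 0x80000000)  # top bit: last-fragment flag
    length = value & 0x7FFFFFFF               # low 31 bits: fragment length
    raw = struct.pack(">I", value)            # the four header bytes
    # Printable ASCII is kept; anything else is shown as "."
    text = "".join(chr(b) if 32 <= b < 127 else "." for b in raw)
    return last_fragment, length, text


# The three values reported by dmesg in this issue.
for v in (0x73746174, 0x63480000, 0x633F0000):
    last, length, text = decode_reclen(v)
    print("0x%08x last_fragment=%s length=%d ascii=%r" % (v, last, length, text))
```

The first value decodes to the ASCII bytes "stat". That would be consistent 
with non-RPC data reaching a socket that the kernel RPC code is reading, but 
this is only a guess from the log lines and has not been confirmed.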

We have another datacenter where this does not happen.

The kernel revisions differ as follows:

Datacenter with the sleeping NIC (RHEL 5):
2.6.18-308.16.1.el5
  
Working datacenter:
2.6.18-194.3.1.el5

We have validated that VMware is running properly in this datacenter, but we 
are trying to get a ticket opened with them to look into why one configuration 
works and the other does not.

Everything else in the two configurations is nearly identical. We are going to 
try to get the NIC drivers updated; as you can see, it is the later Linux 
kernel version that is causing headaches.

Any ideas would really be appreciated.

Thank you,

Dow
 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
