[ 
https://issues.apache.org/jira/browse/TS-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dow Buzzell closed TS-1647.
---------------------------

    Backport to Version: 3.2.0

We were able to run without this separate NIC for cache on physical machines, 
and on VMs in one datacenter. In the other datacenter, where there are network 
issues, the separate NIC seems essential: when a packet is dropped, Traffic 
Server shuts down cluster communication and refuses to reconnect, even though 
the other members can be seen trying to reconnect to the member that had the 
dropped packet. This seems like a bug, as who really has a network that never 
drops a packet?
                
> VMWARE cannot read from cache node marked as down
> -------------------------------------------------
>
>                 Key: TS-1647
>                 URL: https://issues.apache.org/jira/browse/TS-1647
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Cache, Clustering
>            Reporter: Dow Buzzell
>         Attachments: 909smokinggun.png, stackbug.png
>
>
> We are seeing issues related to cluster management on VMware. It seems a NIC 
> goes to sleep and loses packets during a read operation from a member of the 
> cluster; ATS then marks that member down, and it stays down until the entire 
> cluster is restarted. Idle times seem to be at the heart of the issue: 
> allocating 500 MHz to each VM or pinning the CPU doesn't help, and the NIC 
> still sleeps. tcpdump sees missing segments yet reports 0 packets lost. 
> Under load, dmesg shows:
> RPC: bad TCP reclen 0x73746174 (non-terminal)
> RPC: bad TCP reclen 0x63480000 (large)
> RPC: bad TCP reclen 0x633f0000 (large)
> and ATS reports that it cannot read from a cluster node and marks the node 
> down.
> We have another datacenter where this does not happen. 
> The difference in kernel revisions is:
> Sleeping NIC Data Center RHEL 5
> 2.6.18-308.16.1.el5
>   
> Working Data Center
> 2.6.18-194.3.1.el5
> We have validated that VMware is running properly in this datacenter, but are 
> trying to get a ticket open with them to look into why one configuration 
> works but another does not. 
> Everything else in the two configurations is nearly identical. We are going 
> to try to get the NIC drivers updated; as you can see, it is the LATER Linux 
> kernel version that is causing headaches. 
> Any ideas would really be appreciated.
> Thank you,
> Dow
>  
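
For anyone reading the dmesg lines quoted above: ONC RPC over TCP prefixes each
record fragment with a 4-byte marker, where the top bit flags the last fragment
and the low 31 bits give the fragment length. The sketch below decodes the three
logged values on that basis. It assumes the logged number is the marker word as
it arrived on the wire, which may not hold (the kernel may mask the top bit
before logging the "large" case), and decode_record_marker is a hypothetical
helper name, not part of ATS or the kernel.

```python
import struct


def decode_record_marker(word):
    """Split a 32-bit RPC record marker into (last_fragment, length, raw_bytes).

    Assumption: `word` is the marker exactly as received, in host order
    after the usual network-to-host conversion.
    """
    last_fragment = bool(word & 0x80000000)   # top bit: last-fragment flag
    length = word & 0x7FFFFFFF                # low 31 bits: fragment length
    raw = struct.pack(">I", word)             # the four bytes in wire order
    return last_fragment, length, raw


if __name__ == "__main__":
    # Values copied from the dmesg output in the report.
    for value in (0x73746174, 0x63480000, 0x633F0000):
        last, length, raw = decode_record_marker(value)
        print("0x%08x last_fragment=%s length=%d bytes=%r"
              % (value, last, length, raw))
```

The first value decodes to the ASCII bytes "stat" with a nominal length near
1.9 GB, which may suggest the receiver was reading payload bytes where it
expected a marker, for example after losing its place in the stream. That is
only a guess from the numbers, not something the logs confirm.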

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
