I've heard enough stories of firewall issues that I'm willing to bet it's the problem, if it's sitting between the nodes.

On Sun, Jan 15, 2017 at 9:32 AM Anshu Vajpayee <anshu.vajpa...@gmail.com> wrote:
> Setup is not on cloud. We have a few nodes in one DC (1) and the same number
> of nodes in the other DC (2). We have a dedicated firewall in front of the nodes.
>
> Reads and writes happen with local quorum, so those don't get affected, but
> hints accumulate from one DC to the other DC for replication. Hints are
> also timing out sporadically in the logs.
>
> describe cluster didn't show any error, but in some cases it was taking
> longer.
>
> On Sun, Jan 15, 2017 at 3:01 AM, Aleksandr Ivanov <ale...@gmail.com>
> wrote:
>
> Could you share a bit about your cluster setup? Do you use cloud for your
> deployment or dedicated firewalls in front of the nodes?
>
> If gossip shows that everything is up, it doesn't mean that all nodes can
> communicate with each other. I have noticed situations where a TCP connection
> was killed by a firewall and Cassandra didn't reconnect automatically. It can
> be easily detected with the nodetool describecluster command.
>
> Aleksandr
>
> shows - all nodes are up.
>
> But when we perform writes, the coordinator stores hints. It means the
> coordinator was not able to deliver the writes to a few nodes after meeting
> the consistency requirements.
>
> The nodes for which writes were failing are in a different DC. Those nodes
> do not have any load.
>
> Gossip shows everything is up. I already set the write timeout to 60 sec,
> but it didn't help.
>
> Has anyone encountered this scenario? On the network side everything is fine.
>
> Cassandra version is 2.1.13
>
> --
> *Regards,*
> *Anshu*
>
> --
> *Regards,*
> *Anshu*
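
For the describecluster check mentioned in the quoted thread, here is a rough sketch of how the output could be scanned for unreachable nodes or schema disagreement. It is only an illustration: the function name, the sample text, and the IP addresses are made up, and it assumes the output has a "Schema versions:" section with lines shaped like `<uuid>: [ip, ip]` and `UNREACHABLE: [ip]`. Check that against what your Cassandra version actually prints.

```python
import re

# Hypothetical sample; the layout is an assumption about describecluster output.
SAMPLE = """Cluster Information:
    Name: Test Cluster
    Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    Schema versions:
        86afa796-d883-3932-aa73-6b017cef0d19: [10.0.1.11, 10.0.1.12]

        UNREACHABLE: [10.0.2.11]
"""


def parse_describecluster(output):
    """Return (schema_versions, unreachable) parsed from describecluster text.

    schema_versions maps each schema version id to the hosts reporting it;
    unreachable lists hosts shown under UNREACHABLE.
    """
    versions = {}
    unreachable = []
    in_schema = False
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Schema versions:"):
            in_schema = True
            continue
        if not in_schema or not stripped:
            continue
        # Expect '<key>: [host, host, ...]'; skip anything else.
        match = re.match(r"^([^:]+):\s*\[(.*)\]$", stripped)
        if not match:
            continue
        key = match.group(1).strip()
        hosts = [h.strip() for h in match.group(2).split(",") if h.strip()]
        if key == "UNREACHABLE":
            unreachable.extend(hosts)
        else:
            versions[key] = hosts
    return versions, unreachable


versions, unreachable = parse_describecluster(SAMPLE)
if unreachable:
    print("Unreachable from this node:", ", ".join(unreachable))
if len(versions) > 1:
    print("Schema disagreement across", len(versions), "versions")
```

To use it on a live node, pass in the text captured from `nodetool describecluster` instead of `SAMPLE`. Running it on a node in each DC matters here, since a connection dropped by the firewall may only be visible from one side.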