Re: Ec2 Network I/O
Also, once you've got your phi_convict_threshold sorted, if you see these again, check http://status.aws.amazon.com/. AWS does occasionally have the odd increased-latency issue or outage.

Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359

On 19/05/2014, at 1:15 PM, Nate McCall n...@thelastpickle.com wrote:
> It's a good idea to increase phi_convict_threshold to at least 12 on EC2.
Ec2 Network I/O
Has anyone experienced network I/O issues with EC2? We are seeing a lot of these in our logs:

HintedHandOffManager.java (line 477) Timed out replaying hints to /10.0.x.xxx; aborting (15 delivered)

and these...

Cannot handshake version with /10.0.x.xxx

and these...

java.io.IOException: Cannot proceed on repair because a neighbor (/10.0.x.xxx) is dead: session failed

This occurs on all of our nodes, even though in every case the host reported as down or unavailable is up and readily 'pingable'.

We are using shared tenancy on all our nodes (instance type m1.xlarge) with Cassandra 2.0.7.

Any suggestions on how to debug these errors? Is there a recommendation to move to Placement Groups for Cassandra?

Thanks!

Phil
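Since the hosts answer ping while Cassandra still reports them as dead, one debugging step is to time TCP connections to the port the nodes actually use to talk to each other. The following is a minimal sketch, not an official tool: it assumes the inter-node storage port is Cassandra's default of 7000 (adjust if storage_port was changed), and the function names tcp_connect_time and probe are made up for illustration.

```python
import socket
import sys
import time


def tcp_connect_time(host, port=7000, timeout=5.0):
    """Return the seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return None


def probe(hosts, port=7000, attempts=5, timeout=5.0):
    """Time several connection attempts per host; None marks a failed attempt."""
    return {
        host: [tcp_connect_time(host, port, timeout) for _ in range(attempts)]
        for host in hosts
    }


if __name__ == "__main__":
    # Usage: python probe.py <peer-ip> [<peer-ip> ...]
    for host, samples in probe(sys.argv[1:]).items():
        shown = ", ".join("failed" if s is None else f"{s * 1000:.1f} ms" for s in samples)
        print(f"{host}: {shown}")
```

Running this from each node against its peers while the errors are appearing may show whether connections to the storage port are slow or failing even though ICMP ping succeeds. It only measures connection setup, so it will not catch every kind of latency problem.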
Re: Ec2 Network I/O
It's a good idea to increase phi_convict_threshold to at least 12 on EC2. Using placement groups and single-tenant systems will certainly help.

Another optimization would be dedicating an Elastic Network Interface (http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html) specifically for gossip traffic.

On Mon, May 19, 2014 at 1:36 PM, Phil Burress philburress...@gmail.com wrote:
> Any suggestions on how to debug these errors? Is there a recommendation to move to Placement Groups for Cassandra?

--
Nate McCall
Austin, TX
@zznate

Co-Founder Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
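To give a rough feel for what raising phi_convict_threshold buys, here is a small sketch of a phi accrual calculation. It assumes heartbeat gaps follow an exponential distribution and that gossip heartbeats arrive about once per second on average; both are simplifying assumptions for illustration, and the function names are hypothetical rather than taken from Cassandra's code. The value 8 is used as the baseline on the understanding that it is the stock setting.

```python
import math


def phi(seconds_since_last_heartbeat, mean_interval_seconds):
    """Suspicion level: -log10 of the probability that a gap this long
    occurs, under an exponential model of heartbeat inter-arrival times."""
    return seconds_since_last_heartbeat / (mean_interval_seconds * math.log(10))


def silence_tolerated(threshold, mean_interval_seconds):
    """Seconds of silence after which phi reaches the given threshold."""
    return threshold * mean_interval_seconds * math.log(10)


# Assumed mean heartbeat interval of 1 second.
for threshold in (8, 12):
    print(f"threshold {threshold}: ~{silence_tolerated(threshold, 1.0):.1f} s of silence")
# threshold 8: ~18.4 s of silence
# threshold 12: ~27.6 s of silence
```

Under these assumptions, going from 8 to 12 lets a node stay quiet roughly half again as long before being convicted, which is why a higher value is more forgiving of the latency spikes seen on shared EC2 instances.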