Re: Ec2 Network I/O

2014-05-20 Thread Ben Bromhead
Also, once you've got your phi_convict_threshold sorted, if you see these
errors again, check:

http://status.aws.amazon.com/ 

AWS does occasionally have the odd increased latency issue / outage. 

Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359





Ec2 Network I/O

2014-05-19 Thread Phil Burress
Has anyone experienced network I/O issues with EC2? We are seeing a lot of
these in our logs:

HintedHandOffManager.java (line 477) Timed out replaying hints to
/10.0.x.xxx; aborting (15 delivered)

and these...

Cannot handshake version with /10.0.x.xxx

and these...

java.io.IOException: Cannot proceed on repair because a neighbor
(/10.0.x.xxx) is dead: session failed

This occurs on all of our nodes, even though in every case the host that is
being reported as down or unavailable is up and readily 'pingable'.

We are using shared tenancy on all our nodes (instance type m1.xlarge) with
Cassandra 2.0.7. Any suggestions on how to debug these errors?

Is there a recommendation to move to Placement Groups for Cassandra?

Thanks!

Phil
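
On the 'pingable' point above: ping only exercises ICMP, while Cassandra's
inter-node traffic runs over TCP (port 7000 by default, the storage_port
setting). A rough sketch of a TCP-level check follows. The function names
tcp_connect_time and probe are made up for illustration, the port assumes
the default storage_port, and the timeout and attempt counts are arbitrary.
It is a debugging aid under those assumptions, not a diagnosis of the errors
in the logs.

```python
import socket
import time


def tcp_connect_time(host, port=7000, timeout=2.0):
    """Return the seconds taken to open a TCP connection, or None on failure.

    Port 7000 is assumed to be the node's storage_port (the default).
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return None


def probe(host, port=7000, attempts=5, timeout=2.0):
    """Open several connections in a row.

    Returns (number of failed attempts, slowest successful connect time),
    where the second value is None if every attempt failed.
    """
    results = [tcp_connect_time(host, port, timeout) for _ in range(attempts)]
    succeeded = [r for r in results if r is not None]
    failures = len(results) - len(succeeded)
    worst = max(succeeded) if succeeded else None
    return failures, worst
```

Running probe() from each node against its neighbours' private addresses at
intervals would show whether TCP connects stall or fail at the times the
errors are logged, even while ping still succeeds.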


Re: Ec2 Network I/O

2014-05-19 Thread Nate McCall
It's a good idea to increase phi_convict_threshold to at least 12 on EC2.
Using placement groups and single-tenant systems will certainly help.
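
For a feel of what the threshold change buys, here is a small sketch of the
arithmetic. It assumes the simplified exponential model of a phi accrual
failure detector, where phi grows linearly with the time since the last
heartbeat, and a mean heartbeat interval of one second. Both are assumptions
for illustration, and the function names are invented here. The comparison
value of 8 is the default shown in cassandra.yaml.

```python
import math


def phi(silence_s, mean_interval_s):
    """Suspicion level after silence_s seconds without a heartbeat.

    Assumes exponentially distributed heartbeat intervals, so that
    phi = silence / (mean * ln 10).
    """
    return silence_s / (mean_interval_s * math.log(10))


def silence_to_convict(threshold, mean_interval_s):
    """Seconds of silence needed before phi reaches the given threshold."""
    return threshold * mean_interval_s * math.log(10)


if __name__ == "__main__":
    mean = 1.0  # assumed mean gap between heartbeats, in seconds
    for threshold in (8, 12):
        seconds = silence_to_convict(threshold, mean)
        print(f"threshold {threshold}: convicted after ~{seconds:.1f}s of silence")
```

Under these assumptions, going from 8 to 12 lengthens the silence a node
tolerates before marking a peer down by half again (about 18 s versus about
28 s), which gives more headroom for the latency spikes seen on shared
infrastructure.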

Another optimization would be dedicating an Elastic Network Interface (ENI,
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html)
specifically for gossip traffic.






-- 
-
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com