Re: frequent node UP/Down?

2011-09-27 Thread Yang
Found the reason. The IncomingTCPConnection.run() hit an exception and the thread terminated. The next incarnation of the thread did not come up until 20 seconds later, which caused the TimedOutException and UnavailableException on the clients. WARN [Thread-28] 2011-09-28 02:17:57,561

Re: frequent node UP/Down?

2011-09-27 Thread Jonathan Ellis
ITC threads are started as soon as the other party connects.

On Tue, Sep 27, 2011 at 9:22 PM, Yang tedd...@gmail.com wrote:
> Found the reason. The IncomingTCPConnection.run() hit an exception and the thread terminated. The next incarnation of the thread did not come up until 20 seconds
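Jonathan's point can be illustrated with a generic thread-per-connection receiver. This is a hypothetical sketch (`AcceptLoopSketch` and its methods are invented for illustration), not Cassandra's IncomingTCPConnection: a reader thread is created only when `accept()` returns, so if a reader dies on an exception, nothing replaces it until the remote side opens a new connection.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical thread-per-connection receiver; NOT Cassandra's IncomingTCPConnection.
public class AcceptLoopSketch {
    // Counts reader threads started, one per accepted connection.
    static final AtomicInteger STARTED = new AtomicInteger();

    // Reads until the peer closes the socket or an exception is thrown; then the thread ends.
    static void readUntilFailure(Socket socket) {
        try (Socket s = socket; InputStream in = s.getInputStream()) {
            byte[] buf = new byte[4096];
            while (in.read(buf) != -1) {
                // A real server would deserialize and dispatch messages here.
            }
        } catch (IOException e) {
            // Nothing restarts this reader; a new one appears only when the peer reconnects.
            System.err.println("reader thread terminated: " + e);
        }
    }

    // Starts a new reader thread as soon as the other party connects.
    static void acceptLoop(ServerSocket server) throws IOException {
        while (!server.isClosed()) {
            Socket socket = server.accept();
            STARTED.incrementAndGet();
            new Thread(() -> readUntilFailure(socket), "incoming-" + socket.getRemoteSocketAddress()).start();
        }
    }
}
```

Thread-per-connection keeps the receiver simple, but it means the receiver's liveness depends on the sender noticing the failure and reconnecting.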

Re: frequent node UP/Down?

2011-09-27 Thread Yang
It looks like the new connections are created by the sending side (OutboundTCPconnection.java) when it detects an IOException on write(). Since these timeouts happen rather frequently, about 10 to 20 times per hour, I wonder whether it's really due to the network in EC2, and I would really like some way to ascertain
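The reconnect-on-write-failure behaviour described above can be sketched as follows. `ReconnectingSender` is hypothetical and much simpler than Cassandra's outbound connection code (no queueing, no backoff); it only shows why the receiving node gets a new incoming connection, and therefore a new reader thread, only after one of the sender's writes fails.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical sender that reconnects after a failed write; NOT Cassandra's OutboundTCPconnection.java.
public class ReconnectingSender {
    private final InetSocketAddress peer;
    private final int connectTimeoutMs;
    private Socket socket;   // null until the first send, and again after a failure
    private int reconnects;  // how many times send() had to retry on a new connection

    public ReconnectingSender(InetSocketAddress peer, int connectTimeoutMs) {
        this.peer = peer;
        this.connectTimeoutMs = connectTimeoutMs;
    }

    // Writes the message; on IOException, drops the socket and retries once on a new connection.
    // Only at this point does the receiving node see a fresh incoming connection.
    public synchronized void send(byte[] message) throws IOException {
        try {
            writeAndFlush(message);
        } catch (IOException e) {
            close();
            reconnects++;
            writeAndFlush(message); // a second failure propagates to the caller
        }
    }

    private void writeAndFlush(byte[] message) throws IOException {
        if (socket == null) {
            Socket s = new Socket();
            try {
                s.connect(peer, connectTimeoutMs);
            } catch (IOException e) {
                s.close();
                throw e;
            }
            socket = s;
        }
        OutputStream out = socket.getOutputStream();
        out.write(message);
        out.flush();
    }

    public synchronized int reconnects() {
        return reconnects;
    }

    public synchronized void close() {
        if (socket != null) {
            try {
                socket.close();
            } catch (IOException ignored) {
                // the socket is already broken; nothing more to do
            }
            socket = null;
        }
    }
}
```

One consequence of this design: a write into a connection whose peer has gone away often succeeds locally, and the error only surfaces on a later write, so the reconnect can lag the actual failure by one or more messages.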

Re: frequent node UP/Down?

2011-09-25 Thread Philippe
I have this happening on 0.8.x. It looks to me as if this happens when the node is under heavy load, such as unthrottled compactions or a huge GC.

2011/9/24 Yang tedd...@gmail.com
> I'm using 1.0.0; there seem to be too many node Up/Dead events detected by the failure detector. I'm using a 2

Re: frequent node UP/Down?

2011-09-25 Thread Radim Kolar
On 25.9.2011 9:29, Philippe wrote:
> I have this happening on 0.8.x. It looks to me as if this happens when the node is under heavy load, such as unthrottled compactions or a huge GC.

I have this problem too. Node down detection must be improved: increase the timeouts a bit or make more tries
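Raising the timeouts corresponds, as far as I understand it, to raising the conviction threshold of Cassandra's phi accrual failure detector (the `phi_convict_threshold` setting in cassandra.yaml). The following is a simplified phi accrual detector, not Cassandra's FailureDetector class: it assumes exponentially distributed heartbeat intervals, so phi is the time since the last heartbeat divided by (mean interval × ln 10). `PhiAccrualSketch` and its window size are hypothetical.

```java
import java.util.ArrayDeque;

// Simplified phi accrual failure detector (exponential-interval assumption); hypothetical, not Cassandra's code.
public class PhiAccrualSketch {
    private final ArrayDeque<Long> intervals = new ArrayDeque<>();
    private final int windowSize;
    private long lastArrivalMs = -1;

    public PhiAccrualSketch(int windowSize) {
        this.windowSize = windowSize;
    }

    // Records a heartbeat arrival and keeps a sliding window of inter-arrival times.
    public void heartbeat(long nowMs) {
        if (lastArrivalMs >= 0) {
            intervals.addLast(nowMs - lastArrivalMs);
            if (intervals.size() > windowSize) {
                intervals.removeFirst();
            }
        }
        lastArrivalMs = nowMs;
    }

    // phi = -log10(P(no heartbeat for this long)) = elapsed / (mean * ln 10) under the exponential model.
    public double phi(long nowMs) {
        if (intervals.isEmpty()) {
            return 0.0;
        }
        double mean = intervals.stream().mapToLong(Long::longValue).average().orElse(1.0);
        double elapsed = nowMs - lastArrivalMs;
        return elapsed / (Math.max(mean, 1.0) * Math.log(10.0));
    }

    // The node is convicted (marked down) once phi exceeds the threshold.
    public boolean isSuspected(long nowMs, double threshold) {
        return phi(nowMs) > threshold;
    }
}
```

With heartbeats assumed to arrive every 1000 ms, phi reaches 8 after about 18.4 s of silence (8 × 1000 ms × ln 10 ≈ 18421 ms). A reader thread that stays dead for about 20 seconds, or a long GC pause on either node, would plausibly be enough to convict at a threshold of 8; raising the threshold to 12 would push that to about 27.6 s.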

Re: frequent node UP/Down?

2011-09-25 Thread Radim Kolar
On 25.9.2011 14:31, Radim Kolar wrote:
> On 25.9.2011 9:29, Philippe wrote:
>> I have this happening on 0.8.x. It looks to me as if this happens when the node is under heavy load, such as unthrottled compactions or a huge GC.
> I have this problem too. Node down detection must be improved

Re: frequent node UP/Down?

2011-09-25 Thread Brandon Williams
On Sat, Sep 24, 2011 at 4:54 PM, Yang tedd...@gmail.com wrote:
> I'm using 1.0.0; there seem to be too many node Up/Dead events detected by the failure detector. I'm using a 2-node cluster on EC2, in the same region and same security group, so I assume the message drop rate should be

Re: frequent node UP/Down?

2011-09-25 Thread Yang
Thanks Brandon. I suspected that, but I think that's precluded as a possibility, since I set up another background job running echo | nc other_box 7000 in a loop, and this job seems to be working fine all the time, so the network seems fine.

Yang

On Sun, Sep 25, 2011 at 10:39 AM, Brandon Williams
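A Java analogue of the `echo | nc other_box 7000` loop, which also records connect latency and failures, might look like this. `PortProbe`, its methods, and the 2000 ms connect timeout are hypothetical choices; `other_box` is the placeholder host name from the post.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical reachability probe, roughly equivalent to `echo | nc other_box 7000` in a loop.
public class PortProbe {
    // Returns the TCP connect latency in milliseconds, or -1 if the connect fails or times out.
    public static long probe(String host, int port, int timeoutMs) {
        long start = System.nanoTime();
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return (System.nanoTime() - start) / 1_000_000;
        } catch (IOException e) {
            return -1;
        }
    }

    // Probes `iterations` times, `intervalMs` apart, printing each result; returns the number of failures.
    public static int probeLoop(String host, int port, int iterations, long intervalMs) throws InterruptedException {
        int failures = 0;
        for (int i = 0; i < iterations; i++) {
            long ms = probe(host, port, 2000); // arbitrary 2 s connect timeout
            if (ms < 0) {
                failures++;
            }
            System.out.println(System.currentTimeMillis() + " " + host + ":" + port + " "
                    + (ms < 0 ? "FAILED" : ms + " ms"));
            Thread.sleep(intervalMs);
        }
        return failures;
    }
}
```

For example, `PortProbe.probeLoop("other_box", 7000, 3600, 1000)` probes once per second for about an hour. A successful connect only shows that the port accepts connections; it does not exercise the messaging path, so it cannot rule out a receiving thread that has died after the connection was established.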

Re: frequent node UP/Down?

2011-09-25 Thread Brandon Williams
On Sun, Sep 25, 2011 at 12:52 PM, Yang tedd...@gmail.com wrote:
> Thanks Brandon. I suspected that, but I think that's precluded as a possibility, since I set up another background job running echo | nc other_box 7000 in a loop, and this job seems to be working fine all the time, so the network seems

Re: frequent node UP/Down?

2011-09-25 Thread Yang
Thanks Brandon. I'll try this. But you can also see my later post regarding message drop: http://mail-archives.apache.org/mod_mbox/cassandra-user/201109.mbox/%3ccaanh3_8aehidyh9ybt82_emh3likbcdsenrak3jhfzaj2l+...@mail.gmail.com%3E
That seems to show something in either the code or the background load

Re: frequent node UP/Down?

2011-09-25 Thread Brandon Williams
On Sun, Sep 25, 2011 at 1:10 PM, Yang tedd...@gmail.com wrote:
> Thanks Brandon. I'll try this. But you can also see my later post regarding message drop:

frequent node UP/Down?

2011-09-24 Thread Yang
I'm using 1.0.0; there seem to be too many node Up/Dead events detected by the failure detector. I'm using a 2-node cluster on EC2, in the same region and same security group, so I assume the message drop rate should be fairly low. But about every 5 minutes, I'm seeing some node detected as