Re: frequent node UP/Down?
It looks like the new connections are created by the sending side
(OutboundTcpConnection.java) when it detects an IOException on write().
Since these timeouts happen rather frequently, about 10 to 20 times per
hour, I wonder whether it is really due to the network in EC2, and I
would really like some way to confirm that (like some logging in dmesg
saying "connection dropped", etc.). Maybe I need an extensive tcpdump
analysis session, which is a big pain.

On Tue, Sep 27, 2011 at 7:22 PM, Yang wrote:
> Found the reason.
>
> The IncomingTcpConnection.run() hit an exception and the thread
> terminated. The next incarnation of the thread did not come up until
> 20 seconds later, which caused the TimedOutException and
> UnavailableException to clients.
>
> WARN [Thread-28] 2011-09-28 02:17:57,561 IncomingTcpConnection.java
> (line 122) eof reading from socket; closing
> java.io.EOFException
>         at java.io.DataInputStream.readInt(DataInputStream.java:392)
>         at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:112)
>
> I don't know whether the EOF here is really due to the network or to
> something in the code. (If it is really the network, is there a way to
> make IncomingTcpConnection fire up the next incarnation faster, say
> within 1 second? I'm reading through the code to find out.)
>
> Thanks
> Yang
>
> On Sun, Sep 25, 2011 at 1:04 PM, Brandon Williams wrote:
>> On Sun, Sep 25, 2011 at 1:10 PM, Yang wrote:
>>> Thanks Brandon.
>>>
>>> I'll try this.
>>>
>>> But you can also see my later post regarding message drops:
>>> http://mail-archives.apache.org/mod_mbox/cassandra-user/201109.mbox/%3ccaanh3_8aehidyh9ybt82_emh3likbcdsenrak3jhfzaj2l+...@mail.gmail.com%3E
>>>
>>> that seems to show something in either code or background load causing
>>> messages to be really dropped
>>
>> I see. My guess is then this: there is a local clock problem, causing
>> generations to be the same, thus not notifying the FD. So perhaps the
>> problem is not network-related, but it is something in the ec2
>> environment.
>>
>> -Brandon
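(For the record, this is roughly the kind of capture I had in mind; the
interface name and output path are just placeholders for my setup, port
7000 being the inter-node storage port:)

    # capture inter-node traffic on the storage port on both nodes
    sudo tcpdump -i eth0 -s 0 -w /tmp/internode-7000.pcap 'tcp port 7000'
    # later, look for RST/FIN teardowns around the times of the EOFExceptions
    tcpdump -r /tmp/internode-7000.pcap 'tcp[tcpflags] & (tcp-rst|tcp-fin) != 0'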
Re: frequent node UP/Down?
IncomingTcpConnection (ITC) threads are started as soon as the other
party connects.

On Tue, Sep 27, 2011 at 9:22 PM, Yang wrote:
> Found the reason.
>
> The IncomingTcpConnection.run() hit an exception and the thread
> terminated. The next incarnation of the thread did not come up until
> 20 seconds later, which caused the TimedOutException and
> UnavailableException to clients.
>
> WARN [Thread-28] 2011-09-28 02:17:57,561 IncomingTcpConnection.java
> (line 122) eof reading from socket; closing
> java.io.EOFException
>         at java.io.DataInputStream.readInt(DataInputStream.java:392)
>         at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:112)
>
> I don't know whether the EOF here is really due to the network or to
> something in the code. (If it is really the network, is there a way to
> make IncomingTcpConnection fire up the next incarnation faster, say
> within 1 second? I'm reading through the code to find out.)
>
> Thanks
> Yang
>
> On Sun, Sep 25, 2011 at 1:04 PM, Brandon Williams wrote:
>> On Sun, Sep 25, 2011 at 1:10 PM, Yang wrote:
>>> Thanks Brandon.
>>>
>>> I'll try this.
>>>
>>> But you can also see my later post regarding message drops:
>>> http://mail-archives.apache.org/mod_mbox/cassandra-user/201109.mbox/%3ccaanh3_8aehidyh9ybt82_emh3likbcdsenrak3jhfzaj2l+...@mail.gmail.com%3E
>>>
>>> that seems to show something in either code or background load causing
>>> messages to be really dropped
>>
>> I see. My guess is then this: there is a local clock problem, causing
>> generations to be the same, thus not notifying the FD. So perhaps the
>> problem is not network-related, but it is something in the ec2
>> environment.
>>
>> -Brandon

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
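(In other words, the accepting side is purely reactive. Roughly this
shape -- a simplified sketch, not the actual MessagingService code, and
the class and method names here are made up:)

    import java.io.DataInputStream;
    import java.io.IOException;
    import java.net.ServerSocket;
    import java.net.Socket;

    public class AcceptLoopSketch
    {
        public static void main(String[] args) throws IOException
        {
            ServerSocket server = new ServerSocket(7000);
            while (true)
            {
                // blocks until the remote node opens a new connection; the
                // handler thread for the "next incarnation" starts the moment
                // the sender reconnects -- there is no timer on this side
                Socket socket = server.accept();
                new Thread(() -> handle(socket)).start();
            }
        }

        private static void handle(Socket socket)
        {
            try (DataInputStream in = new DataInputStream(socket.getInputStream()))
            {
                while (true)
                    in.readInt(); // consume messages until the stream ends
            }
            catch (IOException e)
            {
                // EOFException/IOException: peer closed or network dropped the connection
            }
        }
    }

So a 20-second gap before the next ITC thread would point at the
sending side taking 20 seconds to reconnect, rather than at the
accepting side being slow to start a thread.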
Re: frequent node UP/Down?
Found the reason.

The IncomingTcpConnection.run() hit an exception and the thread
terminated. The next incarnation of the thread did not come up until
20 seconds later, which caused the TimedOutException and
UnavailableException to clients.

WARN [Thread-28] 2011-09-28 02:17:57,561 IncomingTcpConnection.java
(line 122) eof reading from socket; closing
java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:112)

I don't know whether the EOF here is really due to the network or to
something in the code. (If it is really the network, is there a way to
make IncomingTcpConnection fire up the next incarnation faster, say
within 1 second? I'm reading through the code to find out.)

Thanks
Yang

On Sun, Sep 25, 2011 at 1:04 PM, Brandon Williams wrote:
> On Sun, Sep 25, 2011 at 1:10 PM, Yang wrote:
>> Thanks Brandon.
>>
>> I'll try this.
>>
>> But you can also see my later post regarding message drops:
>> http://mail-archives.apache.org/mod_mbox/cassandra-user/201109.mbox/%3ccaanh3_8aehidyh9ybt82_emh3likbcdsenrak3jhfzaj2l+...@mail.gmail.com%3E
>>
>> that seems to show something in either code or background load causing
>> messages to be really dropped
>
> I see. My guess is then this: there is a local clock problem, causing
> generations to be the same, thus not notifying the FD. So perhaps the
> problem is not network-related, but it is something in the ec2
> environment.
>
> -Brandon
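(Just to pin down what the EOFException means here: readInt() throws it
when the stream ends before four more bytes arrive, i.e. the other end,
or the network in between, closed the connection mid-message; it is not
a parsing error. A tiny self-contained illustration, unrelated to the
Cassandra code itself:)

    import java.io.ByteArrayInputStream;
    import java.io.DataInputStream;
    import java.io.EOFException;
    import java.io.IOException;

    public class EofDemo
    {
        public static void main(String[] args) throws IOException
        {
            // the "stream" ends after two bytes, but readInt() needs four
            DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(new byte[] { 1, 2 }));
            try
            {
                in.readInt();
            }
            catch (EOFException e)
            {
                System.out.println("EOF: stream closed before a full int arrived");
            }
        }
    }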
Re: frequent node UP/Down?
On Sun, Sep 25, 2011 at 1:10 PM, Yang wrote:
> Thanks Brandon.
>
> I'll try this.
>
> But you can also see my later post regarding message drops:
> http://mail-archives.apache.org/mod_mbox/cassandra-user/201109.mbox/%3ccaanh3_8aehidyh9ybt82_emh3likbcdsenrak3jhfzaj2l+...@mail.gmail.com%3E
>
> that seems to show something in either code or background load causing
> messages to be really dropped

I see. My guess is then this: there is a local clock problem, causing
generations to be the same, thus not notifying the FD. So perhaps the
problem is not network-related, but it is something in the ec2
environment.

-Brandon
Re: frequent node UP/Down?
Thanks Brandon.

I'll try this.

But you can also see my later post regarding message drops:
http://mail-archives.apache.org/mod_mbox/cassandra-user/201109.mbox/%3ccaanh3_8aehidyh9ybt82_emh3likbcdsenrak3jhfzaj2l+...@mail.gmail.com%3E

that seems to show something in either code or background load causing
messages to be really dropped

Yang

On Sun, Sep 25, 2011 at 10:59 AM, Brandon Williams wrote:
> On Sun, Sep 25, 2011 at 12:52 PM, Yang wrote:
>> Thanks Brandon.
>>
>> I suspected that, but I think that's precluded as a possibility since
>> I set up another background job to do
>> echo | nc other_box 7000
>> in a loop,
>> this job seems to be working fine all the time, so the network seems fine.
>
> This isn't measuring latency, however. That is how the failure
> detector works: using probability to estimate the likelihood that a
> given host is alive, based on previous history. The situation on ec2
> is something like the following: 99% of pings are 1ms, but sometimes
> there are brief periods of 100ms, and this is where the FD says "this
> is not realistic, I think the host is dead" but then receives the
> ping, and thus the flapping. I've seen it a million times; increasing
> the phi threshold always solves it.
>
> -Brandon
Re: frequent node UP/Down?
On Sun, Sep 25, 2011 at 12:52 PM, Yang wrote:
> Thanks Brandon.
>
> I suspected that, but I think that's precluded as a possibility since
> I set up another background job to do
> echo | nc other_box 7000
> in a loop,
> this job seems to be working fine all the time, so the network seems fine.

This isn't measuring latency, however. That is how the failure
detector works: using probability to estimate the likelihood that a
given host is alive, based on previous history. The situation on ec2
is something like the following: 99% of pings are 1ms, but sometimes
there are brief periods of 100ms, and this is where the FD says "this
is not realistic, I think the host is dead" but then receives the
ping, and thus the flapping. I've seen it a million times; increasing
the phi threshold always solves it.

-Brandon
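(To make "using probability ... based on previous history" concrete,
here is a stripped-down sketch of a phi accrual failure detector using
an exponential inter-arrival model. Cassandra's real FailureDetector
differs in its details; the point is only that when heartbeats normally
arrive about 1ms apart, a 100ms gap makes phi shoot past a fixed
convict threshold:)

    import java.util.ArrayDeque;
    import java.util.Deque;

    public class PhiAccrualSketch
    {
        private static final int WINDOW = 1000;          // keep the last 1000 intervals
        private final Deque<Long> intervals = new ArrayDeque<>();
        private long lastHeartbeat = -1;

        /** record a heartbeat (gossip message) from the monitored host */
        public void heartbeat(long nowMs)
        {
            if (lastHeartbeat >= 0)
            {
                intervals.addLast(nowMs - lastHeartbeat);
                if (intervals.size() > WINDOW)
                    intervals.removeFirst();
            }
            lastHeartbeat = nowMs;
        }

        /** suspicion level; convict when this exceeds the phi threshold */
        public double phi(long nowMs)
        {
            if (intervals.isEmpty())
                return 0.0;
            double mean = intervals.stream().mapToLong(Long::longValue).average().orElse(1.0);
            double silent = nowMs - lastHeartbeat;
            // -log10 of P(host is alive but silent this long), exponential model
            return (silent / mean) * Math.log10(Math.E);
        }
    }

With 1ms intervals, staying silent for 100ms gives phi of roughly
100 * 0.43, about 43, far above a threshold of 8 or 10, which is
exactly the brief-spike flapping described above.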
Re: frequent node UP/Down?
Thanks Brandon.

I suspected that, but I think that's precluded as a possibility, since
I set up another background job that does

    echo | nc other_box 7000

in a loop. That job seems to be working fine all the time, so the
network seems fine.

Yang

On Sun, Sep 25, 2011 at 10:39 AM, Brandon Williams wrote:
> On Sat, Sep 24, 2011 at 4:54 PM, Yang wrote:
>> I'm using 1.0.0
>>
>> there seems to be too many node Up/Dead events detected by the failure
>> detector.
>> I'm using a 2-node cluster on EC2, in the same region and same security
>> group, so I assume the message drop rate should be fairly low.
>> but about every 5 minutes I'm seeing some node detected as down,
>> and then Up again quickly
>
> This is fairly common on ec2 due to wild variance in the network.
> Increase your phi_convict_threshold to 10 or higher (but I wouldn't go
> over 12; this is roughly an exponential increase).
>
> -Brandon
Re: frequent node UP/Down?
On Sat, Sep 24, 2011 at 4:54 PM, Yang wrote:
> I'm using 1.0.0
>
> there seems to be too many node Up/Dead events detected by the failure
> detector.
> I'm using a 2-node cluster on EC2, in the same region and same security
> group, so I assume the message drop rate should be fairly low.
> but about every 5 minutes I'm seeing some node detected as down,
> and then Up again quickly

This is fairly common on ec2 due to wild variance in the network.
Increase your phi_convict_threshold to 10 or higher (but I wouldn't go
over 12; this is roughly an exponential increase).

-Brandon
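(For anyone following along: the knob lives in conf/cassandra.yaml, and
as far as I remember the default is 8, so for example:)

    # conf/cassandra.yaml
    phi_convict_threshold: 10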
Re: frequent node UP/Down?
On 25.9.2011 14:31, Radim Kolar wrote:
> On 25.9.2011 9:29, Philippe wrote:
>> I have this happening on 0.8.x. It looks to me as if this happens when
>> the node is under heavy load, such as unthrottled compactions or a huge GC.
>
> I have this problem too. Node-down detection must be improved: either
> increase the timeouts a bit or make more tries before reaching a
> decision. If a node is under load (especially if there is swap
> activity), it is often marked unavailable.

Also, an algorithm like the one used in the BGP routing protocol to
prevent route flapping should be implemented. It should guard against
cases like this:

INFO [GossipTasks:1] 2011-09-25 14:56:36,544 Gossiper.java (line 695) InetAddress /216.17.99.40 is now dead.
INFO [GossipStage:1] 2011-09-25 14:56:36,641 Gossiper.java (line 681) InetAddress /216.17.99.40 is now UP
INFO [GossipTasks:1] 2011-09-25 14:56:37,823 Gossiper.java (line 695) InetAddress /216.17.99.40 is now dead.
INFO [GossipStage:1] 2011-09-25 14:56:37,971 Gossiper.java (line 681) InetAddress /216.17.99.40 is now UP

Route flap protection works like this: announce the first state change
to the peer immediately; if the state changes again within 30 seconds,
announce the next change only after 30 seconds; if the route keeps
flapping up and down, increase the report interval to 60 seconds, and
so on. A rough sketch of that idea follows below.
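(Illustrative only, using the example numbers above; this is not
something Cassandra implements, and a complete version would also have
to re-announce the most recent suppressed state once the hold-down
expires:)

    public class FlapDamper
    {
        private static final long BASE_HOLD_MS = 30_000;

        private long lastAnnounce = 0;  // time of the last announced change
        private long holdMs = 0;        // current hold-down; 0 = no penalty yet

        /**
         * Called on every up/down transition; returns true if the change
         * should be announced to peers now, false if it is suppressed.
         */
        public synchronized boolean shouldAnnounce(long nowMs)
        {
            long sinceLast = nowMs - lastAnnounce;
            if (sinceLast >= Math.max(BASE_HOLD_MS, holdMs))
            {
                // stable for a full hold-down period: announce and reset penalty
                holdMs = 0;
                lastAnnounce = nowMs;
                return true;
            }
            // changed again too soon: suppress and back off (30s, 60s, 120s, ...)
            holdMs = (holdMs == 0) ? BASE_HOLD_MS : holdMs * 2;
            return false;
        }
    }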
Re: frequent node UP/Down?
On 25.9.2011 9:29, Philippe wrote:
> I have this happening on 0.8.x. It looks to me as if this happens when
> the node is under heavy load, such as unthrottled compactions or a huge GC.

I have this problem too. Node-down detection must be improved: either
increase the timeouts a bit or make more tries before reaching a
decision. If a node is under load (especially if there is swap
activity), it is often marked unavailable.
Re: frequent node UP/Down?
I have this happening on 0.8.x. It looks to me as if this happens when
the node is under heavy load, such as unthrottled compactions or a huge
GC.

2011/9/24 Yang:
> I'm using 1.0.0
>
> there seems to be too many node Up/Dead events detected by the failure
> detector.
> I'm using a 2-node cluster on EC2, in the same region and same security
> group, so I assume the message drop rate should be fairly low.
> but about every 5 minutes I'm seeing some node detected as down,
> and then Up again quickly, like the following:
>
> INFO 20:30:12,726 InetAddress /10.71.111.222 is now dead.
> INFO 20:30:32,154 InetAddress /10.71.111.222 is now UP
>
> does the "1 in every 5 minutes" sound roughly right for your setup? I
> just want to make sure the unresponsiveness is not caused by something
> like memtable flushing or GC, which I can probably tune further.
>
> Thanks
> Yang