Re: frequent node UP/Down?

2011-09-27 Thread Yang
It looks like the new connections are created by the sending side
(OutboundTcpConnection.java) when it detects an IOException on
write().

Since these timeouts happen rather frequently, about 10-20 times
per hour, I wonder whether it's really due to the network in EC2, and I would
really like some way to ascertain that (like some logging in dmesg saying
"connection dropped", etc.). Ahhh, maybe I need an extensive
tcpdump analysis session, which is a big pain.
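One cheaper check than a full tcpdump session: the Linux kernel keeps TCP retransmission counters, which climb when the network is dropping segments. A minimal sketch of reading them (this parses /proc/net/snmp directly; it is Linux-specific and not anything Cassandra itself exposes):

```python
def tcp_counters(path="/proc/net/snmp"):
    """Return the kernel's Tcp: counters as a dict (Linux only)."""
    try:
        with open(path) as f:
            rows = [line.split() for line in f if line.startswith("Tcp:")]
    except OSError:
        return {}  # not Linux, or /proc unavailable
    # First Tcp: row is the header, second holds the values.
    header, values = rows[0][1:], rows[1][1:]
    return dict(zip(header, (int(v) for v in values)))

if __name__ == "__main__":
    c = tcp_counters()
    if c:
        # A steadily rising RetransSegs relative to OutSegs suggests
        # loss on the wire rather than a problem inside Cassandra.
        pct = 100.0 * c["RetransSegs"] / max(c["OutSegs"], 1)
        print("OutSegs=%d RetransSegs=%d (%.3f%% retransmitted)"
              % (c["OutSegs"], c["RetransSegs"], pct))
```

Sampling this in a loop on both nodes and watching for jumps during the flaps would show whether the EC2 network is actually losing packets.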




On Tue, Sep 27, 2011 at 7:22 PM, Yang  wrote:
> found the reason.
>
> The IncomingTcpConnection.run() hit an exception and the thread
> terminated. The next incarnation of the thread did not come up until
> 20 seconds later, which caused the TimedOutException and
> UnavailableException seen by clients.
>
>
>
>  WARN [Thread-28] 2011-09-28 02:17:57,561 IncomingTcpConnection.java
> (line 122) eof reading from socket; closing
> java.io.EOFException
>        at java.io.DataInputStream.readInt(DataInputStream.java:392)
>        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:112)
>
>
>
> I don't know whether the EOF here is really due to the network or to
> something in the code.
> (If it's really the network, is there a way to let IncomingTcpConnection
> fire up the next one faster, like within 1 second? I'm reading
> through the code to find out.)
>
> Thanks
> Yang
>
>
>
> On Sun, Sep 25, 2011 at 1:04 PM, Brandon Williams  wrote:
>> On Sun, Sep 25, 2011 at 1:10 PM, Yang  wrote:
>>> Thanks Brandon.
>>>
>>> I'll try this.
>>>
>>> But you can also see my later post regarding message drops:
>>> http://mail-archives.apache.org/mod_mbox/cassandra-user/201109.mbox/%3ccaanh3_8aehidyh9ybt82_emh3likbcdsenrak3jhfzaj2l+...@mail.gmail.com%3E
>>>
>>> That seems to show that something in either the code or the background
>>> load is causing messages to actually be dropped.
>>
>> I see.  My guess is then this: there is a local clock problem, causing
>> generations to be the same, thus not notifying the FD.  So perhaps the
>> problem is not network-related, but it is something in the ec2
>> environment.
>>
>> -Brandon
>>
>


Re: frequent node UP/Down?

2011-09-27 Thread Jonathan Ellis
ITC (IncomingTcpConnection) threads are started as soon as the other party connects.

On Tue, Sep 27, 2011 at 9:22 PM, Yang  wrote:
> found the reason.
>
> The IncomingTcpConnection.run() hit an exception and the thread
> terminated. The next incarnation of the thread did not come up until
> 20 seconds later, which caused the TimedOutException and
> UnavailableException seen by clients.
>
>
>
>  WARN [Thread-28] 2011-09-28 02:17:57,561 IncomingTcpConnection.java
> (line 122) eof reading from socket; closing
> java.io.EOFException
>        at java.io.DataInputStream.readInt(DataInputStream.java:392)
>        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:112)
>
>
>
> I don't know whether the EOF here is really due to the network or to
> something in the code.
> (If it's really the network, is there a way to let IncomingTcpConnection
> fire up the next one faster, like within 1 second? I'm reading
> through the code to find out.)
>
> Thanks
> Yang
>
>
>
> On Sun, Sep 25, 2011 at 1:04 PM, Brandon Williams  wrote:
>> On Sun, Sep 25, 2011 at 1:10 PM, Yang  wrote:
>>> Thanks Brandon.
>>>
>>> I'll try this.
>>>
>>> But you can also see my later post regarding message drops:
>>> http://mail-archives.apache.org/mod_mbox/cassandra-user/201109.mbox/%3ccaanh3_8aehidyh9ybt82_emh3likbcdsenrak3jhfzaj2l+...@mail.gmail.com%3E
>>>
>>> That seems to show that something in either the code or the background
>>> load is causing messages to actually be dropped.
>>
>> I see.  My guess is then this: there is a local clock problem, causing
>> generations to be the same, thus not notifying the FD.  So perhaps the
>> problem is not network-related, but it is something in the ec2
>> environment.
>>
>> -Brandon
>>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: frequent node UP/Down?

2011-09-27 Thread Yang
found the reason.

The IncomingTcpConnection.run() hit an exception and the thread
terminated. The next incarnation of the thread did not come up until
20 seconds later, which caused the TimedOutException and
UnavailableException seen by clients.



 WARN [Thread-28] 2011-09-28 02:17:57,561 IncomingTcpConnection.java
(line 122) eof reading from socket; closing
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:112)



I don't know whether the EOF here is really due to the network or to
something in the code.
(If it's really the network, is there a way to let IncomingTcpConnection
fire up the next one faster, like within 1 second? I'm reading
through the code to find out.)

Thanks
Yang



On Sun, Sep 25, 2011 at 1:04 PM, Brandon Williams  wrote:
> On Sun, Sep 25, 2011 at 1:10 PM, Yang  wrote:
>> Thanks Brandon.
>>
>> I'll try this.
>>
>> But you can also see my later post regarding message drops:
>> http://mail-archives.apache.org/mod_mbox/cassandra-user/201109.mbox/%3ccaanh3_8aehidyh9ybt82_emh3likbcdsenrak3jhfzaj2l+...@mail.gmail.com%3E
>>
>> That seems to show that something in either the code or the background
>> load is causing messages to actually be dropped.
>
> I see.  My guess is then this: there is a local clock problem, causing
> generations to be the same, thus not notifying the FD.  So perhaps the
> problem is not network-related, but it is something in the ec2
> environment.
>
> -Brandon
>


Re: frequent node UP/Down?

2011-09-25 Thread Brandon Williams
On Sun, Sep 25, 2011 at 1:10 PM, Yang  wrote:
> Thanks Brandon.
>
> I'll try this.
>
> But you can also see my later post regarding message drops:
> http://mail-archives.apache.org/mod_mbox/cassandra-user/201109.mbox/%3ccaanh3_8aehidyh9ybt82_emh3likbcdsenrak3jhfzaj2l+...@mail.gmail.com%3E
>
> That seems to show that something in either the code or the background
> load is causing messages to actually be dropped.

I see.  My guess is then this: there is a local clock problem, causing
generations to be the same, thus not notifying the FD.  So perhaps the
problem is not network-related, but it is something in the ec2
environment.

-Brandon


Re: frequent node UP/Down?

2011-09-25 Thread Yang
Thanks Brandon.

I'll try this.

But you can also see my later post regarding message drops:
http://mail-archives.apache.org/mod_mbox/cassandra-user/201109.mbox/%3ccaanh3_8aehidyh9ybt82_emh3likbcdsenrak3jhfzaj2l+...@mail.gmail.com%3E

That seems to show that something in either the code or the background
load is causing messages to actually be dropped.


Yang

On Sun, Sep 25, 2011 at 10:59 AM, Brandon Williams  wrote:
> On Sun, Sep 25, 2011 at 12:52 PM, Yang  wrote:
>> Thanks Brandon.
>>
>> I suspected that, but I think that's precluded as a possibility, since
>> I set up another background job to run
>> echo | nc other_box 7000
>> in a loop.
>> This job seems to be working fine all the time, so the network seems fine.
>
> This isn't measuring latency, however, and latency is what the failure
> detector works from: it uses probability to estimate the likelihood that
> a given host is alive, based on previous history. The situation on ec2
> is something like the following: 99% of pings take 1ms, but sometimes
> there are brief periods of 100ms, and this is where the FD says "this
> is not realistic, I think the host is dead" but then receives the
> ping, and thus the flapping. I've seen it a million times; increasing
> the phi threshold always solves it.
>
> -Brandon
>


Re: frequent node UP/Down?

2011-09-25 Thread Brandon Williams
On Sun, Sep 25, 2011 at 12:52 PM, Yang  wrote:
> Thanks Brandon.
>
> I suspected that, but I think that's precluded as a possibility, since
> I set up another background job to run
> echo | nc other_box 7000
> in a loop.
> This job seems to be working fine all the time, so the network seems fine.

This isn't measuring latency, however, and latency is what the failure
detector works from: it uses probability to estimate the likelihood that
a given host is alive, based on previous history. The situation on ec2
is something like the following: 99% of pings take 1ms, but sometimes
there are brief periods of 100ms, and this is where the FD says "this
is not realistic, I think the host is dead" but then receives the
ping, and thus the flapping. I've seen it a million times; increasing
the phi threshold always solves it.

-Brandon
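For anyone curious what is being described above: Cassandra uses a phi accrual failure detector, which keeps a window of heartbeat inter-arrival times and converts "time since the last heartbeat" into a suspicion level, phi; a node is convicted once phi crosses the configured threshold. A rough sketch of the idea (the exponential-distribution approximation below illustrates the general technique, not Cassandra's exact code):

```python
import math
from collections import deque

class PhiAccrualDetector:
    """Suspicion level derived from heartbeat inter-arrival history.

    phi = -log10(P(no heartbeat for this long)), assuming inter-arrival
    times are exponentially distributed -- a common simplification;
    real implementations differ in the distribution and windowing.
    """

    def __init__(self, window=1000):
        self.intervals = deque(maxlen=window)
        self.last = None

    def heartbeat(self, now):
        if self.last is not None:
            self.intervals.append(now - self.last)
        self.last = now

    def phi(self, now):
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        t = now - self.last
        # P(interval > t) = exp(-t / mean)  =>  phi = t / (mean * ln 10)
        return t / (mean * math.log(10))

# Steady 1-second heartbeats, then silence: phi grows with the gap.
d = PhiAccrualDetector()
for t in range(10):
    d.heartbeat(float(t))
print(round(d.phi(9.5), 2))   # ~0.22: a 0.5s gap is unremarkable
print(round(d.phi(19.0), 2))  # ~4.34: a 10s gap looks suspicious
```

This is why a brief latency spike can convict a host whose history says heartbeats always arrive quickly, and why raising the phi threshold makes the detector more tolerant.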


Re: frequent node UP/Down?

2011-09-25 Thread Yang
Thanks Brandon.

I suspected that, but I think that's precluded as a possibility, since
I set up another background job to run
echo | nc other_box 7000
in a loop.
This job seems to be working fine all the time, so the network seems fine.

Yang

On Sun, Sep 25, 2011 at 10:39 AM, Brandon Williams  wrote:
> On Sat, Sep 24, 2011 at 4:54 PM, Yang  wrote:
>> I'm using 1.0.0
>>
>>
>> There seem to be too many node Up/Dead events detected by the failure
>> detector.
>> I'm using a 2-node cluster on EC2, in the same region and same security
>> group, so I assume the message drop
>> rate should be fairly low.
>> But about every 5 minutes, I'm seeing some node detected as down,
>> and then Up again quickly
>
> This is fairly common on ec2 due to wild variance in the network.
> Increase your phi_convict_threshold to 10 or higher (but I wouldn't go
> over 12; this is roughly an exponential increase).
>
> -Brandon
>
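For what it's worth, a plain `nc` loop like the one above only proves that connections succeed; it says nothing about how long each one took, which is what the failure detector is sensitive to. A sketch that times each TCP connect instead (the local listener below is a stand-in so the example is self-contained; in practice you would probe the real peer, e.g. `("other_box", 7000)`):

```python
import socket
import time

def connect_latency_ms(host, port, timeout=2.0):
    """Time a single TCP connect; returns milliseconds."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection established; close immediately
    return (time.monotonic() - start) * 1000.0

# Probe a local listener so the sketch runs anywhere; swap in the
# real peer's address when measuring for real.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(64)
host, port = server.getsockname()

samples = [connect_latency_ms(host, port) for _ in range(20)]
print("min=%.2fms max=%.2fms" % (min(samples), max(samples)))
server.close()
```

Logging the max over each minute would reveal the brief 100ms-class spikes that make the failure detector flap even though every probe "succeeds".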


Re: frequent node UP/Down?

2011-09-25 Thread Brandon Williams
On Sat, Sep 24, 2011 at 4:54 PM, Yang  wrote:
> I'm using 1.0.0
>
>
> There seem to be too many node Up/Dead events detected by the failure
> detector.
> I'm using a 2-node cluster on EC2, in the same region and same security
> group, so I assume the message drop
> rate should be fairly low.
> But about every 5 minutes, I'm seeing some node detected as down,
> and then Up again quickly

This is fairly common on ec2 due to wild variance in the network.
Increase your phi_convict_threshold to 10 or higher (but I wouldn't go
over 12; this is roughly an exponential increase).

-Brandon
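For reference, the knob mentioned above lives in cassandra.yaml; the value shown is the suggested range from this thread, not a universal recommendation:

```yaml
# Failure-detector sensitivity: how implausible a heartbeat delay must
# be before a node is convicted as down. The scale is roughly
# exponential, so 10-12 is already a large increase over the default
# of 8. Suited to EC2's latency variance per the advice above.
phi_convict_threshold: 10
```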


Re: frequent node UP/Down?

2011-09-25 Thread Radim Kolar

On 25.9.2011 14:31, Radim Kolar wrote:

On 25.9.2011 9:29, Philippe wrote:
I have this happening on 0.8.x. It looks to me as if this happens when
the node is under heavy load, such as unthrottled compactions or a
huge GC.
I have this problem too. Node-down detection must be improved:
increase the timeouts a bit, or make more tries before making the
decision. If a node is under load (especially if there is swap
activity), it is often marked unavailable.
Also, an algorithm like the one used in the BGP routing protocol to
prevent route flapping should be implemented. It should guard against
cases like this:


 INFO [GossipTasks:1] 2011-09-25 14:56:36,544 Gossiper.java (line 695) InetAddress /216.17.99.40 is now dead.
 INFO [GossipStage:1] 2011-09-25 14:56:36,641 Gossiper.java (line 681) InetAddress /216.17.99.40 is now UP
 INFO [GossipTasks:1] 2011-09-25 14:56:37,823 Gossiper.java (line 695) InetAddress /216.17.99.40 is now dead.
 INFO [GossipStage:1] 2011-09-25 14:56:37,971 Gossiper.java (line 681) InetAddress /216.17.99.40 is now UP


Route flap protection works like this: announce the first state change
to the peer immediately; if the state changes again in less than 30
seconds, announce the next change only after 30 seconds; if the route
keeps flapping up/down, increase the report time to 60 seconds, and so on.
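The BGP-style damping described above can be sketched as a small state machine: report the first transition immediately, then suppress reports inside a hold-down window that doubles while the flapping continues (a toy illustration of the idea, not anything Cassandra implements; the 30s/300s constants are the example values from the text):

```python
class FlapDamper:
    """Suppress rapid UP/DOWN announcements, BGP-damping style.

    The first state change is announced immediately; later changes are
    announced only after the hold-down window, which doubles (30s, 60s,
    ...) while changes keep arriving quickly and resets once the state
    has been stable. Simplification: changes arriving inside the window
    are dropped rather than deferred (a real implementation would
    announce the settled state when the window expires).
    """

    def __init__(self, base_hold=30.0, max_hold=300.0):
        self.base_hold = base_hold
        self.max_hold = max_hold
        self.hold = 0.0          # current hold-down window, seconds
        self.last_change = None  # time of the last *announced* change
        self.state = None

    def update(self, now, state):
        """Return the state to announce, or None while suppressed."""
        if state == self.state:
            return None
        if self.last_change is not None and now - self.last_change < self.hold:
            return None  # still inside hold-down: swallow the flap
        # Changes still arriving quickly? Double the window; else reset.
        if self.last_change is not None and now - self.last_change < 2 * self.hold:
            self.hold = min(self.hold * 2, self.max_hold)
        else:
            self.hold = self.base_hold
        self.state = state
        self.last_change = now
        return state

d = FlapDamper()
print(d.update(0.0, "DOWN"))  # DOWN (first change: announced immediately)
print(d.update(1.0, "UP"))    # None (flap inside the 30s hold-down)
print(d.update(40.0, "UP"))   # UP   (window expired; doubles to 60s)
```

Applied to the gossip log above, the second dead/UP pair within one second would simply never be announced.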


Re: frequent node UP/Down?

2011-09-25 Thread Radim Kolar

On 25.9.2011 9:29, Philippe wrote:
I have this happening on 0.8.x. It looks to me as if this happens when
the node is under heavy load, such as unthrottled compactions or a
huge GC.
I have this problem too. Node-down detection must be improved:
increase the timeouts a bit, or make more tries before making the
decision. If a node is under load (especially if there is swap
activity), it is often marked unavailable.


Re: frequent node UP/Down?

2011-09-25 Thread Philippe
I have this happening on 0.8.x. It looks to me as if this happens when the node
is under heavy load, such as unthrottled compactions or a huge GC.

2011/9/24 Yang 

> I'm using 1.0.0
>
>
> There seem to be too many node Up/Dead events detected by the failure
> detector.
> I'm using a 2-node cluster on EC2, in the same region and same security
> group, so I assume the message drop
> rate should be fairly low.
> But about every 5 minutes, I'm seeing some node detected as down,
> and then Up again quickly, like the following:
>
>
>  INFO 20:30:12,726 InetAddress /10.71.111.222 is now dead.
>  INFO 20:30:32,154 InetAddress /10.71.111.222 is now UP
>
>
> Does the "1 in every 5 minutes" sound roughly right for your setup? I
> just want to make sure the unresponsiveness is not
> caused by something like memtable flushing or GC, which I can
> probably further tune.
>
>
> Thanks
> Yang
>