If you want this to be part of the JIRA record, you need to add it as a comment on the issue; JIRA is not configured to turn emails into comments automatically.
On Sun, Dec 27, 2009 at 11:07 PM, Michael Lee <mail.list.steel.men...@gmail.com> wrote:
> I confirmed this issue with the following tests.
>
> The cluster contains 8 nodes and holds about 10000 rows (keys ranging from 1 to 10000):
>
> Address       Status     Load        Range                                        Ring
>                                      170141183460469231731687303715884105728
> 10.237.4.85   Up         757.13 MB   21267647932558653966460912964485513216      |<--|
> 10.237.1.135  Up         761.54 MB   42535295865117307932921825928971026432      |   ^
> 10.237.1.137  Up         748.02 MB   63802943797675961899382738893456539648      v   |
> 10.237.1.139  Up         732.36 MB   85070591730234615865843651857942052864      |   ^
> 10.237.1.140  Up         725.6 MB    106338239662793269832304564822427566080     v   |
> 10.237.1.141  Up         726.59 MB   127605887595351923798765477786913079296     |   ^
> 10.237.1.143  Up         728.16 MB   148873535527910577765226390751398592512     v   |
> 10.237.1.144  Up         745.69 MB   170141183460469231731687303715884105728     |-->|
>
> (1) Read keys in the range [1-10000]: all keys read OK (the client sends read requests directly to 10.237.4.85, 10.237.1.137, 10.237.1.140, and 10.237.1.143).
> (2) Turn off 10.237.1.135 while the load keeps running: some read requests time out. After all nodes know that 10.237.1.135 is down (about 10 s later), all read requests succeed again. That's fine.
> (3) Turn 10.237.1.135 back on (and start the cassandra service again): some read requests time out again, and they keep timing out FOREVER, even after all nodes know that 10.237.1.135 is up. That's a PROBLEM!
> (4) Reboot 10.237.1.135: the problem remains.
> (5) Stop the load, reboot the whole cluster, and repeat step 1: everything is fine again...
>
> All read requests use the QUORUM consistency level. The version of Cassandra is apache-cassandra-incubating-0.5.0-beta2; I have also tested apache-cassandra-incubating-0.5.0-RC1, and the problem remains.
>
> After reading system.log, I found that once 10.237.1.135 goes down and comes back up, the other nodes never establish a TCP connection to it (on TCP port 7000) again, and read requests destined for 10.237.1.135 (placed into Pending-Writes because the socket channel is closed) are never sent onto the network (observed with tcpdump).
>
> It seems that when 10.237.1.135 went down in step 2, some socket channels were reset; after 10.237.1.135 came back, those socket channels remained closed, forever.
> ---------END----------
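For context on step (1): client reads in this era went through the Thrift interface, with the consistency level passed on each call. Below is a minimal read-loop sketch, not the reporter's actual test harness; the schema names (Keyspace1, Standard1, a single column named "value") and a replication factor of 3 are assumptions the email does not state, and the generated Thrift classes lived under org.apache.cassandra.service in the 0.5 line.

    // Rough reproduction of step (1): read keys 1..10000 at QUORUM.
    // Keyspace1/Standard1 and the column "value" are assumed; the report
    // does not name its schema.
    import org.apache.cassandra.service.Cassandra;
    import org.apache.cassandra.service.ColumnPath;
    import org.apache.cassandra.service.ConsistencyLevel;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;

    public class QuorumReadTest {
        public static void main(String[] args) throws Exception {
            // One of the four contact nodes named in step (1).
            TSocket socket = new TSocket("10.237.4.85", 9160);
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(socket));
            socket.open();
            ColumnPath path = new ColumnPath("Standard1", null, "value".getBytes("UTF-8"));
            for (int key = 1; key <= 10000; key++) {
                // Each read blocks until a quorum of replicas answers,
                // or times out if the quorum cannot be assembled.
                client.get("Keyspace1", Integer.toString(key), path, ConsistencyLevel.QUORUM);
            }
            socket.close();
        }
    }

At QUORUM with RF=3, each read needs two replica responses, which is consistent with step (2): once the failure detector marks 10.237.1.135 down, the remaining replicas satisfy the quorum and reads recover. The bug is that step (3) never recovers the same way.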
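The Pending-Writes observation points at a one-shot outbound connection: messages queue up against a socket channel that was closed when the peer reset it, and nothing ever re-opens the channel after the peer returns. Here is a minimal Java NIO illustration of that failure mode and of the reconnect that prevents it; this is a sketch of the general pattern, not Cassandra's actual inter-node code.

    // Illustration only, NOT Cassandra's inter-node code. It models the
    // reported symptom: writes queued against a closed channel never reach
    // the wire unless something actively reconnects.
    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.SocketChannel;
    import java.util.ArrayDeque;
    import java.util.Queue;

    class OutboundConnection {
        private final InetSocketAddress peer;   // e.g. 10.237.1.135:7000
        private SocketChannel channel;          // closed after the peer resets it
        private final Queue<ByteBuffer> pending = new ArrayDeque<ByteBuffer>();

        OutboundConnection(InetSocketAddress peer) {
            this.peer = peer;
        }

        synchronized void send(ByteBuffer message) throws IOException {
            pending.add(message);
            if (channel == null || !channel.isOpen()) {
                // Without this reconnect, 'pending' grows forever and nothing
                // is sent, matching what tcpdump showed on port 7000.
                channel = SocketChannel.open(peer);
            }
            while (!pending.isEmpty()) {
                channel.write(pending.peek()); // blocking mode: writes the full buffer
                pending.remove();
            }
        }
    }

Whatever the real fix looks like, the invariant to preserve is that a closed channel plus a live peer must eventually trigger a reconnect rather than indefinite queueing.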
> -----Original Message-----
> From: Jonathan Ellis (JIRA) [mailto:j...@apache.org]
> Sent: Thursday, December 24, 2009 10:47 AM
> To: cassandra-comm...@incubator.apache.org
> Subject: [jira] Updated: (CASSANDRA-651) cassandra 0.5 version throttles and sometimes kills traffic to a node if you restart it.
>
> [ https://issues.apache.org/jira/browse/CASSANDRA-651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Jonathan Ellis updated CASSANDRA-651:
> -------------------------------------
>
>     Fix Version/s: 0.5
>          Assignee: Jaakko Laine
>
>> cassandra 0.5 version throttles and sometimes kills traffic to a node if you restart it.
>> -----------------------------------------------------------------------------------------
>>
>>                 Key: CASSANDRA-651
>>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-651
>>             Project: Cassandra
>>          Issue Type: Bug
>>          Components: Core
>>    Affects Versions: 0.5
>>        Environment: latest in 0.5 branch
>>           Reporter: Ramzi Rabah
>>           Assignee: Jaakko Laine
>>            Fix For: 0.5
>>
>>
>> From the cassandra user message board:
>> "I just recently upgraded to the latest in the 0.5 branch, and I am running into a serious issue. I have a cluster with 4 nodes, the RackUnaware strategy, and my own tokens distributed evenly over the hash space. I am writing to and reading from them equally, at a rate of about 230 reads/writes per second (and cfstats shows that). The first 3 nodes are seeds; the last one isn't. When I start all the nodes together at the same time, they all receive equal amounts of reads/writes (about 230).
>> When I bring node 4 down and bring it back up again, node 4's load fluctuates between the 230 it used to get and sometimes no traffic at all. The other 3 still receive the same amount of traffic, and no errors whatsoever are seen in the logs."
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
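A note on the two ring layouts: under RandomPartitioner the token space is [0, 2^127), so "tokens distributed evenly over the hash space" means token_i = i * 2^127 / N, which is exactly what the 8-node ring quoted above shows. A quick sketch of the arithmetic follows; applying it to Rabah's 4-node cluster is my assumption, since he does not list his tokens.

    // Evenly spaced RandomPartitioner tokens: token_i = i * 2^127 / N.
    // N = 8 reproduces the ring quoted earlier (the first token printed is
    // 2^127 / 8 = 21267647932558653966460912964485513216).
    import java.math.BigInteger;

    public class EvenTokens {
        public static void main(String[] args) {
            int n = Integer.parseInt(args[0]);              // number of nodes
            BigInteger ringSize = BigInteger.valueOf(2).pow(127);
            for (int i = 1; i <= n; i++) {
                System.out.println(ringSize.multiply(BigInteger.valueOf(i))
                                           .divide(BigInteger.valueOf(n)));
            }
        }
    }

Running "java EvenTokens 8" prints the eight tokens from the ring above; "java EvenTokens 4" gives the analogous even layout for a 4-node cluster.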