I confirmed this issue with the following tests. The cluster contains 8 nodes and 
holds about 10000 rows (key range from 1 to 10000):
Address       Status     Load          Range                                      Ring
                                       170141183460469231731687303715884105728
10.237.4.85   Up         757.13 MB     21267647932558653966460912964485513216    |<--|
10.237.1.135  Up         761.54 MB     42535295865117307932921825928971026432    |   ^
10.237.1.137  Up         748.02 MB     63802943797675961899382738893456539648    v   |
10.237.1.139  Up         732.36 MB     85070591730234615865843651857942052864    |   ^
10.237.1.140  Up         725.6 MB      106338239662793269832304564822427566080   v   |
10.237.1.141  Up         726.59 MB     127605887595351923798765477786913079296   |   ^
10.237.1.143  Up         728.16 MB     148873535527910577765226390751398592512   v   |
10.237.1.144  Up         745.69 MB     170141183460469231731687303715884105728   |-->|
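
(For reference, the tokens above are just the 0..2^127 RandomPartitioner hash space 
split evenly across the 8 nodes; a quick sketch to reproduce the numbers:)

import java.math.BigInteger;

// Reproduce the evenly spaced tokens shown in the ring output above:
// token(i) = i * 2^127 / 8 for nodes i = 1..8.
public class EvenTokens {
    public static void main(String[] args) {
        BigInteger top = BigInteger.ONE.shiftLeft(127); // 2^127, top of the token range
        int nodes = 8;
        for (int i = 1; i <= nodes; i++) {
            System.out.println(top.multiply(BigInteger.valueOf(i))
                                  .divide(BigInteger.valueOf(nodes)));
        }
    }
}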

(1)     Read keys in the range [1-10000]; all keys read out OK (the client sends 
read requests directly to 10.237.4.85, 10.237.1.137, 10.237.1.140 and 10.237.1.143).
(2)     Turn off 10.237.1.135 while keeping the read pressure on; some read requests 
time out. After all nodes learn that 10.237.1.135 is down (about 10 s later), all 
read requests succeed again. That's fine.
(3)     Turn 10.237.1.135 back on (and start the Cassandra service); some read 
requests time out again and keep timing out FOREVER, even after all nodes know 
10.237.1.135 is up.
That's a PROBLEM!
(4)     Reboot 10.237.1.135; the problem remains.
(5)     Stop the pressure, reboot the whole cluster and repeat step 1; everything 
is fine again.....

All read requests use QUORUM. The Cassandra version is 
apache-cassandra-incubating-0.5.0-beta2; I have also tested 
apache-cassandra-incubating-0.5.0-RC1 and the problem remains.
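
The read pressure was generated roughly as in the sketch below: a loop issuing 
QUORUM reads over the 0.5-era Thrift interface. The keyspace, column family, column 
name and the single coordinator address are placeholders (the real test spreads 
requests over four nodes), and the Thrift package name may differ; this is not the 
exact test code.

import org.apache.cassandra.service.Cassandra;         // 0.5-era Thrift bindings (package name may differ)
import org.apache.cassandra.service.ColumnPath;
import org.apache.cassandra.service.ConsistencyLevel;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;

// Sketch of the read pressure: QUORUM reads of keys 1..10000 through one coordinator.
public class QuorumReadLoop {
    public static void main(String[] args) throws Exception {
        TSocket socket = new TSocket("10.237.4.85", 9160);   // Thrift client port
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(socket));
        socket.open();

        ColumnPath path = new ColumnPath();
        path.setColumn_family("Standard1");                   // placeholder column family
        path.setColumn("value".getBytes("UTF-8"));            // placeholder column name

        for (int key = 1; key <= 10000; key++) {
            try {
                client.get("Keyspace1", String.valueOf(key), path, ConsistencyLevel.QUORUM);
            } catch (Exception e) {
                System.err.println("read of key " + key + " failed: " + e);
            }
        }
        socket.close();
    }
}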

Reading system.log, I found that after 10.237.1.135 goes down and comes back up, the 
other nodes never re-establish a TCP connection to it (on TCP port 7000)! Read 
requests destined for 10.237.1.135 go into the pending-writes queue (because the 
socket channel is closed) and are never put on the wire (observed with tcpdump).

It seems that when 10.237.1.135 goes down in step 2, some socket channels are reset; 
after 10.237.1.135 comes back, those socket channels remain closed forever.
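
To make the suspected failure mode concrete, here is an illustrative sketch (NOT 
Cassandra's actual inter-node code): if the outbound connection object caches its 
SocketChannel and, once the channel is closed, only parks outgoing messages in a 
pending queue without ever attempting to reconnect, traffic to the restarted node 
stalls exactly as observed.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;
import java.util.ArrayDeque;
import java.util.Queue;

// Sketch of the suspected failure pattern (not Cassandra's actual code):
// a cached outbound channel that, once closed by the peer's restart,
// queues messages forever and never tries to reconnect.
class OutboundConnection {
    private final InetSocketAddress peer;              // e.g. 10.237.1.135:7000
    private final Queue<ByteBuffer> pending = new ArrayDeque<>();
    private SocketChannel channel;

    OutboundConnection(InetSocketAddress peer) throws IOException {
        this.peer = peer;
        this.channel = SocketChannel.open(peer);       // connected once, at startup
    }

    void send(ByteBuffer message) {
        if (channel != null && channel.isOpen()) {
            try {
                channel.write(message);
                return;
            } catch (IOException e) {
                closeChannel();                        // peer went down: channel is closed...
            }
        }
        // ...and from here on every message just piles up, because nothing
        // ever calls SocketChannel.open(peer) again after the peer restarts.
        pending.add(message);
    }

    private void closeChannel() {
        try { channel.close(); } catch (IOException ignored) { }
        channel = null;
    }
}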
---------END----------


-----Original Message-----
From: Jonathan Ellis (JIRA) [mailto:j...@apache.org] 
Sent: Thursday, December 24, 2009 10:47 AM
To: cassandra-comm...@incubator.apache.org
Subject: [jira] Updated: (CASSANDRA-651) cassandra 0.5 version throttles and 
sometimes kills traffic to a node if you restart it.


     [ 
https://issues.apache.org/jira/browse/CASSANDRA-651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-651:
-------------------------------------

    Fix Version/s: 0.5
         Assignee: Jaakko Laine

> cassandra 0.5 version throttles and sometimes kills traffic to a node if you 
> restart it.
> ----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-651
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-651
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.5
>         Environment: latest in 0.5 branch
>            Reporter: Ramzi Rabah
>            Assignee: Jaakko Laine
>             Fix For: 0.5
>
>
> From the cassandra user message board: 
> "I just recently upgraded to the latest in the 0.5 branch, and I am running
> into a serious issue. I have a cluster with 4 nodes, the RackUnaware
> strategy, and my own tokens distributed evenly over the hash
> space. I am writing/reading to them equally at a rate of about
> 230 reads/writes per second (and cfstats shows that). The first 3 nodes
> are seeds, the last one isn't. When I start all the nodes together at
> the same time, they all receive equal amounts of reads/writes (about
> 230).
> When I bring node 4 down and bring it back up again, node 4's load
> fluctuates between the 230 it used to get and sometimes no traffic at
> all. The other 3 still have the same amount of traffic. And no errors
> whatsoever are seen in the logs."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
