This issue can be confirmed with the following test. Suppose a cluster contains 8 nodes holding about 10000 rows (keys ranging from 1 to 10000):

Address       Status   Load        Range                                       Ring
                                   170141183460469231731687303715884105728
10.237.4.85   Up       757.13 MB   21267647932558653966460912964485513216     |<--|
10.237.1.135  Up       761.54 MB   42535295865117307932921825928971026432     |   ^
10.237.1.137  Up       748.02 MB   63802943797675961899382738893456539648     v   |
10.237.1.139  Up       732.36 MB   85070591730234615865843651857942052864     |   ^
10.237.1.140  Up       725.6 MB    106338239662793269832304564822427566080    v   |
10.237.1.141  Up       726.59 MB   127605887595351923798765477786913079296    |   ^
10.237.1.143  Up       728.16 MB   148873535527910577765226390751398592512    v   |
10.237.1.144  Up       745.69 MB   170141183460469231731687303715884105728    |-->|
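The read pressure driving the steps below was generated with a loop along these lines. This is only a minimal sketch assuming the 0.5-era Thrift client API; the keyspace "Keyspace1", column family "Standard1", and column name "col" are placeholders, since the actual schema is not given here.

    import org.apache.cassandra.service.Cassandra;
    import org.apache.cassandra.service.ColumnPath;
    import org.apache.cassandra.service.ConsistencyLevel;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class QuorumReadPressure {
        public static void main(String[] args) throws Exception {
            // Connect to one of the nodes the client talks to directly
            TTransport transport = new TSocket("10.237.4.85", 9160);
            transport.open();
            Cassandra.Client client =
                new Cassandra.Client(new TBinaryProtocol(transport));

            // Hypothetical schema: keyspace "Keyspace1", CF "Standard1"
            ColumnPath path =
                new ColumnPath("Standard1", null, "col".getBytes("UTF-8"));

            while (true) {
                for (int key = 1; key <= 10000; key++) {
                    try {
                        // QUORUM: the coordinator must hear from a majority of
                        // replicas before answering, otherwise the read times out
                        client.get("Keyspace1", Integer.toString(key),
                                   path, ConsistencyLevel.QUORUM);
                    } catch (Exception e) {
                        System.err.println("read of key " + key + " failed: " + e);
                    }
                }
            }
        }
    }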
(1) Read all keys in the range [1-10000]; every key reads back OK (the client sends read requests directly to 10.237.4.85, 10.237.1.137, 10.237.1.140, and 10.237.1.143).

(2) Turn off 10.237.1.135 while keeping the read pressure on. Some read requests time out, but once all nodes know that 10.237.1.135 is down (about 10 s later), all read requests succeed again. That's fine.

(3) Turn 10.237.1.135 back on (and restart the Cassandra service as well). Some read requests time out again, and keep timing out FOREVER, even after all nodes know 10.237.1.135 is up. That's a PROBLEM!

(4) Reboot 10.237.1.135; the problem remains.

(5) Stop the pressure, reboot the whole cluster, and repeat step 1: everything is fine again...

All read requests use the Quorum consistency policy. The Cassandra version is apache-cassandra-incubating-0.5.0-beta2; I also tested apache-cassandra-incubating-0.5.0-RC1, and the problem remains.

After reading system.log, I found that once 10.237.1.135 goes down and comes back up, the other nodes never re-establish a TCP connection to it (on TCP port 7000). Read requests destined for 10.237.1.135 are parked in Pending-Writes (because the socket channel is closed) and are never sent onto the network (as observed with tcpdump). It seems that when 10.237.1.135 goes down in step 2, some socket channels are reset, and after 10.237.1.135 comes back, those socket channels remain closed, forever.

---------END----------

-----Original Message-----
From: Jonathan Ellis (JIRA) [mailto:j...@apache.org]
Sent: Thursday, December 24, 2009 10:47 AM
To: cassandra-comm...@incubator.apache.org
Subject: [jira] Updated: (CASSANDRA-651) cassandra 0.5 version throttles and sometimes kills traffic to a node if you restart it.

     [ https://issues.apache.org/jira/browse/CASSANDRA-651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-651:
-------------------------------------

    Fix Version/s: 0.5
         Assignee: Jaakko Laine

> cassandra 0.5 version throttles and sometimes kills traffic to a node if you restart it.
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-651
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-651
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.5
>         Environment: latest in 0.5 branch
>            Reporter: Ramzi Rabah
>            Assignee: Jaakko Laine
>             Fix For: 0.5
>
>
> From the cassandra user message board:
> "I just recently upgraded to the latest in the 0.5 branch, and I am running
> into a serious issue. I have a cluster with 4 nodes, rackunaware
> strategy, using my own tokens distributed evenly over the hash
> space. I am writing/reading equally to them at an equal rate of about
> 230 reads/writes per second (and cfstats shows that). The first 3 nodes
> are seeds, the last one isn't. When I start all the nodes together at
> the same time, they all receive equal amounts of reads/writes (about
> 230).
> When I bring node 4 down and bring it back up again, node 4's load
> fluctuates between the 230 it used to get and sometimes no traffic at
> all. The other 3 still have the same amount of traffic, and no errors
> whatsoever are seen in the logs."

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
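P.S. To make the suspected failure mode described above (before the quoted message) concrete, here is an illustrative sketch. This is NOT the actual Cassandra networking code; the class and method names are invented for illustration. It shows the pattern where a cached outbound connection, whose channel was reset while the peer was down, keeps queueing messages forever instead of reconnecting:

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.SocketChannel;
    import java.util.ArrayDeque;
    import java.util.Queue;

    class OutboundConnection {
        private final InetSocketAddress peer;  // e.g. 10.237.1.135:7000
        private SocketChannel channel;         // reset when the peer went down
        private final Queue<ByteBuffer> pendingWrites =
            new ArrayDeque<ByteBuffer>();

        OutboundConnection(InetSocketAddress peer) { this.peer = peer; }

        synchronized void write(ByteBuffer message) throws IOException {
            if (channel == null || !channel.isOpen()) {
                // BUG: the message is parked in the pending-writes queue
                // because the channel is closed, but nothing ever reopens
                // the channel, so nothing reaches the wire even after the
                // peer is back up.
                pendingWrites.add(message);
                return;
            }
            channel.write(message);
        }

        // A fix along these lines would reconnect (or evict the dead
        // connection from the cache) when the channel is found closed,
        // then flush the queued messages:
        synchronized void reconnectAndFlush() throws IOException {
            channel = SocketChannel.open(peer);
            while (!pendingWrites.isEmpty())
                channel.write(pendingWrites.poll());
        }
    }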