I confirmed this issue with the following tests. The cluster contains 8 nodes and
holds about 10000 rows (keys ranging from 1 to 10000):
Address        Status  Load       Range                                       Ring
                                  170141183460469231731687303715884105728
10.237.4.85    Up      757.13 MB  21267647932558653966460912964485513216     |<--|
10.237.1.135   Up      761.54 MB  42535295865117307932921825928971026432     |   ^
10.237.1.137   Up      748.02 MB  63802943797675961899382738893456539648     v   |
10.237.1.139   Up      732.36 MB  85070591730234615865843651857942052864     |   ^
10.237.1.140   Up      725.6 MB   106338239662793269832304564822427566080    v   |
10.237.1.141   Up      726.59 MB  127605887595351923798765477786913079296    |   ^
10.237.1.143   Up      728.16 MB  148873535527910577765226390751398592512    v   |
10.237.1.144   Up      745.69 MB  170141183460469231731687303715884105728    |-->|
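(For reference: the report does not say which partitioner is used, but the Range
values above match a token space of 0..2^127, which is the RandomPartitioner range,
split evenly across the 8 nodes, i.e. token(i) = i * 2^127 / 8. A minimal sketch
that reproduces the Range column; the class name EvenTokens is just illustrative:)

import java.math.BigInteger;

public class EvenTokens {
    public static void main(String[] args) {
        // RandomPartitioner token space: 0 .. 2^127
        BigInteger space = BigInteger.valueOf(2).pow(127);
        int nodes = 8;
        for (int i = 1; i <= nodes; i++) {
            // i-th node's token; for i = 1 this prints
            // 21267647932558653966460912964485513216, matching the ring above
            System.out.println(space.multiply(BigInteger.valueOf(i))
                                    .divide(BigInteger.valueOf(nodes)));
        }
    }
}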
(1) Read keys in the range [1-10000]; all keys read back OK (the client sends read
requests directly to 10.237.4.85, 10.237.1.137, 10.237.1.140, and 10.237.1.143).
(2) Turn off 10.237.1.135 while keeping the load running; some read requests time
out. After all nodes know 10.237.1.135 is down (about 10 s later), all read
requests succeed again. That's fine.
(3) Turn 10.237.1.135 (and its Cassandra service) back on; some read requests time
out again, and they keep timing out FOREVER, even after all nodes know
10.237.1.135 is up.
That's the PROBLEM!
(4) Reboot 10.237.1.135; the problem remains.
(5) Stop the load, reboot the whole cluster, and repeat step 1; everything is fine
again.
All read requests use QUORUM consistency. The Cassandra version is
apache-cassandra-incubating-0.5.0-beta2; I have also tested
apache-cassandra-incubating-0.5.0-RC1 and the problem remains.
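(For completeness, this is roughly how the test client issues its reads. It is a
hedged sketch: the keyspace "Keyspace1", column family "Standard1" and column name
"value" are placeholders not taken from the report, and the Thrift class and
package names are written from memory of the 0.5-era API, so details may differ.)

import org.apache.cassandra.service.Cassandra;        // Thrift-generated classes; this
import org.apache.cassandra.service.ColumnPath;        // package moved in later releases
import org.apache.cassandra.service.ConsistencyLevel;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;

public class QuorumReadTest {
    public static void main(String[] args) throws Exception {
        // connect to one of the nodes the client talks to directly
        TSocket socket = new TSocket("10.237.4.85", 9160);
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(socket));
        socket.open();

        ColumnPath path = new ColumnPath();
        path.column_family = "Standard1";              // placeholder column family
        path.column = "value".getBytes("UTF-8");       // placeholder column name

        for (int key = 1; key <= 10000; key++) {
            // QUORUM read: a majority of replicas must answer, so a node that is
            // nominally up but unreachable on port 7000 can make these time out
            client.get("Keyspace1", String.valueOf(key), path, ConsistencyLevel.QUORUM);
        }
        socket.close();
    }
}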
After reading system.log, I found that once 10.237.1.135 goes down and comes back
up, the other nodes never re-establish a TCP connection to it (on TCP port 7000)!
And read requests destined for 10.237.1.135 (queued into Pending-Writes because the
socket channel is closed) are never put on the wire (observed with tcpdump).
It seems that when 10.237.1.135 went down in step 2, some socket channels were
reset, and after 10.237.1.135 came back, those channels remained closed, forever.
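(To illustrate that symptom in code: the following is a hypothetical sketch, not
Cassandra's actual inter-node messaging code. If an outgoing connection only queues
messages while its channel is closed, and nothing ever retries the connect once the
peer is reachable again, then everything queued for 10.237.1.135 stays pending
forever, which matches the tcpdump observation.)

import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayDeque;
import java.util.Queue;

class OutgoingConnectionSketch {
    private final InetSocketAddress peer;
    private final Queue<byte[]> pendingWrites = new ArrayDeque<byte[]>();
    private Socket socket;   // null (or closed) after the peer resets the connection

    OutgoingConnectionSketch(InetSocketAddress peer) {
        this.peer = peer;
    }

    synchronized void write(byte[] message) {
        pendingWrites.add(message);
        if (socket == null || socket.isClosed()) {
            // The suspected bug: if this method simply returned here without ever
            // scheduling a reconnect, pendingWrites would never be drained once the
            // peer is back up. The expected behaviour is to retry the connect.
            if (!tryReconnect()) {
                return;
            }
        }
        flushPending();
    }

    private boolean tryReconnect() {
        try {
            socket = new Socket();
            socket.connect(peer, 2000);   // 2 s connect timeout
            return true;
        } catch (IOException e) {
            socket = null;                // leave messages queued; retry on next write
            return false;
        }
    }

    private void flushPending() {
        try {
            OutputStream out = socket.getOutputStream();
            while (!pendingWrites.isEmpty()) {
                out.write(pendingWrites.poll());
            }
            out.flush();
        } catch (IOException e) {
            try { socket.close(); } catch (IOException ignored) {}
            socket = null;                // pending messages stay queued until reconnect
        }
    }
}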
---------END----------
-----Original Message-----
From: Jonathan Ellis (JIRA) [mailto:[email protected]]
Sent: Thursday, December 24, 2009 10:47 AM
To: [email protected]
Subject: [jira] Updated: (CASSANDRA-651) cassandra 0.5 version throttles and
sometimes kills traffic to a node if you restart it.
[
https://issues.apache.org/jira/browse/CASSANDRA-651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jonathan Ellis updated CASSANDRA-651:
-------------------------------------
Fix Version/s: 0.5
Assignee: Jaakko Laine
> cassandra 0.5 version throttles and sometimes kills traffic to a node if you
> restart it.
> ----------------------------------------------------------------------------------------
>
> Key: CASSANDRA-651
> URL: https://issues.apache.org/jira/browse/CASSANDRA-651
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.5
> Environment: latest in 0.5 branch
> Reporter: Ramzi Rabah
> Assignee: Jaakko Laine
> Fix For: 0.5
>
>
> From the cassandra user message board:
> "I just recently upgraded to latest in 0.5 branch, and I am running
> into a serious issue. I have a cluster with 4 nodes, rackunaware
> strategy, and using my own tokens distributed evenly over the hash
> space. I am writing/reading equally to them at an equal rate of about
> 230 reads/writes per second (and cfstats shows that). The first 3 nodes
> are seeds, the last one isn't. When I start all the nodes together at
> the same time, they all receive equal amounts of reads/writes (about
> 230).
> When I bring node 4 down and bring it back up again, node 4's load
> fluctuates between the 230 it used to get to sometimes no traffic at
> all. The other 3 still have the same amount of traffic. And no errors
> whatsoever are seen in the logs."
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.